bioinformatics tool

  • Article type: Journal Article
    The rapid expansion of biological sequence databases driven by high-throughput genomic and proteomic sequencing has left a considerable number of identified protein sequences with unclear or incomplete functional annotations. Domains of unknown function (DUFs) are protein domains that lack functional annotation but are present in numerous proteins. To address the challenge of annotating DUFs, we developed a computational method that efficiently identifies and annotates these domains using the position-specific iterative basic local alignment search tool (PSI-BLAST) and data mining techniques. Our pipeline identifies putative functions of DUFs, thereby narrowing the gap between known sequences and known functions. The tool can also annotate user-supplied sequences. We ran the pipeline on 5111 unique DUF sequences obtained from Pfam, which yielded putative annotations for 2007 of them. These annotations were incorporated into a comprehensive database served through a web-based server named "AnnoDUF". AnnoDUF is freely accessible to both academic and industrial users at http://bts.ibab.ac.in/annoduf.php. All scripts used in this study are available in the GitHub repository at https://github.com/BioToolSuite/AnnoDUF.

  • Article type: Journal Article
    Gene expression profiling technologies have revolutionized cell biology, enabling researchers to identify gene signatures linked to various biological attributes of melanomas, such as pigmentation status, differentiation state, proliferative versus invasive capacity, and disease progression. Although the discovery of gene signatures has significantly enhanced our understanding of melanocytic phenotypes, reconciling the numerous signatures reported across independent studies and different profiling platforms remains a challenge. Current methods for classifying melanocytic gene signatures depend on exact gene overlap and comparison with unstandardized baseline transcriptomes. In this study, we aimed to categorize published gene signatures into clusters based on their similar patterns of expression across clinical cutaneous melanoma specimens. We analyzed nearly 800 melanoma samples from six gene expression repositories and developed a classification framework for gene signatures that is resilient against biases in gene identification across profiling platforms and inconsistencies in baseline standards. Using 39 frequently cited published gene signatures, our analysis revealed seven principal classes of gene signatures that correlate with previously identified phenotypes: Differentiated, Mitotic/MYC, AXL, Amelanotic, Neuro, Hypometabolic, and Invasive. Each class is consistent with the phenotypes that the constituent gene signatures represent, and our classification method does not rely on overlapping genes between signatures. To facilitate broader application, we created WIMMS (what is my melanocytic signature, available at https://wimms.tanlab.org/), a user-friendly web application. WIMMS allows users to categorize any gene signature, determining its relationship to predominantly cited signatures and its representation within the seven principal classes.

  • Article type: Journal Article
    CONCLUSIONS: Mfind is a tool for analyzing the impact of microsatellite presence on DNA barcode specificity. We found a significant correlation between barcode entropy and microsatellite count in angiosperms. Genetic barcodes and microsatellites are among the identification methods used in taxonomy and biodiversity research. It is important to establish a relationship between microsatellite quantification and the genetic information in barcodes. To clarify the association between the genetic information in barcodes (expressed as Shannon's Measure of Information, SMI) and microsatellite count, a total of 330,809 DNA barcodes from the BOLD database (Barcode of Life Data System) were analyzed. A parallel sliding-window algorithm was developed to compute the Shannon entropy of the barcodes, and this was compared with the quantification of microsatellites such as (AT)n, (AC)n, and (AG)n. The microsatellite search method used an algorithm written in the Java programming language, which systematically examined the genetic barcodes from an angiosperm database. For this purpose, a computational tool named Mfind was developed, and its search methodology is detailed. This comprehensive study provided a broad overview of microsatellites within barcodes and revealed an inverse correlation between the sum of microsatellite counts and barcode information. Use of the Mfind tool demonstrated that the presence of microsatellites affects barcode information when entropy is used as the metric. This effect might be attributed to the short length of DNA barcodes and the repetitive nature of microsatellites, which directly influences the entropy of the barcodes.
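
The two quantities compared in this abstract, sliding-window Shannon entropy and microsatellite counts, can be illustrated with a minimal sketch. This is not the Mfind implementation (which the abstract describes as parallel and written in Java): it is a serial Python illustration, and the function names, window size, step, and minimum repeat count are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol frequencies in seq."""
    if not seq:
        return 0.0
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def sliding_entropy(seq: str, window: int = 20, step: int = 1) -> list[float]:
    """Entropy of each window of length `window`; the defaults are hypothetical."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def count_microsatellites(seq: str, motif: str, min_repeats: int = 3) -> int:
    """Count non-overlapping runs where `motif` repeats at least `min_repeats` times."""
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    return len(pattern.findall(seq))


barcode = "ATATATATGCGCACACACTT"  # made-up sequence, not taken from BOLD
print(sliding_entropy(barcode, window=8, step=4))
print({m: count_microsatellites(barcode, m) for m in ("AT", "AC", "AG")})
```

A window that falls entirely inside a repeat such as (AT)n contains only two symbols in equal proportion, so its entropy cannot exceed 1 bit, whereas a window with all four bases equally represented reaches 2 bits. That is one plausible reading of why repeat-rich barcodes would show lower entropy.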

  • Article type: Journal Article
    With the explosion of available genomic information, comparative genomics has become a central approach to understanding microbial ecology and evolution. We developed DiGAlign (https://www.genome.jp/digalign/), a web server that provides versatile functionality for comparative genomics through an intuitive interface. It lets users produce a highly customizable synteny map visualization simply by uploading nucleotide sequences of interest, ranging from a specific region to the whole-genome landscape of microorganisms and viruses. DiGAlign will serve a wide range of biological researchers, particularly experimental biologists, with multifaceted features that allow rapid characterization of genomic sequences of interest and generation of publication-ready figures.

  • Article type: Journal Article
    A macrohaplotype combines multiple types of phased DNA variants, increasing forensic discrimination power. High-quality long sequencing reads, for example PacBio HiFi reads, provide data for detecting macrohaplotypes in multiploidy and DNA mixtures. However, bioinformatics tools for detecting macrohaplotypes are lacking. In this study, we developed a bioinformatics tool, MacroHapCaller, in which targeted loci (i.e., short tandem repeats [STRs], single nucleotide polymorphisms, and insertions and deletions) are genotyped and combined with novel algorithms to call macrohaplotypes from long reads. MacroHapCaller uses physical phasing (i.e., read-backed phasing) to identify macrohaplotypes, and thus it can detect multi-allelic macrohaplotypes for a given sample. MacroHapCaller was validated with data generated by our targeted PacBio HiFi sequencing pipeline, which sequenced ∼8-kb amplicon regions harboring 20 core forensic STR loci in human benchmark samples HG002 and HG003. MacroHapCaller was also validated on whole-genome long-read sequencing data. Compared with the known ground truth, MacroHapCaller produced robust and accurate genotypes and phased macrohaplotypes. MacroHapCaller achieved higher or comparable genotyping accuracy and faster speed than the existing tools HipSTR and DeepVar. MacroHapCaller enables efficient macrohaplotype analysis from high-throughput sequencing data and supports applications that use discriminating macrohaplotypes.
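
The idea of read-backed phasing mentioned above can be sketched in a few lines: alleles observed together on one long read are known to lie on the same molecule, so the per-read allele combinations can be counted directly. The sketch below is an illustration under stated assumptions, not MacroHapCaller's algorithm. The input format (one dict of already-genotyped alleles per read), the locus names, and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def call_macrohaplotypes(reads, min_support=2):
    """Group reads by their combined alleles and keep well-supported combinations.

    reads: list of dicts mapping locus name -> allele observed on that read.
    Only reads covering every locus are used, so each reported tuple is
    physically phased by single reads rather than inferred statistically.
    """
    loci = sorted({locus for read in reads for locus in read})
    counts = Counter(
        tuple(read[locus] for locus in loci)
        for read in reads
        if all(locus in read for locus in loci)
    )
    return loci, {hap: n for hap, n in counts.items() if n >= min_support}


# Made-up reads over one hypothetical STR locus and one hypothetical SNP.
reads = [
    {"STR1": 12, "SNP1": "A"},
    {"STR1": 12, "SNP1": "A"},
    {"STR1": 14, "SNP1": "G"},
    {"STR1": 14, "SNP1": "G"},
    {"STR1": 13, "SNP1": "A"},  # single read, dropped as likely noise
    {"STR1": 12},               # does not span SNP1, so it cannot phase
]
loci, haplotypes = call_macrohaplotypes(reads)
print(loci, haplotypes)
```

Because nothing in the counting step assumes two haplotypes per sample, more than two supported combinations can be reported, which is the property that makes this style of phasing usable for mixtures.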

  • Article type: Journal Article
    Detecting copy number variations (CNVs) and alterations (CNAs) in the BRCA1 and BRCA2 genes is essential for testing patients for targeted therapy applicability. However, the available bioinformatics tools were originally designed for identifying CNVs/CNAs in whole-genome or whole-exome (WES) NGS data or in targeted NGS data, without adaptation to the BRCA1/2 genes. Most of these tools were tested on sample cohorts of limited size, and their use is restricted to specific library preparation kits or sequencing platforms. We developed BRACNAC, a new tool for detecting CNVs and CNAs in the BRCA1 and BRCA2 genes in NGS data of different origins. The underlying mechanism of the tool involves several coverage normalization steps complemented by CNV probability evaluation. We estimated the sensitivity and specificity of our tool to be 100% and 94%, respectively, with an area under the curve (AUC) of 94%. The estimation was performed using NGS data obtained from 213 ovarian and prostate cancer samples tested with in-house and commercially available library preparation kits and additionally using multiplex ligation-dependent probe amplification (MLPA) (12 CNV-positive samples). Using freely available WES and targeted NGS data from other research groups, we demonstrated that BRACNAC can also be used for these two types of data, with an AUC of up to 99.9%. In addition, we determined the limitations of the tool in terms of the minimum number of samples per NGS run (≥20 samples) and the minimum expected percentage of CNV-negative samples (≥80%). We expect that our findings will improve the efficacy of BRCA1/2 diagnostics. BRACNAC is freely available on GitHub.
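
Coverage normalization of the general kind mentioned above can be sketched as follows. This is a generic two-step illustration, not BRACNAC's actual procedure: the function names, the loss/gain thresholds, and the toy depths are assumptions, and the CNV probability evaluation described in the abstract is not modeled.

```python
from statistics import median


def normalized_ratios(coverage):
    """coverage: dict of sample -> list of read depths, one per target region.

    Step 1 scales each sample by its total depth (assumed non-zero).
    Step 2 divides each region by its median across all samples in the run.
    """
    scaled = {s: [d / sum(depths) for d in depths] for s, depths in coverage.items()}
    n_regions = len(next(iter(scaled.values())))
    region_median = [median(v[i] for v in scaled.values()) for i in range(n_regions)]
    return {s: [v[i] / region_median[i] for i in range(n_regions)]
            for s, v in scaled.items()}


def flag_regions(ratios, loss=0.75, gain=1.25):
    """Flag regions whose ratio falls outside hypothetical loss/gain thresholds."""
    calls = {}
    for sample, values in ratios.items():
        for i, r in enumerate(values):
            if r < loss:
                calls[(sample, i)] = "loss"
            elif r > gain:
                calls[(sample, i)] = "gain"
    return calls


# Made-up run: four samples with flat coverage, one with region 1 at half depth.
run = {name: [100, 100, 100, 100] for name in "ABCD"}
run["E"] = [100, 50, 100, 100]
print(flag_regions(normalized_ratios(run)))
```

A per-region median only behaves as a copy-neutral baseline when most samples in the run carry no CNV at that region. That offers an intuition for why a tool of this kind would need a minimum run size and a minimum share of CNV-negative samples, as the abstract reports for BRACNAC.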

  • Article type: Journal Article
    Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite its success and popularity, the traditional GWAS approach comes with a variety of limitations. For this reason, newer methods for GWAS have been developed, including the use of pan-genomes instead of a reference genome and the utilization of markers beyond single-nucleotide polymorphisms, such as structural variations and k-mers. The k-mers-based GWAS approach has especially gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here, we present kGWASflow, a modular, user-friendly, and scalable workflow to perform GWAS using k-mers. We adopted an existing kmersGWAS method into an easier and more accessible workflow using management tools like Snakemake and Conda and eliminated the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and using containerization tools like Docker. The workflow encompasses supplemental components such as quality control, read-trimming procedures, and generating summary statistics. kGWASflow also offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub (https://github.com/akcorut/kGWASflow) and Bioconda (https://anaconda.org/bioconda/kgwasflow).
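
The marker type behind a k-mer-based GWAS can be illustrated with a small sketch that builds a presence/absence table of k-mers across samples, which is the kind of matrix that is then tested against a trait. This is not the kGWASflow or kmersGWAS implementation. The function names, the tiny k, and the sequences are assumptions for illustration, and real workflows operate on sequencing reads with far longer k-mers and dedicated k-mer counters.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence(samples, k):
    """Return sorted sample names and a dict of k-mer -> 0/1 row across samples."""
    per_sample = {name: kmers(seq, k) for name, seq in samples.items()}
    names = sorted(samples)
    all_kmers = sorted(set().union(*per_sample.values()))
    table = {km: [int(km in per_sample[n]) for n in names] for km in all_kmers}
    return names, table


# Made-up sequences that differ at a single base.
samples = {"s1": "ACGTA", "s2": "ACGGA"}
names, table = presence_absence(samples, k=3)
for km, row in table.items():
    print(km, row)
```

A single-base difference changes every k-mer that overlaps it, so variant-specific k-mers appear in one sample and not the other without any alignment to a reference genome.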

  • Article type: Journal Article
    Bioinformatics has played a crucial role in the scientific effort against the coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Advances in novel algorithms, mega data technology, artificial intelligence, and deep learning have supported the development of new bioinformatics tools for analyzing the daily growing volume of SARS-CoV-2 data in recent years. These tools have been applied in genomic analyses, evolutionary tracking, epidemiological analyses, protein structure interpretation, and studies of virus-host interaction and clinical performance. To support future in silico analyses, we conducted a review summarizing the databases, web services, and software applied in SARS-CoV-2 research. These digital resources may also contribute to research on other coronaviruses and non-coronavirus viruses.

  • Article type: Journal Article
    BACKGROUND: Visual sequence logos have been an active area in the development of bioinformatics tools. ggseqlogo, written in R, has been the most popular API since its publication. With the rise of artificial intelligence and deep learning, Python is currently the most popular programming language, and bioinformaticians have begun shifting to it. Providing Python APIs similar to those in R reduces the cost of relearning a programming language. Compared with ggplot2 in R, plotting frameworks in Python are not as easy to use. The appearance of plotnine (a Python implementation of ggplot2) makes it possible to unify the programming approach of bioinformatics visualization tools across R and Python.
    RESULTS: Here, we introduce plotnineSeqSuite, a new plotnine-based Python package that provides a ggseqlogo-like API for programmatically drawing sequence logos, sequence alignment diagrams, and sequence histograms. It supports custom letters, color themes, and fonts. Moreover, the classes for drawing layers follow an object-oriented design, so users can easily encapsulate and extend them.
    CONCLUSIONS: plotnineSeqSuite is the first ggplot2-style package to implement visualization of sequence-related graphs in Python. It improves the consistency of programmatic plotting between R and Python. Compared with existing tools, plotnineSeqSuite supports a more complete set of plot categories. The source code is available on GitHub (https://github.com/caotianze/plotnineseqsuite) and PyPI (https://pypi.org/project/plotnineseqsuite), and the documentation is freely available at https://caotianze.github.io/plotnineseqsuite/.
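
For readers unfamiliar with sequence logos, the quantity a logo conventionally encodes is the per-column information content in bits, which sets the total height of each letter stack. The sketch below computes it in plain Python. It deliberately does not call plotnineSeqSuite or ggseqlogo, whose APIs are not described here. The function name and the toy alignment are assumptions, and no small-sample correction is applied.

```python
import math
from collections import Counter


def column_information(seqs, alphabet_size=4):
    """Information content (bits) of each alignment column: log2(N) - H(column)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    n = len(seqs)
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


# Made-up aligned DNA sequences.
alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
print(column_information(alignment))
```

A fully conserved DNA column reaches the maximum of 2 bits, while a column split evenly between two bases carries 1 bit, which is why conserved positions stand tall in a logo.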

  • Article type: Journal Article
    Mucopolysaccharidosis VI (Maroteaux-Lamy syndrome) is a metabolic disorder caused by loss of the enzyme activity of N-acetylgalactosamine-4-sulphatase arising from mutations in the ARSB gene. Mutated ARSB leads to the accumulation of glycosaminoglycans (GAGs) within the lysosome, resulting in severe growth deformities and lysosomal storage disease. The main focus of this study is to identify deleterious variants by applying bioinformatics tools that predict the conservation, pathogenicity, stability, and effect of ARSB variants. We examined 170 missense variants, of which G137V and G144R emerged as the variants predicted to be deleterious and relevant to disease progression. The native structure and the G137V and G144R variant structures were used as receptors and subjected to molecular docking with the small molecule Odiparcil to analyze binding efficiency and the differing interactions of the receptors with the drug. Docking yielded similar scores of -7.3 kcal/mol, indicating effective binding and consistent interactions of the drug with residues CYS117, GLN118, THR182, and GLN517 in the native, G137V, and G144R structures. Molecular dynamics simulations were conducted to assess the stability and flexibility of the native and variant structures upon ligand binding. Overall, the study indicates that the drug has similar therapeutic potential toward the native and variant structures, based on the high binding affinity, and that the complexes are stable, with an average RMS value of 0.2 nm. These findings can aid the future development of therapeutics for Maroteaux-Lamy syndrome.