Software

  • Article type: Journal Article
    The study of animal sounds in biology and ecology relies heavily upon time-frequency (TF) visualisation, most commonly using the short-time Fourier transform (STFT) spectrogram. This method, however, has an inherent bias towards either temporal or spectral detail that can lead to misinterpretation of complex animal sounds. An ideal TF visualisation should accurately convey the structure of a sound in terms of both frequency and time; however, the STFT often cannot meet this requirement. We evaluate the accuracy of four TF visualisation methods (the superlet transform [SLT], the continuous wavelet transform [CWT] and two STFTs) using a synthetic test signal. We then apply these methods to visualise sounds of the Chagos blue whale, Asian elephant, southern cassowary, eastern whipbird, mulloway fish and American crocodile. We show that the SLT visualises the test signal with 18.48% to 28.08% less error than the other methods. A comparison between our visualisations of animal sounds and their literature descriptions indicates that the STFT's bias may have caused misinterpretations in descriptions of pygmy blue whale songs and elephant rumbles. We suggest that using the SLT to visualise low-frequency animal sounds may prevent such misinterpretations. Finally, we employ the SLT to develop 'BASSA', an open-source GUI application for the Windows platform that offers a no-code, user-friendly tool for analysing short-duration recordings of low-frequency animal sounds. The SLT visualises low-frequency animal sounds with improved accuracy, in a user-friendly format, minimising the risk of misinterpretation while requiring less technical expertise than the STFT. Using this method could propel advances in acoustics-driven studies of animal communication, vocal production methods, phonation and species identification.
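The temporal-versus-spectral bias of the STFT mentioned above follows from its fixed analysis window: for a window of N samples at sample rate f_s, the frequency bins are f_s/N apart while each frame spans N/f_s seconds, so the two always multiply to 1. The short Python sketch below only illustrates that trade-off; the 2 kHz sample rate and the window lengths are hypothetical values, and the code does not implement the SLT or CWT.

```python
def stft_resolution(sample_rate_hz: float, window_samples: int) -> tuple:
    """Return (frequency bin spacing in Hz, frame duration in s) for an STFT window.

    A longer window gives finer frequency bins but smears timing over a longer
    frame, and vice versa: the product of the two is always 1.
    """
    if sample_rate_hz <= 0 or window_samples <= 0:
        raise ValueError("sample rate and window length must be positive")
    bin_spacing_hz = sample_rate_hz / window_samples    # spectral detail
    frame_duration_s = window_samples / sample_rate_hz  # temporal smearing
    return bin_spacing_hz, frame_duration_s


if __name__ == "__main__":
    fs = 2000.0  # hypothetical sample rate (Hz) for a low-frequency recording
    for n in (256, 1024, 4096):  # hypothetical window lengths (samples)
        df, dt = stft_resolution(fs, n)
        print(f"N={n:5d}  bin spacing={df:8.4f} Hz  frame={dt:6.3f} s  product={df * dt:.1f}")
```

Going from N = 256 to N = 4096 makes the bins 16 times finer while each frame becomes 16 times longer; this fixed compromise is what multi-resolution methods such as the CWT and SLT are intended to get around.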

  • Article type: Journal Article
    Computational biological models have proven to be an invaluable tool for understanding and predicting the behaviour of many biological systems. While it may not be too challenging for experienced researchers to construct such models from scratch, it is not a straightforward task for early-stage researchers. Design patterns are well-known techniques widely applied in software engineering because they provide a set of typical solutions to common problems in software design. In this paper, we collect and discuss common patterns that are typically used during the construction and execution of computational biological models. We adopt Petri nets as a modelling language to provide a visual illustration of each pattern; however, the ideas presented in this paper can also be implemented using other modelling formalisms. We provide two case studies for illustration and show how these models can be built up from the smaller modules presented. We hope that the ideas discussed in this paper will help many researchers build their own models in the future.
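To make the Petri-net vocabulary concrete, here is a minimal place/transition net in Python: places hold integer token counts, and a transition may fire only when every input place holds at least as many tokens as its arc weight. The reversible binding example (hypothetical species A and B forming complex C via transitions named `bind` and `unbind`) is an illustrative assumption rather than one of the paper's catalogued patterns, and the net is untimed and deterministic (no stochastic firing rates).

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """A minimal place/transition Petri net with integer token counts."""
    marking: dict  # place name -> number of tokens
    # transition name -> (input arc weights, output arc weights)
    transitions: dict = field(default_factory=dict)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (dict(inputs), dict(outputs))

    def enabled(self, name):
        """A transition is enabled if each input place has enough tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= w for p, w in inputs.items())

    def fire(self, name):
        """Consume tokens from input places and produce tokens in output places."""
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p, w in inputs.items():
            self.marking[p] -= w
        for p, w in outputs.items():
            self.marking[p] = self.marking.get(p, 0) + w


if __name__ == "__main__":
    # Hypothetical reversible binding pattern: A + B <-> C
    net = PetriNet({"A": 2, "B": 1, "C": 0})
    net.add_transition("bind", {"A": 1, "B": 1}, {"C": 1})
    net.add_transition("unbind", {"C": 1}, {"A": 1, "B": 1})
    net.fire("bind")
    print(net.marking, "bind enabled:", net.enabled("bind"))
```

Larger models are then built, as the abstract describes, by composing such small modules that share places.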

  • Article type: Journal Article
    ezSingleCell is an interactive and easy-to-use application for analysing various single-cell and spatial omics data types without requiring prior programming knowledge. It combines the best-performing publicly available methods for in-depth data analysis, integration, and interactive data visualization. ezSingleCell consists of five modules, each designed to be a comprehensive workflow for one data type or task. In addition, ezSingleCell allows crosstalk between different modules within a unified interface. Input data can be supplied in a variety of formats, and the output consists of publication-ready figures and tables. In-depth manuals and video tutorials are available to guide users through the analysis workflows and parameter adjustments to suit their study aims. ezSingleCell's streamlined interface can analyse a standard scRNA-seq dataset of 3000 cells in less than five minutes. ezSingleCell is available in two forms: an installation-free web application (https://immunesinglecell.org/ezsc/) or a software package with a shinyApp interface (https://github.com/JinmiaoChenLab/ezSingleCell2) for offline analysis.

  • Article type: Journal Article
    The last couple of decades have highlighted the importance of studying hybridization, particularly among primate species, as it allows us to better understand our own evolutionary trajectory. Here, we report on genetic ancestry estimates using dense, full-genome data from 881 olive (Papio anubis), yellow (Papio cynocephalus), or olive-yellow crossed captive baboons from the Southwest National Primate Research Center. We calculated global and local ancestry information, imputed low-coverage genomes (n = 830) to improve marker quality, and updated the genetic resources available for baboons to assist future studies. We found evidence of historical admixture in some putatively purebred animals and identified errors within the Southwest National Primate Research Center pedigree. We also compared the outputs of two different phasing and imputation pipelines and of two different global ancestry estimation software tools. There was good agreement between the global ancestry estimation tools, with R² > 0.88, while evidence of phase switch errors increased depending on which phasing and imputation pipeline was used. We also generated updated genetic maps and created a concise set of ancestry informative markers (n = 1,747) to accurately obtain global ancestry estimates.
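Agreement between two global ancestry estimation tools can be summarised by a coefficient of determination. The sketch below assumes R² is computed as the squared Pearson correlation between the per-animal ancestry proportions reported by each tool; the abstract does not state the exact definition used, and the input values are made up purely for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical olive-ancestry proportions for the same animals from two tools.
    tool_a = [0.10, 0.50, 0.90, 0.35]
    tool_b = [0.12, 0.47, 0.93, 0.30]
    print(round(r_squared(tool_a, tool_b), 3))
```

Note that a high R² shows the tools rank and scale animals consistently; it would not by itself reveal a constant offset between them.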

  • Article type: Journal Article
    Data analysis can be accurate and reliable only if the underlying assumptions of the statistical method used are validated. Any violation of these assumptions can change the outcomes and conclusions of the analysis. In this study, we developed Smart Data Analysis V2 (SDA-V2), an interactive and user-friendly web application that assists users with limited statistical knowledge in data analysis. It can be freely accessed at https://jularatchumnaul.shinyapps.io/SDA-V2/. SDA-V2 automatically explores and visualizes data, examines the underlying assumptions associated with parametric tests, and selects an appropriate statistical method for the given data. Furthermore, SDA-V2 can assess the quality of research instruments and determine the minimum sample size required for a meaningful study. However, while SDA-V2 is a valuable tool for simplifying statistical analysis, it does not replace the need for a fundamental understanding of statistical principles. Researchers are encouraged to combine their expertise with the software's capabilities to achieve the most accurate and credible results.
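The idea of checking assumptions before choosing a test can be sketched as a small decision rule for comparing two independent groups. This is a hypothetical illustration, not SDA-V2's actual logic: the p-values stand in for whatever normality and equal-variance checks are run, and alpha = 0.05 is an assumed default.

```python
def assumption_holds(p_value: float, alpha: float = 0.05) -> bool:
    """Treat an assumption as tenable when its check does not reject at level alpha.

    Failing to reject is not proof the assumption holds; with small samples the
    check may simply lack power.
    """
    return p_value > alpha


def choose_two_group_test(normal_a: bool, normal_b: bool, equal_variances: bool) -> str:
    """Pick a test for two independent groups from assumption-check outcomes."""
    if normal_a and normal_b:
        # Both groups look approximately normal: use a parametric t-test,
        # switching to Welch's version when variances appear unequal.
        return "Student's t-test" if equal_variances else "Welch's t-test"
    # Normality is doubtful for at least one group: fall back to a rank-based test.
    return "Mann-Whitney U test"


if __name__ == "__main__":
    # Hypothetical p-values from normality checks on each group and a variance check.
    print(choose_two_group_test(assumption_holds(0.31),
                                assumption_holds(0.02),
                                assumption_holds(0.50)))
```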

  • Article type: Journal Article
    Force Field X (FFX) is an open-source software package for atomic-resolution modeling of genetic variants and organic crystals that leverages advanced potential energy functions and experimental data. FFX currently consists of nine modular packages with novel algorithms, including global optimization via a many-body expansion, acid-base chemistry using polarizable constant-pH molecular dynamics, estimation of free energy differences, generalized Kirkwood implicit solvent models, and many more. Applications of FFX focus on the use and development of a crystal structure prediction pipeline, biomolecular structure refinement against experimental datasets, and estimation of the thermodynamic effects of genetic variants on both proteins and nucleic acids. Together, Parallel Java and OpenMM provide shared-memory, message-passing, and graphics processing unit (GPU) parallelization for high-performance simulations. Overall, the FFX platform serves as a computational microscope for studying systems ranging from organic crystals to solvated biomolecular systems.
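In general form, a many-body expansion approximates the energy of a system of fragments (for example, residues) as one-body terms plus pairwise corrections, truncated at some order: E ≈ sum_i E(i) + sum_{i<j} [E(i, j) - E(i) - E(j)]. The Python sketch below truncates at two-body terms; the toy energy function, its coefficients and the fragment labels are hypothetical, and how FFX organises or evaluates these terms is not shown here.

```python
from itertools import combinations


def two_body_expansion(energy, fragments):
    """Many-body expansion of energy(all fragments), truncated after pair terms.

    energy: callable mapping a tuple of fragment labels to a float.
    """
    fragments = list(fragments)
    one_body = {f: energy((f,)) for f in fragments}
    total = sum(one_body.values())
    for a, b in combinations(fragments, 2):
        # Pair correction: the part of E(a, b) not explained by the isolated fragments.
        total += energy((a, b)) - one_body[a] - one_body[b]
    return total


# Hypothetical toy system: self energies plus pairwise couplings, no 3-body terms.
SELF = {"a": 1.0, "b": 2.0, "c": 3.0}
PAIR = {frozenset("ab"): -0.5, frozenset("ac"): 0.25, frozenset("bc"): -1.0}


def toy_energy(subset):
    """Energy of a subset of fragments under the pairwise-additive toy model."""
    e = sum(SELF[f] for f in subset)
    e += sum(PAIR[frozenset((x, y))] for x, y in combinations(subset, 2))
    return e


if __name__ == "__main__":
    print(two_body_expansion(toy_energy, "abc"), toy_energy(("a", "b", "c")))
```

Because the toy energy has no three-body terms, the truncated expansion reproduces it exactly; for a real force field with many-body polarization, the truncation leaves a residual error.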

  • Article type: Journal Article
    The manuscript 'Modeling a unit cell: crystallographic refinement procedure using the biomolecular MD simulation platform Amber' presents a novel protein structure refinement method that is claimed to offer improvements over traditional techniques such as Refmac5 and Phenix. Our re-evaluation suggests that, while the new method provides improvements, traditional methods achieve comparable results with less computational effort.

  • Article type: Journal Article
    RNA-seq has brought forth significant discoveries regarding aberrations in RNA processing, implicating these RNA variants in a variety of diseases. Aberrant splicing and single nucleotide variants (SNVs) in RNA have been demonstrated to alter transcript stability, localization, and function. In particular, the upregulation of ADAR, an enzyme that mediates adenosine-to-inosine editing, has been previously linked to an increase in the invasiveness of lung adenocarcinoma cells and associated with splicing regulation. Despite the functional importance of studying splicing and SNVs, the use of short-read RNA-seq has limited the community's ability to interrogate both forms of RNA variation simultaneously.
    We employ long-read sequencing technology to obtain full-length transcript sequences, elucidating cis-effects of variants on splicing changes at a single molecule level. We develop a computational workflow that augments FLAIR, a tool that calls isoform models expressed in long-read data, to integrate RNA variant calls with the associated isoforms that bear them. We generate nanopore data with high sequence accuracy from H1975 lung adenocarcinoma cells with and without knockdown of ADAR. We apply our workflow to identify key inosine isoform associations to help clarify the prominence of ADAR in tumorigenesis.
    Ultimately, we find that a long-read approach provides valuable insight toward characterizing the relationship between RNA variants and splicing patterns.
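Because each long read spans a whole transcript, the same read can be assigned to an isoform and also inspected at an editing site. The sketch below tallies, per isoform, how many reads carry G rather than A at a single adenosine-to-inosine site (inosine is read as guanosine by sequencing). It is a hypothetical illustration, not the FLAIR-based workflow: the input is assumed to be (isoform ID, base) pairs already extracted from aligned reads, and the site is assumed to lie on a plus-strand transcript (on the minus strand the change appears as T-to-C relative to the reference).

```python
from collections import defaultdict


def editing_by_isoform(reads):
    """Tally edited (G) versus unedited (A) reads at one A-to-I site per isoform.

    reads: iterable of (isoform_id, base) pairs, where base is the nucleotide
    the read carries at the site.
    Returns {isoform_id: (edited_reads, informative_reads, edited_fraction)}.
    """
    counts = defaultdict(lambda: [0, 0])  # isoform -> [G reads, A-or-G reads]
    for isoform, base in reads:
        base = base.upper()
        if base not in ("A", "G"):
            continue  # skip other bases, e.g. sequencing errors or deletions
        counts[isoform][1] += 1
        if base == "G":
            counts[isoform][0] += 1
    return {iso: (g, n, g / n) for iso, (g, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads at one editing site, already assigned to isoforms.
    toy_reads = [("iso1", "G"), ("iso1", "A"), ("iso1", "G"),
                 ("iso2", "A"), ("iso2", "N")]
    for iso, (g, n, frac) in sorted(editing_by_isoform(toy_reads).items()):
        print(iso, g, n, round(frac, 2))
```

Comparing these per-isoform fractions (for instance with and without ADAR knockdown) is one simple way to look for isoform-editing associations; a real analysis would also need a statistical test and coverage filters.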

  • Article type: Journal Article
    BACKGROUND: Fungi play a key role in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human body and can be beneficial when administered as probiotics. In mycology, the internal transcribed spacer (ITS) region has been adopted as the universal marker for classifying fungi. Hence, an accurate and robust method for ITS classification is desirable not only for better diversity estimation but also because it can help us gain deeper insight into the dynamics of environmental communities and ultimately understand whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them fully exploits the taxonomic tree hierarchy when building its model. This, in turn, leads to lower generalization power and a higher risk of classification errors.
    RESULTS: Here we introduce HiTaC, a robust hierarchical machine learning model for accurate ITS classification that requires a small amount of training data and can handle imbalanced datasets. HiTaC was thoroughly evaluated with the established TAXXI benchmark and could correctly classify fungal ITS sequences of varying lengths across a range of identity differences between the training and test data. HiTaC outperforms state-of-the-art methods when trained on noisy data, consistently achieving a higher F1-score and sensitivity across different taxonomic ranks and improving sensitivity by 6.9 percentage points over the top methods on the noisiest dataset available in TAXXI.
    CONCLUSIONS: HiTaC is publicly available on the Python Package Index, Bioconda and Docker Hub. It is released under the new BSD license, allowing free use in academia and industry. Source code and documentation, including installation and usage instructions, are available at https://gitlab.com/dacs-hpi/hitac.
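The hierarchical idea referred to above (using the taxonomic tree rather than flat labels) can be sketched as a top-down, local-classifier-per-parent-node scheme: at each node, a small classifier chooses among that node's children, and prediction descends rank by rank. In this Python sketch, nearest-centroid matching on 3-mer frequencies is a deliberately simple stand-in chosen for illustration, not HiTaC's actual model, and the training sequences and lineages are toy values rather than real ITS data.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Normalised k-mer frequency vector over the ACGT alphabet (others ignored)."""
    s = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


class TopDownClassifier:
    """One nearest-centroid classifier per parent node of the taxonomy."""

    def __init__(self, k=3):
        self.k = k
        self.centroids = {}  # parent path (tuple) -> {child label: centroid}

    def fit(self, seqs, lineages):
        sums = {}  # parent path -> {child label: [summed profile, count]}
        for seq, lineage in zip(seqs, lineages):
            prof = kmer_profile(seq, self.k)
            for depth, label in enumerate(lineage):
                parent = tuple(lineage[:depth])
                entry = sums.setdefault(parent, {}).setdefault(label, [[0.0] * len(prof), 0])
                entry[0] = [s + p for s, p in zip(entry[0], prof)]
                entry[1] += 1
        self.centroids = {
            parent: {lab: [s / n for s in vec] for lab, (vec, n) in children.items()}
            for parent, children in sums.items()
        }
        return self

    def predict(self, seq):
        """Descend from the root, picking the closest child at each rank."""
        prof = kmer_profile(seq, self.k)
        path = ()
        while path in self.centroids:
            children = self.centroids[path]
            best = min(children, key=lambda lab: math.dist(prof, children[lab]))
            path = path + (best,)
        return list(path)


if __name__ == "__main__":
    train_seqs = ["AAAAAAAAAAAA", "AAAAAACCCCCC", "GGGGGGGGGGGG"]  # toy sequences
    train_lineages = [["Ascomycota", "Saccharomyces"], ["Ascomycota", "Candida"],
                      ["Basidiomycota", "Cryptococcus"]]
    clf = TopDownClassifier(k=3).fit(train_seqs, train_lineages)
    print(clf.predict("CCCCCCCCCCCC"))
```

Because each local classifier only separates siblings, an error at a high rank cannot be corrected further down; that is the usual trade-off of the top-down design.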

  • Article type: Journal Article
    Recent advances in genome assembly have greatly improved the prospects for comprehensive annotation of transposable elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness and require extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are inserted directly within crucial genes, leading to direct alterations in gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.
