Software libraries

  • Article type: Journal Article
    BACKGROUND: With the surge in genomic data driven by advancements in sequencing technologies, the demand for efficient bioinformatics tools for sequence analysis has become paramount. BLAST-like alignment tool (BLAT), a sequence alignment tool, faces limitations in performance efficiency and integration with modern programming environments, particularly Python. This study introduces PxBLAT, a Python-based framework designed to enhance the capabilities of BLAT, focusing on usability, computational efficiency, and seamless integration within the Python ecosystem.
    RESULTS: PxBLAT demonstrates significant improvements over BLAT in execution speed and data handling, as evidenced by comprehensive benchmarks conducted across various sample groups ranging from 50 to 600 samples. These experiments highlight a notable speedup, reducing execution time compared to BLAT. The framework also introduces user-friendly features such as improved server management, data conversion utilities, and shell completion, enhancing the overall user experience. Additionally, the provision of extensive documentation and comprehensive testing supports community engagement and facilitates the adoption of PxBLAT.
    CONCLUSIONS: PxBLAT stands out as a robust alternative to BLAT, offering performance and user interaction enhancements. Its development underscores the potential for modern programming languages to improve bioinformatics tools, aligning with the needs of contemporary genomic research. By providing a more efficient, user-friendly tool, PxBLAT has the potential to impact genomic data analysis workflows, supporting faster and more accurate sequence analysis in a Python environment.

  • Article type: Preprint
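    We introduce PxBLAT, a Python library designed to improve usability and efficiency when interacting with the BLAST-like alignment tool (BLAT). PxBLAT provides an intuitive application programming interface (API) design, allowing its functionality to be incorporated directly into Python-based bioinformatics workflows. In addition, it integrates seamlessly with Biopython and comes with user-centric features such as server readiness checks and a port retry mechanism. PxBLAT eliminates the need for system calls and intermediate files, reducing latency and data conversion overhead. Benchmarks show that PxBLAT delivers a performance improvement of approximately 20% over BLAT in a Python environment. Availability and implementation: PxBLAT supports Python (version 3.8+), and pre-compiled packages are released via PyPI (https://pypi.org/project/pxblat/) and Bioconda (https://anaconda.org/bioconda/pxblat). The source code of PxBLAT is available under the terms of the MIT open-source license and is hosted on GitHub (https://github.com/ylab-hi/pxblat). Its documentation is available on ReadTheDocs (https://pxblat.readthedocs.io/en/latest/).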
    We introduce PxBLAT, a Python library designed to enhance usability and efficiency in interacting with the BLAST-like alignment tool (BLAT). PxBLAT provides an intuitive Application Programming Interface (API) design, allowing the incorporation of its functionality directly into Python-based bioinformatics workflows. Moreover, PxBLAT's design philosophy emphasizes ease of use, memory efficiency, and the elimination of intermediary files and unnecessary system calls, thereby enhancing computational speed and user experience. Benchmark tests reveal its superior performance across various datasets, illustrating its capacity to maintain correctness. PxBLAT supports Python (version 3.9+), and pre-compiled packages are released via PyPI (https://pypi.org/project/pxblat/) and Bioconda (https://anaconda.org/bioconda/pxblat). The source code and executable are freely available for academic, nonprofit, and personal use. Its documentation is available on ReadTheDocs (https://pxblat.readthedocs.io/en/latest/).

  • Article type: Journal Article
    High-throughput image-based technologies are now widely used in the rapidly developing field of digital phenomics and are generating ever-increasing amounts and diversity of data. Artificial intelligence (AI) is becoming a game changer in turning the vast seas of data into valuable predictions and insights. However, this requires specialized programming skills and an in-depth understanding of machine learning, deep learning, and ensemble learning algorithms. Here, we attempt to methodically review the usage of different tools, technologies, and services available to the phenomics data community and show how they can be applied to selected problems in explainable AI-based image analysis. This tutorial provides practical and useful resources for novices and experts to harness the potential of the phenomic data in explainable AI-led breeding programs.

  • Article type: Journal Article
    Glycoinformatics is a critical resource for the study of glycobiology, and glycobiology is a necessary component for understanding the complex interface between intra- and extracellular spaces. Despite this, there is limited software available to scientists studying these topics, requiring each to create fundamental data structures and representations anew for each of their applications. This leads to poor uptake of standardization and loss of focus on the real problems. We present glypy, a library written in Python for reading, writing, manipulating, and transforming glycans at several levels of precision. In addition to understanding several common formats for textual representation of glycans, the library also provides application programming interfaces (APIs) for major community databases, including GlyTouCan and UnicarbKB. The library is freely available under the Apache 2 common license with source code available at https://github.com/mobiusklein/ and documentation at https://glypy.readthedocs.io/ .

  • Article type: Journal Article
    Many of the novel ideas that drive today's proteomic technologies are focused essentially on experimental or data-processing workflows. The latter are implemented and published in a number of ways, from custom scripts and programs, to projects built using general-purpose or specialized workflow engines; a large part of routine data processing is performed manually or with custom scripts that remain unpublished. Facilitating the development of reproducible data-processing workflows becomes essential for increasing the efficiency of proteomic research. To assist in overcoming the bioinformatics challenges in the daily practice of proteomic laboratories, 5 years ago we developed and announced Pyteomics, a freely available open-source library providing Python interfaces to proteomic data. We summarize the new functionality of Pyteomics developed during the time since its introduction.

  • Article type: Journal Article
    BACKGROUND: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there was a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008).
    RESULTS: The SeqAn library has matured considerably since its first publication 9 years ago. In this article we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools.
    CONCLUSIONS: We anticipate that SeqAn will continue to be a valuable resource, especially since it started to actively support various hardware acceleration techniques in a systematic manner.
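
The BWT-based read mapping mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not SeqAn code (SeqAn is a C++ library) and it does not reflect SeqAn's API; the function names `bwt` and `count_occurrences` are hypothetical, and the construction by sorted rotations is a deliberately naive choice that only suits short strings. The sketch builds the Burrows-Wheeler transform of a text and then counts exact pattern occurrences by backward search, the core idea behind FM-index read mappers.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, illustration only)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first_row[c] = number of characters in the text that sort before c.
    counts = Counter(bwt_text)
    first_row = {}
    total = 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # Number of ch in bwt_text[:i]; real indexes precompute this table.
        return bwt_text.count(ch, 0, i)

    lo, hi = 0, len(bwt_text)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_occurrences(transformed, "ana"))
```

Production mappers replace the rotation sort with suffix-array construction and store sampled occurrence tables, so each backward-search step runs in constant time.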

  • Article type: Journal Article
    BACKGROUND: In recent years, several mass spectrometry-based omics technologies emerged to investigate qualitative and quantitative changes within thousands of biologically active components such as proteins, lipids and metabolites. The research enabled through these methods potentially contributes to the diagnosis and pathophysiology of human diseases as well as to the clarification of structures and interactions between biomolecules. Simultaneously, technological advances in the field of mass spectrometry leading to an ever increasing amount of data, demand high standards in efficiency, accuracy and reproducibility of potential analysis software.
    RESULTS: This article presents the current state and ongoing developments in OpenMS, a versatile open-source framework aimed at enabling reproducible analyses of high-throughput mass spectrometry data. It provides implementations of frequently occurring processing operations on MS data through a clean application programming interface in C++ and Python. A collection of 185 tools and ready-made workflows for typical MS-based experiments enable convenient analyses for non-developers and facilitate reproducible research without losing flexibility.
    CONCLUSIONS: OpenMS will continue to increase its ease of use for developers as well as users with improved continuous integration/deployment strategies, regular trainings with updated training materials and multiple sources of support. The active developer community ensures the incorporation of new features to support state of the art research.

  • Article type: Journal Article
    Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
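
As a rough illustration of the "handling of protein and peptide sequences" use case named in the abstract above, the following Python sketch performs an in-silico tryptic digest and computes monoisotopic peptide masses. It does not use or reproduce the API of any library covered by the review; the function names `tryptic_digest` and `monoisotopic_mass` are hypothetical, and the cleavage rule (cut after K or R unless the next residue is P, no missed cleavages) is a common simplification assumed here.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one H2O is added for the free N- and C-termini


def tryptic_digest(protein: str) -> list[str]:
    """Split after K or R unless followed by P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None


if __name__ == "__main__":
    for pep in tryptic_digest("MKPRAKG"):
        print(pep, round(monoisotopic_mass(pep), 4))
```

Dedicated proteomics libraries add what this sketch omits: modifications, missed cleavages, charge states, and isotope distributions.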