nextflow

  • Article type: Journal Article
    Microbial pangenome analysis identifies present or absent genes in prokaryotic genomes. However, current tools are limited when analyzing species with higher sequence diversity or higher taxonomic orders such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine gene clusters predicted by Roary for identifying core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools for identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.

  • Article type: Journal Article
    This study describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on de novo transcriptome assembly using Nextflow in an interactive format that uses appropriate cloud resources for data access and analysis. Cloud computing is a powerful new means by which biomedical researchers can access resources and capacity that were previously either unattainable or prohibitively expensive. To take advantage of these resources, however, the biomedical research community needs new skills and knowledge. We present here a cloud-based training module, developed in conjunction with Google Cloud, Deloitte Consulting, and the NIH STRIDES Program, that uses the biological problem of de novo transcriptome assembly to demonstrate and teach the concepts of computational workflows (using Nextflow) and cost- and resource-efficient use of cloud services (using Google Cloud Platform). Our work highlights the reduced necessity of on-site computing resources and the accessibility of cloud-based infrastructure for bioinformatics applications.

  • Article type: Journal Article
    BACKGROUND: Variable number tandem repeats (VNTRs) are highly polymorphic DNA regions harboring many potentially disease-causing variants. However, VNTRs often appear unresolved ("dark") in variation databases due to their repetitive nature. One particularly complex and medically relevant VNTR is the KIV-2 VNTR located in the cardiovascular disease gene LPA, which encompasses up to 70% of the coding sequence.
    RESULTS: Using the highly complex LPA gene as a model, we develop a computational approach to resolve intra-repeat variation in VNTRs from largely available short-read sequencing data. We apply the approach to six protein-coding VNTRs in 2504 samples from the 1000 Genomes Project and develop an optimized method for the LPA KIV-2 VNTR that discriminates the confounding KIV-2 subtypes upfront. This results in an F1-score improvement of up to 2.1-fold compared to previously published strategies. Finally, we analyze the LPA VNTR in >199,000 UK Biobank samples, detecting >700 KIV-2 mutations. This approach successfully reveals new strong Lp(a)-lowering effects for KIV-2 variants, with a protective effect against coronary artery disease, and also validates previous findings based on tagging SNPs.
    CONCLUSIONS: Our approach paves the way for reliable variant detection in VNTRs at scale, and we show that it is transferable to other dark regions, which will help unlock medical information hidden in VNTRs.
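    For readers unfamiliar with the metric, the F1-score comparison mentioned in this abstract can be illustrated with a minimal sketch. The function names and the confusion-matrix counts below are hypothetical and are not taken from the study; the sketch only shows how an F1-score and a fold improvement between two variant-calling strategies would typically be computed.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # With no true positives, false positives, or false negatives, F1 is defined here as 0.
    return 2 * tp / denom if denom else 0.0


def fold_improvement(new: float, old: float) -> float:
    """Ratio of the new score to the old score (e.g. 2.0 means a 2-fold improvement)."""
    return new / old


# Hypothetical counts for a baseline and an optimized variant caller (illustration only).
baseline = f1_score(tp=40, fp=30, fn=30)
optimized = f1_score(tp=90, fp=5, fn=5)
print(f"baseline F1={baseline:.3f}, optimized F1={optimized:.3f}, "
      f"fold={fold_improvement(optimized, baseline):.2f}")
```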

  • Article type: Journal Article
    Ribosome profiling is a powerful technique to study translation at a transcriptome-wide level. However, ensuring good data quality is paramount for accurate interpretation, as is ensuring that the analyses are reproducible. We introduce a new Nextflow DSL2 pipeline, riboseq-flow, designed for processing and comprehensive quality control of ribosome profiling experiments. Riboseq-flow is user-friendly, versatile and upholds high standards in reproducibility, scalability, portability, version control and continuous integration. It enables users to efficiently analyse multiple samples in parallel and helps them evaluate the quality and utility of their data based on the detailed metrics and visualisations that are automatically generated. Riboseq-flow is available at https://github.com/iraiosub/riboseq-flow.
    Ribosome profiling is a cutting-edge method that provides a detailed view of protein synthesis across the entire set of RNA molecules within cells. To ensure the reliability of such studies, high-quality data and the ability to replicate analyses are crucial. To address this, we present riboseq-flow, a new tool built with Nextflow DSL2, tailored for analysing data from ribosome profiling experiments. This pipeline stands out for its ease of use, flexibility, and commitment to high reproducibility standards. It's designed to handle multiple samples simultaneously, ensuring efficient analysis for large-scale studies. Moreover, riboseq-flow automatically generates detailed reports and visual representations to assess the data quality, enhancing researchers' understanding of their experiments and guiding future decisions. This valuable resource is freely accessible at https://github.com/iraiosub/riboseq-flow.

  • Article type: Journal Article
    Single-cell multiplexing techniques (cell hashing and genetic multiplexing) combine multiple samples, optimizing sample processing and reducing costs. Cell hashing conjugates antibody tags or chemical oligonucleotides to cell membranes, while genetic multiplexing allows genetically diverse samples to be mixed and relies on aggregation of RNA reads at known genomic coordinates. We develop hadge (hashing deconvolution combined with genotype information), a Nextflow pipeline that combines 12 methods to perform both hashing- and genotype-based deconvolution. We propose a joint deconvolution strategy combining best-performing methods and demonstrate how this approach leads to the recovery of previously discarded cells in a nuclei hashing of fresh-frozen brain tissue.

  • Article type: Journal Article
    CONCLUSIONS: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements.
    METHODS: The pipeline is implemented using Nextflow, Python3, and Bash. It is publicly available on github.com/Bio2Byte/simsapiper.
    BACKGROUND: All data are available on GitHub.

  • Article type: Journal Article
    We designed a Nextflow DSL2-based pipeline, Spatial Transcriptomics Quantification (STQ), for simultaneous processing of 10x Genomics Visium spatial transcriptomics data and a matched hematoxylin and eosin (H&E)-stained whole-slide image (WSI), optimized for patient-derived xenograft (PDX) cancer specimens. Our pipeline enables the classification of sequenced transcripts for deconvolving the mouse and human species and mapping the transcripts to reference transcriptomes. We align the H&E WSI with the spatial layout of the Visium slide and generate imaging and quantitative morphology features for each Visium spot. The pipeline design enables multiple analysis workflows, including single or dual reference genome input and stand-alone image analysis. We show the utility of our pipeline on a dataset from Visium profiling of four melanoma PDX samples. The clustering of Visium spots and clustering of H&E imaging features reveal similar patterns arising from the two data modalities.

  • Article type: Journal Article
    In recent years, data-independent acquisition (DIA) has emerged as a powerful analysis method in biological mass spectrometry (MS). Compared to the previously predominant data-dependent acquisition (DDA), it offers a way to achieve greater reproducibility, sensitivity, and dynamic range in MS measurements. To make DIA accessible to non-expert users, a multifunctional, automated high-throughput pipeline, DIAproteomics, was implemented in the computational workflow framework "Nextflow" (https://nextflow.io). This allows high-throughput processing of proteomics and peptidomics DIA datasets on diverse computing infrastructures. This chapter provides a short summary and usage protocol guide for the most important modes of operation of this pipeline regarding the analysis of peptidomics datasets using the command line. In brief, DIAproteomics is a wrapper around the OpenSwathWorkflow and relies on either existing or ad-hoc generated spectral libraries from matching DDA runs. The OpenSwathWorkflow extracts chromatograms from the DIA runs and performs chromatographic peak-picking. Further downstream in the pipeline, these peaks are scored, aligned, and statistically evaluated for qualitative and quantitative differences across conditions depending on the user's interest. DIAproteomics is open-source and available under a permissive license. We encourage the scientific community to use or modify the pipeline to meet their specific requirements.

  • Article type: Journal Article
    Population genomic analyses such as inference of population structure and identifying signatures of selection usually involve the application of a plethora of tools. The installation of tools and their dependencies, data transformation, or series of data preprocessing in a particular order sometimes makes the analyses challenging. While the usage of container-based technologies has significantly resolved the problems associated with the installation of tools and their dependencies, population genomic analyses requiring multistep pipelines or complex data transformation can greatly be facilitated by the application of workflow management systems such as Nextflow and Snakemake. Here, we present scalepopgen, a collection of fully automated workflows that can carry out widely used population genomic analyses on the biallelic single nucleotide polymorphism data stored in either variant calling format files or the plink-generated binary files. scalepopgen is developed in Nextflow and can be run locally or on high-performance computing systems using either Conda, Singularity, or Docker. The automated workflow includes procedures such as (i) filtering of individuals and genotypes; (ii) principal component analysis, admixture with identifying optimal K-values; (iii) running TreeMix analysis with or without bootstrapping and migration edges, followed by identification of an optimal number of migration edges; (iv) implementing single-population and pair-wise population comparison-based procedures to identify genomic signatures of selection. The pipeline uses various open-source tools; additionally, several Python and R scripts are also provided to collect and visualize the results. The tool is freely available at https://github.com/Popgen48/scalepopgen.

  • Article type: Review
    Background: Advancements in DNA sequencing technology have transformed the field of bacterial genomics, allowing for faster and more cost-effective chromosome-level assemblies compared to a decade ago. However, transforming raw reads into a complete genome model is a significant computational challenge due to the varying quality and quantity of data obtained from different sequencing instruments, as well as intrinsic characteristics of the genome and desired analyses. To address this issue, we have developed a set of container-based pipelines using Nextflow, offering both common workflows for inexperienced users and high levels of customization for experienced ones. Their processing strategies are adaptable based on the sequencing data type, and their modularity enables the incorporation of new components to address the community's evolving needs. Methods: These pipelines consist of three parts: quality control, de novo genome assembly, and bacterial genome annotation. In particular, the genome annotation pipeline provides a comprehensive overview of the genome, including standard gene prediction and functional inference, as well as predictions relevant to clinical applications such as virulence and resistance gene annotation, secondary metabolite detection, prophage and plasmid prediction, and more. Results: The annotation results are presented in reports, genome browsers, and a web-based application that enables users to explore and interact with the genome annotation results. Conclusions: Overall, our user-friendly pipelines offer a seamless integration of computational tools to facilitate routine bacterial genomics research. The effectiveness of these pipelines is illustrated by examining the sequencing data of a clinical sample of Klebsiella pneumoniae.