analysis pipelines

  • Article type: Journal Article
    Organogenesis, the phase of embryonic development that starts at the end of gastrulation and continues until birth, is the critical process for understanding cellular differentiation and maturation during organ development. The rapid development of single-cell transcriptomics technology has led to many novel discoveries in understanding organogenesis while also accumulating a large quantity of data. To fill this gap, OrganogenesisDB (http://organogenesisdb.com/), a comprehensive database dedicated to exploring cell-type identification and gene expression dynamics during organogenesis, was developed. OrganogenesisDB contains single-cell RNA sequencing data for more than 1.4 million cells from 49 published datasets spanning various developmental stages. Additionally, 3324 cell markers were manually curated for 1120 cell types across 9 human organs and 4 mouse organs. OrganogenesisDB leverages various analysis tools to help users annotate and understand cell types at different developmental stages, and to mine and present genes that exhibit specific patterns and play key regulatory roles during cell maturation and differentiation. This work provides a critical resource and useful tool for deciphering cell lineage determination and uncovering the mechanisms underlying organogenesis.

  • Article type: Journal Article
    Analyzing bacterial microbiomes consistently using next-generation sequencing (NGS) is challenging due to the diversity of synthetic platforms for 16S rRNA genes and their analytical pipelines. This study compares the efficacy of full-length (V1-V9 hypervariable regions) and partial-length (V3-V4 hypervariable regions) sequencing of synthetic 16S rRNA genes from human gut microbiomes, with a focus on childhood obesity.
    In this observational and comparative study, we explored the differences between these two sequencing methods in taxonomic categorization and weight status prediction among twelve children with obstructive sleep apnea (OSA).
    The full-length NGS method by PacBio® identified 118 genera and 248 species in the V1-V9 regions, all with a 0% unclassified rate. In contrast, the partial-length NGS method by Illumina® detected 142 genera (with a 39% unclassified rate) and 6 species (with a 99% unclassified rate) in the V3-V4 regions. These approaches showed marked differences in gut microbiome composition and functional predictions. The full-length method distinguished between obese and non-obese children using the Firmicutes/Bacteroidetes ratio, a known obesity marker (p = 0.046), whereas the partial-length method was less conclusive (p = 0.075). Additionally, out of 73 metabolic pathways identified through full-length sequencing, 35 (48%) were associated with level 1 metabolism, compared to 28 of 61 pathways (46%) identified through the partial-length method. The full-length NGS also highlighted complex associations between body mass index z-score, three bacterial species (Bacteroides ovatus, Bifidobacterium pseudocatenulatum, and Streptococcus parasanguinis ATCC 15912), and 17 metabolic pathways. Both sequencing techniques revealed relationships between gut microbiota composition and OSA-related parameters, with full-length sequencing offering more comprehensive insights into associated metabolic pathways than the V3-V4 technique.
    These findings highlight disparities in NGS-based assessments, emphasizing the value of full-length NGS with amplicon sequence variant analysis for clinical gut microbiome research. They underscore the importance of considering methodological differences in future meta-analyses.
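
The Firmicutes/Bacteroidetes ratio mentioned in the abstract above can be illustrated with a minimal sketch. The sample names, phylum counts, and the function name `firmicutes_bacteroidetes_ratio` are hypothetical and chosen only for illustration; the abstract does not say how the authors computed the ratio or which software they used.

```python
def firmicutes_bacteroidetes_ratio(phylum_counts):
    """Return the Firmicutes/Bacteroidetes read-count ratio for one sample.

    `phylum_counts` maps phylum name -> read count. Returns None when no
    Bacteroidetes reads are present, because the ratio is then undefined.
    """
    firmicutes = phylum_counts.get("Firmicutes", 0)
    bacteroidetes = phylum_counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        return None
    return firmicutes / bacteroidetes


# Hypothetical phylum-level read counts, not data from the study.
samples = {
    "child_01": {"Firmicutes": 600, "Bacteroidetes": 300, "Actinobacteria": 100},
    "child_02": {"Firmicutes": 250, "Bacteroidetes": 500, "Proteobacteria": 50},
    "child_03": {"Firmicutes": 400, "Proteobacteria": 20},
}

ratios = {
    name: firmicutes_bacteroidetes_ratio(counts)
    for name, counts in samples.items()
}
print(ratios)
```

Comparing such per-sample ratios between weight groups would still need a statistical test; the abstract reports p-values but does not name the test, so none is sketched here.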

  • Article type: Preprint
    The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines to promote data provenance and reproducibility and to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses. Database URL: https://www.encodeproject.org/.

  • Article type: Journal Article
    The liver is critical for the digestive and immune systems. Although the physiology and pathology of the liver have been well studied and many scRNA-seq datasets have been generated, a database and landscape for characterizing cell types and gene expression in different liver diseases or developmental stages at single-cell resolution are lacking. Hence, scLiverDB was developed: a specialized database for human and mouse liver transcriptomes that unravels the landscape of liver cell types, cell heterogeneity, and gene expression at single-cell resolution across various liver diseases, cell types, and developmental stages. To date, 62 datasets comprising 9,050 samples and 1,741,734 cells have been curated. A uniform workflow, which includes quality control, dimensional reduction, clustering, and cell-type annotation, is used to analyze all datasets on the same platform. Manual and automatic methods are integrated for accurate cell-type identification, and a user-friendly web interface with multiscale functions is provided. Two case studies show the usefulness of scLiverDB: one identified the LTB (lymphotoxin beta) gene as a potential biomarker of lymphoid cell differentiation, and the other showed the expression changes of Foxa3 (forkhead box A3) in chronic progressive liver diseases. This work provides a crucial resource to resolve molecular and cellular information in normal, diseased, and developing human and mouse livers.

  • Article type: Comparative Study
    Background: Whole genome sequencing (WGS) is a reliable tool for studying tuberculosis (TB) transmission. WGS data are usually processed by custom-built analysis pipelines with little standardisation between them.
    Aim: To compare the impact of variability of several WGS analysis pipelines used internationally to detect epidemiologically linked TB cases.
    Methods: From the Netherlands, 535 Mycobacterium tuberculosis complex (MTBC) strains from 2016 were included. Epidemiological information obtained from municipal health services was available for all mycobacterial interspersed repeat unit-variable number of tandem repeat (MIRU-VNTR) clustered cases. WGS data were analysed using five different pipelines: one core genome multilocus sequence typing (cgMLST) approach and four single nucleotide polymorphism (SNP)-based pipelines developed in Oxford, United Kingdom; Borstel, Germany; Bilthoven, the Netherlands; and Copenhagen, Denmark. WGS clusters were defined using a maximum pairwise distance of 12 SNPs/alleles.
    Results: The cgMLST approach and Oxford pipeline clustered all epidemiologically linked cases; however, in the other three SNP-based pipelines one epidemiological link was missed due to insufficient coverage. In general, the genetic distances varied between pipelines, reflecting different clustering rates: the cgMLST approach clustered 92 cases, followed by 84, 83, 83 and 82 cases in the SNP-based pipelines from Copenhagen, Oxford, Borstel and Bilthoven respectively.
    Conclusion: Concordance in ruling out epidemiological links was high between pipelines, which is an important step in the international validation of WGS data analysis. To increase accuracy in identifying TB transmission clusters, standardisation of crucial WGS criteria and creation of a reference database of representative MTBC sequences would be advisable.
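
The 12 SNP/allele threshold used to define WGS clusters can be sketched as a small clustering routine. This is an illustration under stated assumptions, not any of the five pipelines compared in the study: it assumes single-linkage clustering (two isolates share a cluster if a chain of pairs, each at most 12 apart, connects them), which the abstract does not confirm. The isolate names, the distances, and the function name `single_linkage_clusters` are hypothetical.

```python
from collections import defaultdict


def single_linkage_clusters(isolates, pairwise_distance, max_distance=12):
    """Cluster isolates by linking every pair at most `max_distance` apart.

    `pairwise_distance` maps (isolate_a, isolate_b) -> SNP or allele distance.
    Pairs missing from the mapping are treated as not linked.
    """
    parent = {name: name for name in isolates}

    def find(name):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), distance in pairwise_distance.items():
        if distance <= max_distance:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = defaultdict(set)
    for name in isolates:
        groups[find(name)].add(name)
    # Keep clusters of two or more isolates; singletons count as unclustered.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical isolates and distances, not data from the study.
isolates = ["A", "B", "C", "D", "E"]
pairwise_distance = {
    ("A", "B"): 3,
    ("B", "C"): 11,
    ("A", "C"): 14,  # above the threshold, but A and C are chained through B
    ("C", "D"): 40,
    ("D", "E"): 13,  # one over the threshold, so D and E stay unclustered
}

clusters = single_linkage_clusters(isolates, pairwise_distance)
print(clusters)
```

Under single linkage, a small change in one pairwise distance can merge or split clusters, which is one way pipelines that report different genetic distances could end up clustering different numbers of cases.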

  • Article type: Journal Article
    In cognitive neuroscience, functional magnetic resonance imaging (fMRI) data are widely analyzed using general linear models (GLMs). However, model quality of GLMs for fMRI is rarely assessed, in part due to the lack of formal measures for statistical model inference.
    We introduce a new SPM toolbox for model assessment, comparison and selection (MACS) of GLMs applied to fMRI data. MACS includes classical, information-theoretic and Bayesian methods of model assessment previously applied to GLMs for fMRI as well as recent methodological developments of model selection and model averaging in fMRI data analysis.
    The toolbox - which is freely available from GitHub - directly builds on the Statistical Parametric Mapping (SPM) software package and is easy-to-use, general-purpose, modular, readable and extendable. We validate the toolbox by reproducing model selection and model averaging results from earlier publications.
    A previous toolbox for model diagnosis in fMRI has been discontinued and other approaches to model comparison between GLMs have not been translated into reusable computational resources in the past.
    Increased attention on model quality will lead to lower false-positive rates in cognitive neuroscience and increased application of the MACS toolbox will increase the reproducibility of GLM analyses and is likely to increase the replicability of fMRI studies.
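
To give a sense of what information-theoretic model comparison between two GLMs involves, here is a minimal sketch using AIC and BIC. It is not the MACS toolbox or its API (MACS builds on SPM, which is MATLAB software). It assumes independent Gaussian noise, so that up to an additive constant shared by all models fitted to the same data, AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n). The residual sums of squares, parameter counts, and model names are hypothetical.

```python
import math


def aic_bic(rss, n_obs, n_params):
    """Return (AIC, BIC) for a Gaussian linear model, dropping shared constants.

    rss:      residual sum of squares of the fitted model
    n_obs:    number of observations (for example, scans in a time series)
    n_params: number of estimated parameters
    """
    fit_term = n_obs * math.log(rss / n_obs)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n_obs)
    return aic, bic


# Hypothetical fits of two GLMs to the same 100-scan time series.
n_obs = 100
models = {
    "small": {"rss": 120.0, "n_params": 3},
    "large": {"rss": 118.0, "n_params": 6},
}

scores = {}
for name, fit in models.items():
    aic, bic = aic_bic(fit["rss"], n_obs, fit["n_params"])
    scores[name] = {"aic": aic, "bic": bic}

# Lower is better for both criteria.
best_aic = min(scores, key=lambda name: scores[name]["aic"])
best_bic = min(scores, key=lambda name: scores[name]["bic"])
print(best_aic, best_bic)
```

In this made-up case the larger model fits slightly better but not by enough to offset its extra parameters, so both criteria prefer the smaller model.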

  • Article type: Journal Article
    We describe open, reproducible pipelines that create an integrated genomic profile of a cancer and use the profile to find mutations associated with disease and potentially useful drugs. These pipelines analyze high-throughput cancer exome and transcriptome sequence data together with public databases to find relevant mutations and drugs. The three pipelines that we have developed are: (1) an exome analysis pipeline, which uses whole or targeted tumor exome sequence data to produce a list of putative variants (no matched normal data are needed); (2) a transcriptome analysis pipeline that processes whole tumor transcriptome sequence (RNA-seq) data to compute gene expression and find potential gene fusions; and (3) an integrated variant analysis pipeline that uses the tumor variants from the exome pipeline and tumor gene expression from the transcriptome pipeline to identify deleterious and druggable mutations in all genes and in highly expressed genes. These pipelines are integrated into the popular Web platform Galaxy at http://usegalaxy.org/cancer to make them accessible and reproducible, thereby providing an approach for doing standardized, distributed analyses in clinical studies. We have used our pipeline to identify similarities and differences between pancreatic adenocarcinoma cancer cell lines and primary tumors.

  • Article type: Journal Article
    The Illumina HumanMethylation450 BeadChip has become a popular platform for interrogating DNA methylation in epigenome-wide association studies (EWAS) and related projects as well as resource efforts such as the International Cancer Genome Consortium (ICGC) and the International Human Epigenome Consortium (IHEC). This has resulted in an exponential increase of 450k data in recent years and triggered the development of numerous integrated analysis pipelines and stand-alone packages. This review will introduce and discuss the currently most popular pipelines and packages and is particularly aimed at new 450k users.