single-cell analysis

  • Article type: Journal Article
    Several methods have been developed to computationally predict cell types for single-cell RNA sequencing (scRNAseq) data. As methods are developed, a common problem for investigators has been identifying the best method they should apply to their specific use-case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell-type identification), a wisdom of crowds approach for scRNAseq clustering. CHAI presents two competing methods which aggregate the clustering results from seven state-of-the-art clustering methods: CHAI-AvgSim and CHAI-SNF. CHAI-AvgSim and CHAI-SNF demonstrate superior performance across several benchmarking datasets. Furthermore, both CHAI methods outperform the most recent consensus clustering method, SAME-clustering. We demonstrate CHAI's practical use case by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we demonstrate CHAI-SNF to have improved performance when including spatial transcriptomics data. CHAI overcomes previous limitations by incorporating the most recent and top performing scRNAseq clustering algorithms into the aggregation framework. It is also an intuitive and easily customizable R package where users may add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. This ensures that as more advanced clustering algorithms are developed, CHAI will remain useful to the community as a generalized framework. CHAI is available as an open source R package on GitHub: https://github.com/lodimk2/chai.
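
    The idea of aggregating several clusterings through similarity matrices can be illustrated with a minimal sketch. This is not the CHAI implementation: the function names are hypothetical, each clustering is turned into a binary co-association matrix, the matrices are averaged, and the final grouping is obtained by a simple threshold-and-connected-components step that is assumed here purely for illustration (the abstract does not say how CHAI derives its final partition).

```python
from typing import List, Sequence


def coassociation(labels: Sequence[int]) -> List[List[float]]:
    """Binary similarity matrix: 1.0 where two cells share a cluster label."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def average_similarity(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Element-wise mean of the co-association matrices of several clusterings."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same cells")
    total = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        matrix = coassociation(labels)
        for i in range(n):
            for j in range(n):
                total[i][j] += matrix[i][j]
    return [[value / len(partitions) for value in row] for row in total]


def consensus_clusters(sim: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Illustrative final step (an assumption, not CHAI's): link cells whose
    averaged similarity exceeds the threshold and return connected components."""
    n = len(sim)
    assigned = [-1] * n
    next_id = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and sim[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


# Three hypothetical clusterings of six cells that partly disagree.
partitions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 0, 2],
    [0, 0, 0, 1, 2, 2],
]
similarity = average_similarity(partitions)
consensus = consensus_clusters(similarity)
print(consensus)
```

    Cluster label values differ between the input clusterings, which is why the comparison is made on pairwise co-membership and not on the labels themselves.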

  • Article type: Journal Article
    BACKGROUND: Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines.
    RESULTS: We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation.
    CONCLUSIONS: No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.

  • Article type: Journal Article
    Intra-tumor heterogeneity compromises the clinical value of transcriptomic classifications of colorectal cancer. We investigated the prognostic effect of transcriptomic heterogeneity and the potential for classifications less vulnerable to heterogeneity in a single-hospital series of 1093 tumor samples from 692 patients, including multiregional samples from 98 primary tumors and 35 primary-metastasis sets. We show that intra-tumor heterogeneity of the consensus molecular subtypes (CMS) is frequent and has poor-prognostic associations independently of tumor microenvironment markers. Multiregional transcriptomics uncover cancer cell-intrinsic and low-heterogeneity signals that recapitulate the intrinsic CMSs proposed by single-cell sequencing. Further subclassification identifies congruent CMSs that explain a larger proportion of variation in patient survival than intra-tumor heterogeneity. Plasticity is indicated by discordant intrinsic phenotypes of matched primary and metastatic tumors. We conclude that multiregional sampling reconciles the prognostic power of tumor classifications from single-cell and bulk transcriptomics in the context of intra-tumor heterogeneity, and phenotypic plasticity challenges the reconciliation of primary and metastatic subtypes.

  • Article type: Journal Article
    High heterogeneity and complex interactions of malignant cells in breast cancer have been recognized as drivers of cancer progression and therapeutic failure. However, a complete understanding of common cancer cell states and their underlying driver factors remains limited and challenging. Here, we revealed seven consensus cancer cell states recurring across patients by integrative analysis of single-cell RNA sequencing data of breast cancer. The distinct biological functions, the subtype-specific distribution, the potential cells of origin and the interrelation of consensus cancer cell states were systematically elucidated and validated in multiple independent datasets. We further uncovered the internal regulons and external cell components in tumor microenvironments, which contribute to the consensus cancer cell states. Using the state-specific signature, we also inferred the abundance of cells with each consensus cancer cell state by deconvolution of large breast cancer RNA-seq cohorts, revealing the association of the immune-related state with better survival. Our study provides new insights into the cancer cell state composition and potential therapeutic strategies of breast cancer.

  • Article type: Journal Article
    BACKGROUND: Single-cell RNA sequencing (scRNA-seq) data, annotated by cell type, is useful in a variety of downstream biological applications, such as profiling gene expression at the single-cell level. However, manually assigning these annotations with known marker genes is both time-consuming and subjective.
    RESULTS: We present a Graph Convolutional Network (GCN)-based approach to automate the annotation process. Our process builds upon existing labeling approaches, using state-of-the-art tools to find cells with highly confident label assignments through consensus and spreading these confident labels with a semi-supervised GCN. Using simulated data and two scRNA-seq datasets from different tissues, we show that our method improves accuracy over a simple consensus algorithm and the average of the underlying tools. We also compare our method to a nonparametric neighbor majority approach, showing comparable results. We then demonstrate that our GCN method allows for feature interpretation, identifying important genes for cell type classification. We present our completed pipeline, written in PyTorch, as an end-to-end tool for automating and interpreting the classification of scRNA-seq data.
    METHODS: Our code for conducting the experiments in this paper and using our model is available at https://github.com/lewinsohndp/scSHARP.
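
    The two-stage idea described above (keep only labels on which the tools agree, then spread them to the remaining cells) can be sketched as follows. This is an illustrative assumption-laden example and not the scSHARP code: the function names, the agreement threshold of 0.75, and the toy coordinates are hypothetical, and the propagation step uses a plain k-nearest-neighbor majority vote, i.e. the kind of nonparametric baseline the abstract mentions, in place of the semi-supervised GCN.

```python
from collections import Counter
from math import dist
from typing import List, Optional, Sequence, Tuple


def confident_labels(votes: Sequence[Sequence[str]],
                     min_agreement: float = 0.75) -> List[Optional[str]]:
    """Keep a cell's majority label only if enough tools agree; otherwise None."""
    result: List[Optional[str]] = []
    for cell_votes in votes:
        label, count = Counter(cell_votes).most_common(1)[0]
        agreed = count / len(cell_votes) >= min_agreement
        result.append(label if agreed else None)
    return result


def neighbor_majority(features: Sequence[Tuple[float, ...]],
                      labels: Sequence[Optional[str]],
                      k: int = 3) -> List[str]:
    """Assign each unlabeled cell the majority label of its k nearest
    confidently labeled cells (stand-in for the GCN propagation step)."""
    labeled = [i for i, lab in enumerate(labels) if lab is not None]
    if not labeled:
        raise ValueError("at least one confidently labeled cell is required")
    out = list(labels)
    for i, lab in enumerate(labels):
        if lab is not None:
            continue
        nearest = sorted(labeled, key=lambda j: dist(features[i], features[j]))[:k]
        out[i] = Counter(labels[j] for j in nearest).most_common(1)[0][0]
    return out


# Hypothetical votes from four annotation tools for five cells.
votes = [
    ["T", "T", "T", "T"],
    ["T", "T", "T", "B"],
    ["B", "B", "B", "B"],
    ["B", "B", "B", "T"],
    ["T", "B", "T", "B"],  # tools split evenly: no confident label
]
# Hypothetical low-dimensional embedding of the same five cells.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (4.8, 5.2)]

confident = confident_labels(votes)
final = neighbor_majority(features, confident, k=3)
print(confident)
print(final)
```

    The last cell receives no confident label because the tools split evenly, and it is then labeled from its nearest confidently labeled neighbors.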

  • Article type: Journal Article
    BACKGROUND: In recent years, the introduction of single-cell RNA sequencing (scRNA-seq) has enabled the analysis of a cell's transcriptome at an unprecedented granularity and processing speed. The experimental outcome of applying this technology is an M × N matrix containing aggregated mRNA expression counts of M genes and N cell samples. From this matrix, scientists can study how cell protein synthesis changes in response to various factors, for example, disease versus non-disease states in response to a treatment protocol. This technology's critical challenge is detecting and accurately recording lowly expressed genes. As a result, low expression levels tend to be missed and recorded as zero, an event known as dropout. This makes the lowly expressed genes indistinguishable from true zero expression and different from the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult.
    RESULTS: To address this problem, we propose an approach to measure cell similarity using consensus clustering and demonstrate an effective and efficient algorithm that takes advantage of this new similarity measure to impute the most probable dropout events in the scRNA-seq datasets. We demonstrate that our approach exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities.
    CONCLUSIONS: ccImpute is an effective algorithm to correct for dropout events and thus improve downstream analysis of scRNA-seq data. ccImpute is implemented in R and is available at https://github.com/khazum/ccImpute.
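
    To make the notion of similarity-guided dropout imputation concrete, here is a minimal sketch. It is not the ccImpute algorithm: the function name is hypothetical, every zero is treated as a candidate dropout, and each zero is replaced by the similarity-weighted mean of the same gene in the other cells, which is a deliberately simplified stand-in for the method the abstract describes. The matrix is assumed to hold genes in rows and cells in columns, and the cell-cell similarity matrix is assumed to come from some consensus clustering step.

```python
from typing import List


def impute_zeros(counts: List[List[float]],
                 similarity: List[List[float]]) -> List[List[float]]:
    """Replace each zero with a similarity-weighted mean over other cells.

    counts[g][c] is the expression of gene g in cell c.
    similarity[c][d] is a cell-cell similarity in [0, 1].
    Averages use the original counts, so a gene that is zero in all
    similar cells stays zero (it looks like true non-expression).
    """
    n_cells = len(counts[0])
    imputed = [row[:] for row in counts]
    for g, row in enumerate(counts):
        for c in range(n_cells):
            if row[c] != 0:
                continue
            # A cell never contributes to its own imputed value.
            weights = [similarity[c][d] if d != c else 0.0 for d in range(n_cells)]
            total = sum(weights)
            if total > 0:
                imputed[g][c] = sum(w * row[d] for d, w in enumerate(weights)) / total
    return imputed


# Two genes, four cells; cells 0-2 are similar to one another, cell 3 is not.
counts = [
    [4.0, 0.0, 6.0, 0.0],
    [0.0, 0.0, 0.0, 9.0],
]
similarity = [
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
imputed = impute_zeros(counts, similarity)
print(imputed)
```

    In this toy input the zero for the first gene in cell 1 is filled from cells 0 and 2, while the zeros of the second gene stay zero because no similar cell expresses it.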

  • Article type: Journal Article
    Thanks to the development of high-throughput sequencing technologies, massive amounts of various biomolecular data have been accumulated to revolutionize the study of genomics and molecular biology. One of the main challenges in analyzing this biomolecular data is to cluster their subtypes into subpopulations to facilitate subsequent downstream analysis. Recently, many clustering methods have been developed to address the biomolecular data. However, the computational methods often suffer from many limitations such as high dimensionality, data heterogeneity and noise.
    In our study, we develop a novel Graph-based Multiple Hierarchical Consensus Clustering (GMHCC) method with an unsupervised graph-based feature ranking (FR) and a graph-based linking method to explore the multiple hierarchical information of the underlying partitions of the consensus clustering for multiple types of biomolecular data. Indeed, we first propose to use a graph-based unsupervised FR model to measure each feature by building a graph over pairwise features and then providing each feature with a rank. Subsequently, to maintain the diversity and robustness of basic partitions (BPs), we propose multiple diverse feature subsets to generate several BPs and then explore the hierarchical structures of the multiple BPs by refining the global consensus function. Finally, we develop a new graph-based linking method, which explicitly considers the relationships between clusters to generate the final partition. Experiments on multiple types of biomolecular data including 35 cancer gene expression datasets and eight single-cell RNA-seq datasets validate the effectiveness of our method over several state-of-the-art consensus clustering approaches. Furthermore, differential gene analysis, gene ontology enrichment analysis and KEGG pathway analysis are conducted, providing novel insights into cell developmental lineages and characterization mechanisms.
    The source code is available at GitHub: https://github.com/yifuLu/GMHCC. The software and the supporting data can be downloaded from: https://figshare.com/articles/software/GMHCC/17111291.
    Supplementary data are available at Bioinformatics online.

  • Article type: Journal Article
    Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse scRNA-seq data, 13) spatial transcriptomics, and 14) comparison of single-cell AD mouse model studies and single-cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single-cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single-nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single-cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single-cell level.

  • Article type: Journal Article
    Single-cell transcriptomics can profile thousands of cells in a single experiment and identify novel cell types, states and dynamics in a wide variety of tissues and organisms. Standard experimental protocols and analysis workflows have been developed to create single-cell transcriptomic maps from tissues. This tutorial focuses on how to interpret these data to identify cell types, states and other biologically relevant patterns with the objective of creating an annotated map of cells. We recommend a three-step workflow including automatic cell annotation (wherever possible), manual cell annotation and verification. Frequently encountered challenges are discussed, as well as strategies to address them. Guiding principles and specific recommendations for software tools and resources that can be used for each step are covered, and an R notebook is included to help run the recommended workflow. Basic familiarity with computer software is assumed, and basic knowledge of programming (e.g., in the R language) is recommended.

  • Article type: Journal Article
    Single-cell RNA sequencing (scRNA-seq) is a popular and powerful technology that allows you to profile the whole transcriptome of a large number of individual cells. However, the analysis of the large volumes of data generated from these experiments requires specialized statistical and computational methods. Here we present an overview of the computational workflow involved in processing scRNA-seq data. We discuss some of the most common tasks and the tools available for addressing central biological questions. In this article and our companion website ( https://scrnaseq-course.cog.sanger.ac.uk/website/index.html ), we provide guidelines regarding best practices for performing computational analyses. This tutorial provides a hands-on guide for experimentalists interested in analyzing their data as well as an overview for bioinformaticians seeking to develop new computational methods.