RNA-seq

  • Article type: Journal Article
    Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in the life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and to perform unbiased benchmarks, data simulation has been widely adopted because it provides explicit ground truth and can generate customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines.
    We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability, using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 show the best accuracy across various platforms. Unexpectedly, some methods tailored to scRNA-seq data are potentially compatible with simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under their corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores, but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimation and the appearance of missing or infinite values during calculations. We provide practical guidelines for method selection, a standard pipeline, Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool, Simsite (https://www.ciblab.net/software/simshiny/), for data simulation.
    No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
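Many of the benchmarked simulators build on count models such as the zero-inflated negative binomial (ZINB) that ZINB-WaVE is named after. As a minimal sketch only, assuming a single dispersion `theta` and excess-zero probability `pi_zero` shared by all genes (both hypothetical parameters, not taken from any benchmarked tool), the following stdlib Python draws a genes × cells ZINB count matrix and reports per-gene zero fractions, one of the simple properties a simulated matrix can be compared against in a reference dataset.

```python
import random


def sample_poisson(lam, rng):
    """Poisson draw by counting unit-rate exponential arrivals in [0, lam] (O(lam) time)."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_zinb(gene_means, n_cells, theta=2.0, pi_zero=0.3, seed=0):
    """Return a genes x cells ZINB count matrix.

    theta (NB size) and pi_zero (excess-zero probability) are hypothetical
    parameters shared by all genes to keep the sketch short.
    """
    rng = random.Random(seed)
    matrix = []
    for mu in gene_means:
        row = []
        for _ in range(n_cells):
            if rng.random() < pi_zero:
                row.append(0)  # excess zero from the zero-inflation component
            else:
                # Gamma-Poisson mixture = NB with mean mu, variance mu + mu**2 / theta
                lam = rng.gammavariate(theta, mu / theta)
                row.append(sample_poisson(lam, rng))
        matrix.append(row)
    return matrix


def zero_fraction(row):
    """Share of cells with a zero count for one gene."""
    return sum(1 for v in row if v == 0) / len(row)


counts = simulate_zinb([0.5, 5.0, 20.0], n_cells=200)
print([round(zero_fraction(r), 2) for r in counts])
```

Real simulators estimate such parameters per gene from a reference dataset; fixing them globally here only keeps the model visible.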

  • Article type: Journal Article
    The molecular heterogeneity of colon cancer has made tumor classification a requirement for effective treatment. One approach to the molecular subtyping of colon cancer patients is the consensus molecular subtypes (CMS), developed by the Colorectal Cancer Subtyping Consortium. CMS-specific, RNA-seq-based classification approaches are recent and have relatively low sensitivity and specificity. In this study, we aimed to classify patients into CMS groups using their RNA-seq profiles.
    We first identified subtype-specific and survival-associated genes using the Fuzzy C-Means algorithm and the log-rank test. We then classified patients using support vector machines with backward elimination.
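The abstract does not say how Fuzzy C-Means was parameterized or what it was run on. A minimal sketch, assuming the standard FCM updates with fuzzifier m = 2 and Euclidean distance, alternates weighted-centroid and membership updates; each row of memberships sums to 1, so an item can belong partially to several clusters.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=100, seed=0, eps=1e-12):
    """Standard Fuzzy C-Means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1
    u = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of points weighted by membership ** m
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / w_sum
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        for i in range(n):
            dists = [max(math.dist(points[i], centers[k]), eps) for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return centers, u


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # toy points
centers, u = fuzzy_c_means(pts, c=2)
labels = [max(range(2), key=lambda k: row[k]) for row in u]
print(labels)
```

Taking the highest membership per row gives a hard label; the soft memberships themselves are what make FCM useful for finding items that sit between subtypes.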
    We optimized RNA-seq-based classification using 25 genes with a minimum classification error rate. We report classification performance using precision, sensitivity, specificity, false discovery rate, and balanced accuracy metrics.
    We present a gene list for colon cancer classification with minimum classification error rates, and observed the lowest sensitivity but the highest specificity with CMS3-associated genes, which differed significantly owing to the low number of patients in the clinic for this group.
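The reported metrics can all be derived from one-vs-rest confusion counts for each CMS class. A minimal sketch, with toy labels that are hypothetical and not the study's data:

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Precision, sensitivity, specificity, FDR, and balanced accuracy for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(a, b):
        # Undefined ratios (empty denominator) are reported as NaN
        return a / b if b else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fdr": ratio(fp, tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical toy predictions
y_true = ["CMS1", "CMS2", "CMS3", "CMS3", "CMS4", "CMS2"]
y_pred = ["CMS1", "CMS2", "CMS2", "CMS3", "CMS4", "CMS2"]
print(one_vs_rest_metrics(y_true, y_pred, "CMS3"))
```

Because specificity is computed over all other classes, a small class can pair low sensitivity with high specificity; in this toy example CMS3 has sensitivity 0.5 and specificity 1.0.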

  • Article type: Journal Article
    Analyses of large transcriptomic data sets of muscle-invasive bladder cancer (MIBC) have led to a consensus classification. Molecular subtypes of upper tract urothelial carcinomas (UTUCs) are less well known. Our objective was to determine the relevance of the consensus classification in UTUCs by characterizing a novel cohort of surgically treated ≥pT1 tumors. Using immunohistochemistry (IHC), we evaluated the subtype markers GATA3-CK5/6-TUBB2B (in multiplex), CK20, p16, Ki67, mismatch repair system proteins, and PD-L1. Heterogeneity was assessed morphologically and/or with subtype IHC. FGFR3 mutations were identified by pyrosequencing. We performed 3' RNA sequencing of each tumor, with multisampling in heterogeneous cases. Consensus classes, unsupervised groups, and microenvironment cell abundance were determined using gene expression. Most of the 66 patients were men (77.3%), with pT1 (n = 23, 34.8%) or pT2-4 stage UTUC (n = 43, 65.2%). FGFR3 mutations and mismatch repair-deficient status were identified in 40% and 4.7% of cases, respectively. Consensus subtypes robustly classified UTUCs and reflected intrinsic subgroups. All pT1 tumors were classified as luminal papillary (LumP). Combining our consensus classification results with those of previously published UTUC cohorts, LumP tumors represented 57.2% of ≥pT2 UTUCs, significantly more than in MIBCs. Ten patients (15.2%) harbored areas of distinct subtypes. Consensus classes were associated with FGFR3 mutations, stage, morphology, and IHC. The majority of LumP tumors were characterized by low immune infiltration and low PD-L1 expression, particularly if FGFR3-mutated. Our study shows that the MIBC consensus classification robustly classified UTUCs and highlighted intratumoral molecular heterogeneity. The proportion of LumP was significantly higher in UTUCs than in MIBCs. Most LumP tumors showed low immune infiltration, low PD-L1 expression, and a high proportion of FGFR3 mutations. These findings suggest a differential response to novel therapies between patients with UTUC and those with MIBC.
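Consensus subtype classifiers of this kind commonly assign each sample to the subtype centroid with which its expression profile correlates best. A minimal sketch under that assumption; the centroid values and the four-gene panel below are hypothetical toy numbers, not the published consensus centroids, and this is not the classifier used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_by_centroid(sample, centroids):
    """Assign `sample` to the centroid with the highest Pearson correlation.

    `centroids` maps subtype name -> expression values over the same genes as `sample`.
    """
    scores = {name: pearson(sample, profile) for name, profile in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy centroids over four made-up marker genes (not published values)
toy_centroids = {"LumP": [9, 8, 1, 1], "Ba/Sq": [2, 1, 9, 8]}
print(classify_by_centroid([8, 7, 2, 1], toy_centroids)[0])
```

Returning all correlation scores, not just the winner, lets a user flag samples whose best and second-best subtypes are close, which is relevant when tumors harbor areas of distinct subtypes.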

  • Article type: Journal Article
    The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow for typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable to almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
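As an illustration of the basic steps the review covers (quality control followed by normalization), the sketch below filters cells on total counts, detected genes, and mitochondrial read fraction, then applies a log(1 + counts per 10,000) transform similar in spirit to common defaults. The thresholds, the `MT-` prefix rule (human gene naming), and the function name are assumptions; real pipelines choose thresholds per dataset.

```python
import math


def qc_and_log_normalize(counts, gene_names, min_counts=500, min_genes=200,
                         max_mito_frac=0.2, scale=1e4):
    """Filter cells, then log-normalize the survivors.

    counts: list of cells, each a list of raw counts aligned with gene_names.
    Returns (indices of kept cells, log1p(count / total * scale) per kept cell).
    All thresholds are hypothetical defaults.
    """
    kept, normalized = [], []
    for idx, cell in enumerate(counts):
        total = sum(cell)
        n_genes = sum(1 for v in cell if v > 0)
        mito = sum(v for g, v in zip(gene_names, cell) if g.startswith("MT-"))
        mito_frac = mito / total if total else 1.0
        if total < min_counts or n_genes < min_genes or mito_frac > max_mito_frac:
            continue  # fails at least one QC filter
        kept.append(idx)
        normalized.append([math.log1p(v / total * scale) for v in cell])
    return kept, normalized


genes = ["MT-CO1", "GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene names
cells = [[1, 50, 30, 19], [60, 20, 10, 10], [0, 5, 0, 0]]
kept, norm = qc_and_log_normalize(cells, genes, min_counts=50, min_genes=2)
print(kept)
```

In this toy input the second cell is dropped for its high mitochondrial fraction and the third for its low total count.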

  • Article type: Journal Article
    Johne's disease, caused by Mycobacterium avium subsp. paratuberculosis (MAP), is a major concern in the dairy industry. Since the pathogenesis of the disease is not clearly known, it is necessary to develop an approach to discover the molecular mechanisms behind this disease with high confidence. Biological studies often suffer from issues with reproducibility. The lack of a method to find stable modules in co-expression networks from different datasets related to Johne's disease motivated us to present a computational pipeline to identify non-preserved consensus modules. Two RNA-Seq datasets related to MAP infection were analyzed, and consensus modules were detected and subjected to preservation analysis. Non-preserved consensus modules in both datasets were identified as modules whose connectivity and density are affected by the disease. Long non-coding RNAs (lncRNAs) and TF genes in the non-preserved consensus modules were identified to construct integrated lncRNA-mRNA-TF networks. These networks were confirmed by protein-protein interaction (PPI) networks. Also, the hub genes overlapping between the two datasets were considered hub genes of the consensus modules. Out of 66 consensus modules, 21 were non-preserved consensus modules common to both datasets, and 619 hub genes were members of these modules. Moreover, 34 lncRNA and 152 TF genes were identified in 12 and 19 non-preserved consensus modules, respectively. The predicted PPIs in 17 non-preserved consensus modules were significant, and 283 hub genes were identified in both the co-expression and PPI networks. Functional enrichment analysis revealed that eight of the 21 modules were significantly enriched for biological processes associated with Johne's disease, including "inflammatory response," "interleukin-1-mediated signaling pathway," "type I interferon signaling pathway," "cytokine-mediated signaling pathway," "regulation of interferon-beta production," and "response to interferon-gamma." Moreover, some genes (hub mRNAs, TFs, and lncRNAs), such as TLR2, NFKB1, IRF1, ATF3, TREM1, CDH26, HMGB1, STAT1, ISG15, and CASP3, were introduced as potential candidates for Johne's disease pathogenesis. This study expanded our knowledge of the molecular mechanisms involved in Johne's disease, and the presented pipeline enabled us to achieve more valid results.
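Module connectivity, density, and hub genes in WGCNA-style analyses are built on a soft-thresholded adjacency. A minimal sketch, assuming an unsigned adjacency |cor|^β with β = 6 and defining hubs as the top-N genes by intramodular connectivity (both assumptions; the study's exact settings are not given in the abstract). Overlapping hubs between two datasets are then a set intersection. The expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(i, j)| ** beta.

    expr maps gene -> expression values across the same samples.
    """
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            adj[g][h] = adj[h][g] = a
    return adj


def intramodular_connectivity(adj, module):
    """k_i = sum of adjacencies from gene i to the other genes in the module."""
    return {g: sum(adj[g][h] for h in module if h != g) for g in module}


def module_density(adj, module):
    """Mean adjacency over distinct gene pairs in the module."""
    n = len(module)
    return sum(intramodular_connectivity(adj, module).values()) / (n * (n - 1))


def top_hubs(connectivity, n_top):
    """Hypothetical hub rule: the n_top most connected genes."""
    return set(sorted(connectivity, key=connectivity.get, reverse=True)[:n_top])


expr = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10],
        "C": [5, 4, 3, 2, 1], "D": [1, 3, 2, 5, 4]}
module = ["A", "B", "C", "D"]
adj = soft_threshold_adjacency(expr)
k = intramodular_connectivity(adj, module)
# With a second dataset, shared hubs would be top_hubs(k, n) & top_hubs(k2, n)
print(top_hubs(k, 3), round(module_density(adj, module), 3))
```

Raising |cor| to a power suppresses weak correlations (0.8 becomes about 0.26 at β = 6), which is why the loosely correlated gene D ends up with the lowest connectivity.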

  • Article type: Journal Article
    Transposable elements (TEs) play key roles in crucial biological pathways. Therefore, several tools enabling the quantification of their expression have recently been developed. However, many of the existing tools lack the capability to distinguish between the transcription of autonomously expressed TEs and TE fragments embedded in canonical coding/non-coding non-TE transcripts. Consequently, an apparent change in the expression of a given TE may simply reflect a change in the expression of the transcripts containing TE-derived sequences. To overcome this issue, we have developed TEspeX, a pipeline for the quantification of TE expression at the consensus level. TEspeX uses Illumina RNA-seq short reads to quantify TE expression while avoiding counting reads derived from inactive TE fragments embedded in canonical transcripts.
    The tool is implemented in python3, distributed under the GNU General Public License (GPL) and available on Github at https://github.com/fansalon/TEspeX (Zenodo URL: https://doi.org/10.5281/zenodo.6800331).
    Supplementary data are available at Bioinformatics online.
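The core idea of not counting reads that come from TE fragments inside canonical transcripts can be illustrated with a deliberately simplified rule. This is a hypothetical sketch, not TEspeX's actual algorithm: it assumes each read has already been aligned to TE consensus sequences and to non-TE transcripts, counts a read only if it hits exactly one TE consensus and no canonical transcript, and drops multi-mapping reads.

```python
from collections import Counter


def count_te_reads(read_hits):
    """Hypothetical counting rule for TE expression.

    read_hits maps read id -> (set of TE consensus names hit, hits_canonical_transcript).
    A read is counted only if it hits exactly one TE consensus and no non-TE
    transcript; reads that also align to a canonical transcript are treated as
    coming from an embedded TE fragment and skipped; multi-mappers are dropped.
    """
    counts = Counter()
    for te_hits, hits_transcript in read_hits.values():
        if hits_transcript or len(te_hits) != 1:
            continue
        counts[next(iter(te_hits))] += 1
    return counts


reads = {  # hypothetical alignment summaries
    "r1": ({"L1HS"}, False),
    "r2": ({"L1HS"}, True),           # TE fragment inside a canonical transcript
    "r3": ({"AluY"}, False),
    "r4": ({"AluY", "L1HS"}, False),  # ambiguous multi-mapper
}
print(count_te_reads(reads))
```

Dropping every read that touches a canonical transcript is conservative: it can also discard genuine TE-derived reads, which is the trade-off a real tool has to resolve more carefully.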

  • Article type: Journal Article
    BACKGROUND: In recent years, the introduction of single-cell RNA sequencing (scRNA-seq) has enabled the analysis of a cell's transcriptome at unprecedented granularity and processing speed. The experimental outcome of applying this technology is an M × N matrix containing aggregated mRNA expression counts of M genes and N cell samples. From this matrix, scientists can study how cell protein synthesis changes in response to various factors, for example, disease versus non-disease states in response to a treatment protocol. This technology's critical challenge is detecting and accurately recording lowly expressed genes. Because of this, low expression levels tend to be missed and recorded as zero, an event known as dropout. This makes lowly expressed genes indistinguishable from true zero expression and different from the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult.
    RESULTS: To address this problem, we propose an approach to measure cell similarity using consensus clustering and demonstrate an effective and efficient algorithm that takes advantage of this new similarity measure to impute the most probable dropout events in the scRNA-seq datasets. We demonstrate that our approach exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities.
    CONCLUSIONS: ccImpute is an effective algorithm to correct for dropout events and thus improve downstream analysis of scRNA-seq data. ccImpute is implemented in R and is available at https://github.com/khazum/ccImpute .
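A consensus-clustering similarity is usually the fraction of clustering runs in which two cells land in the same cluster. A minimal sketch, assuming the cluster labels from several runs are already available and imputing each zero as the consensus-weighted mean of the same gene in other cells. Unlike ccImpute, this toy version treats every zero as a dropout candidate; the function names and data are hypothetical.

```python
def consensus_matrix(label_runs):
    """C[i][j] = fraction of clustering runs in which cells i and j share a cluster."""
    n_runs, n_cells = len(label_runs), len(label_runs[0])
    return [[sum(run[i] == run[j] for run in label_runs) / n_runs for j in range(n_cells)]
            for i in range(n_cells)]


def impute_zeros(expr, consensus):
    """Replace each zero with the consensus-weighted mean of that gene in other cells.

    expr: list of cells, each a list of gene values. Every zero is treated as a
    dropout candidate here, which is a simplification.
    """
    out = [row[:] for row in expr]
    n_cells, n_genes = len(expr), len(expr[0])
    for i in range(n_cells):
        for g in range(n_genes):
            if expr[i][g] != 0:
                continue
            weights = [(consensus[i][j], expr[j][g]) for j in range(n_cells) if j != i]
            total = sum(w for w, _ in weights)
            if total > 0:  # leave the zero if no cell ever co-clusters with cell i
                out[i][g] = sum(w * v for w, v in weights) / total
    return out


runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]  # hypothetical labels from 3 runs
expr = [[5, 0], [4, 3], [0, 9], [1, 8]]           # 4 cells x 2 genes, toy values
print(impute_zeros(expr, consensus_matrix(runs)))
```

Weighting by co-clustering frequency means cells that are consistently grouped together dominate the imputed value, while cells that never share a cluster contribute nothing.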

  • Article type: Journal Article
    Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system characterized by demyelination, which leads to the formation of white matter lesions (WMLs) and gray matter lesions (GMLs). Recently, many transcriptomic or proteomic studies have explored MS, but few have focused on the transcriptomic differences and similarities between GMLs and WMLs. Furthermore, there are striking pathological differences between WMLs and GMLs; for example, the type and abundance of infiltrating immune cells differ between them. Here, we used consensus weighted gene co-expression network analysis (WGCNA), single-sample gene set enrichment analysis (ssGSEA), and machine learning methods to identify transcriptomic differences and similarities in MS between GMLs and WMLs, and to find co-expression modules with significant differences or similarities between them. Through weighted co-expression network analysis and ssGSEA, CD56bright natural killer cells were identified as the key immune infiltration factor in MS, in both GM and WM. We also found that the co-expression networks of the two groups are quite similar (density = 0.79), and that 28 differentially expressed genes (DEGs) are distributed in the midnightblue module, which is most related to CD56bright natural killer cells in GM. At the same time, we found large disparities between some modules, such as between the darkred and lightyellow modules, and these divergences may be related to the functions of the genes in the modules.
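The abstract does not define the density value. A plausible reading, assumed here, is the eigengene-network preservation density from consensus WGCNA: eigengene correlations are mapped to adjacencies (1 + cor) / 2, preservation between two groups is 1 minus the absolute adjacency difference, and density is the mean preservation over distinct module pairs. This sketch takes module eigengenes as given rather than computing them, and the eigengene values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def preservation_density(eig1, eig2):
    """Mean eigengene-network preservation between two groups.

    eig1, eig2 map module name -> eigengene values (same modules in both;
    sample counts may differ between groups).
    Adjacency is (1 + cor) / 2; preservation is 1 - |adj1 - adj2|.
    """
    modules = list(eig1)
    values = []
    for a in modules:
        for b in modules:
            if a == b:
                continue
            adj1 = (1 + pearson(eig1[a], eig1[b])) / 2
            adj2 = (1 + pearson(eig2[a], eig2[b])) / 2
            values.append(1 - abs(adj1 - adj2))
    return sum(values) / len(values)


# Hypothetical eigengenes for two modules in two lesion groups
gm = {"m1": [1, 2, 3, 4], "m2": [2, 4, 5, 9]}
wm = {"m1": [2, 1, 4, 3], "m2": [1, 3, 2, 5]}
print(round(preservation_density(gm, wm), 3))
```

A density of 1 means every pair of module eigengenes is correlated identically in both groups; a pair whose correlation flips from +1 to -1 contributes 0.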

  • Article type: Journal Article
    The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor to the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no improvement over the pan-human consensus, suggesting a limit to the utility of incorporating more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expression of isoforms, genes, and splice junctions.
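The abstract does not specify how variant information was turned into a consensus sequence. One simple rule, assumed here, is to substitute the alternate allele wherever its frequency in the chosen population exceeds 0.5, so the sequence carries the majority allele; the same function yields a pan-human or population-level consensus depending on which allele frequencies are supplied. The sketch handles SNVs only (indels would shift coordinates), and the reference and variant list are hypothetical.

```python
def consensus_sequence(reference, variants, min_af=0.5):
    """Apply majority alternate alleles to a reference sequence (SNVs only).

    variants: iterable of (0-based position, ref base, alt base, alt allele frequency).
    """
    seq = list(reference)
    for pos, ref, alt, af in variants:
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        if af > min_af:  # alternate allele is the majority allele in this population
            seq[pos] = alt
    return "".join(seq)


# Hypothetical toy reference and allele frequencies
ref_seq = "ACGTACGT"
snvs = [(1, "C", "T", 0.62), (4, "A", "G", 0.10), (7, "T", "C", 0.51)]
print(consensus_sequence(ref_seq, snvs))
```

Checking the reference base before substituting catches coordinate or build mismatches between the variant list and the sequence early.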

  • Article type: Journal Article
    No abstract available.