limma

  • Article type: Journal Article
    The coronavirus disease 2019 (COVID-19) pandemic has had a significant global impact, resulting in a higher death toll and persistent health issues for survivors, particularly those with pre-existing medical conditions. Numerous studies have demonstrated a strong correlation between catastrophic COVID-19 outcomes and diabetes. To gain deeper insight, we analysed transcriptome datasets from patients with COVID-19 and patients with diabetic peripheral neuropathy. Using the R programming language, differentially expressed genes (DEGs) were identified and classified as up- or down-regulated. The overlap of DEGs between these groups was then explored. Functional annotation of the common DEGs was performed using Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), BioPlanet, Reactome, and WikiPathways. A protein-protein interaction (PPI) network was built with bioinformatics tools to understand molecular interactions. Through topological analysis of the PPI network, we determined hub gene modules and explored gene regulatory networks (GRNs). Furthermore, the study proposes potential drug molecules for the shared DEGs on the basis of the comprehensive analysis. By providing insight into potential therapeutic interventions, these approaches may contribute to understanding the molecular intricacies of COVID-19 in patients with diabetic peripheral neuropathy.
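The overlap step described in this abstract can be illustrated with a minimal sketch. It assumes each dataset has already been reduced to a mapping from gene symbol to direction of regulation. The function name `shared_degs` and the gene labels are hypothetical and do not come from the study.

```python
def shared_degs(degs_a, degs_b):
    """Return genes regulated in the same direction in both datasets.

    Each argument maps a gene symbol to "up" or "down".
    """
    shared = {"up": set(), "down": set()}
    # Only genes present in both DEG lists can overlap.
    for gene in degs_a.keys() & degs_b.keys():
        if degs_a[gene] == degs_b[gene]:
            shared[degs_a[gene]].add(gene)
    return shared


# Hypothetical gene labels, for illustration only.
covid = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}
dpn = {"GENE_A": "up", "GENE_B": "up", "GENE_D": "down"}
print(shared_degs(covid, dpn))
```

Genes that are differentially expressed in both datasets but in opposite directions (GENE_B here) are left out of this sketch. Whether the study treated such genes as shared is not stated in the abstract.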

  • Article type: Journal Article
    Background: Expression proteomics involves the global evaluation of protein abundances within a system. In turn, differential expression analysis can be used to investigate changes in protein abundance upon perturbation of such a system. Methods: Here, we provide a workflow for the processing, analysis and interpretation of quantitative mass spectrometry-based expression proteomics data. This workflow uses open-source R software packages from the Bioconductor project and guides users end-to-end and step-by-step through every stage of the analysis. As a use case, we generated expression proteomics data from HEK293 cells with and without a treatment. Of note, the experiment included cellular proteins labelled using tandem mass tag (TMT) technology and secreted proteins quantified using label-free quantitation (LFQ). Results: The workflow explains the software infrastructure before focusing on data import, pre-processing and quality control. This is done separately for the TMT and LFQ datasets. The application of statistical differential expression analysis is demonstrated, followed by interpretation via gene ontology enrichment analysis. Conclusions: A comprehensive workflow for the processing, analysis and interpretation of expression proteomics data is presented. The workflow is a valuable resource for the proteomics community, and specifically for beginners with at least some familiarity with R who wish to understand their analyses and make data-driven decisions about them.

  • Article type: Journal Article
    Introduction: With the advancement of RNA-seq technology and machine learning, training machine learning models on large-scale RNA-seq data from databases can often identify genes with important regulatory roles that were previously missed by standard linear analytic methodologies. Finding tissue-specific genes could improve our comprehension of the relationship between tissues and genes. However, few machine learning models for transcriptome data have been deployed and compared for identifying tissue-specific genes, particularly in plants. Methods: In this study, an expression matrix was processed with linear models (Limma), machine learning models (LightGBM), and deep learning models (CNN), combined with information gain and the SHAP strategy, based on 1,548 maize multi-tissue RNA-seq data sets obtained from a public database, to identify tissue-specific genes. For validation, V-measure values were computed from k-means clustering of the gene sets to evaluate their technical complementarity. Furthermore, GO analysis and literature retrieval were used to validate the functions and research status of these genes. Results: Based on clustering validation, the convolutional neural network outperformed the other models with a higher V-measure value of 0.647, indicating that its gene set could cover as many specific properties of the various tissues as possible, whereas LightGBM discovered key transcription factors. Combining the three gene sets produced 78 core tissue-specific genes that had previously been shown in the literature to be biologically significant. Discussion: Different tissue-specific gene sets were identified because of the distinct interpretation strategies of the machine learning models, and researchers may use multiple methodologies and strategies for tissue-specific gene sets depending on their goals, types of data, and computational resources. This study provides comparative insight for large-scale data mining of transcriptome datasets, shedding light on resolving the difficulties of high dimensionality and bias in bioinformatics data processing.
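The V-measure used for validation in this abstract is the harmonic mean of homogeneity and completeness. Both are computed from the entropies of the true classes (here, tissues) and the cluster assignments. The following is a minimal sketch of that standard definition, not the authors' code. The function name, the `beta` parameter and the tissue labels in the usage note are illustrative assumptions.

```python
from collections import Counter
from math import log


def _entropy(counts, n):
    """Shannon entropy (natural log) of a list of counts summing to n."""
    return -sum(c / n * log(c / n) for c in counts if c)


def v_measure(classes, clusters, beta=1.0):
    """V-measure of a clustering against reference class labels."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)
    h_c = _entropy(class_counts.values(), n)
    h_k = _entropy(cluster_counts.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the joint counts.
    h_c_given_k = -sum(
        nck / n * log(nck / cluster_counts[k]) for (c, k), nck in joint.items()
    )
    h_k_given_c = -sum(
        nck / n * log(nck / class_counts[c]) for (c, k), nck in joint.items()
    )
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
```

For example, `v_measure(["leaf", "leaf", "root", "root"], [0, 0, 1, 1])` evaluates to 1.0 because every cluster contains a single tissue and every tissue falls in a single cluster. A clustering that splits each tissue evenly across both clusters scores 0.0.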

  • Article type: Journal Article
    Genome biology has made substantial progress in its analytical and computational aspects over recent decades. Differential gene expression analysis is one of many computationally intensive areas, and it has largely been developed in the R programming language. Here we explain possible reasons for the dominance of R in gene expression data analysis. Next, we discuss the prospects for Python to become competitive in this area of research in the coming years. We show that Python can already be used for single-cell differential gene expression analysis. We pinpoint the parts still missing in Python and the possibilities for improvement.

  • Article type: Journal Article
    DNA methylation in critical regions is highly involved in cancer pathogenesis and drug response. However, identifying causal methylation among a large number of potentially polymorphic DNA methylation sites is challenging. Such high-dimensional data pose two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious problems. A method that quickly filters candidate sites to narrow down the targets for downstream analyses is therefore urgently needed. BACkPAy is a pre-screening Bayesian approach for detecting biologically meaningful patterns of potential differential methylation levels with small sample sizes. BACkPAy prioritizes potentially important biomarkers using a Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites whose methylation levels are flat across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, each containing three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) and compared its results with those obtained by BACkPAy. Cox proportional hazards regression models were then used with The Cancer Genome Atlas (TCGA) data to visualize prognostically significant markers in a survival analysis. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes in the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differentially methylated probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample sizes in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation levels of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
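The Benjamini-Hochberg FDR mentioned in this abstract rescales each p-value by the number of tests divided by its rank. This is one reason why very small studies with many probes can end up with no significant results after adjustment. A minimal sketch of the standard step-up adjustment follows. The function name and the example p-values are illustrative assumptions, not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With only four tests the smallest adjusted value here is about 0.02. With hundreds of thousands of methylation probes, the same raw p-values would be multiplied by a far larger factor.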

  • Article type: Journal Article
    Background: Coronavirus (CoV) is an emerging human pathogen causing severe acute respiratory syndrome (SARS) around the world. Earlier identification of biomarkers for SARS can facilitate detection and reduce the mortality rate of the disease. Thus, using an integrated network analysis and structural modeling approach, we aimed to explore potential drug targets and candidate drugs for coronavirus-mediated SARS. Methods: Differential expression (DE) analysis of the expression profiles of CoV-infected host genes (HGs) was conducted using Limma. Highly integrated DE-CoV-HGs were selected to construct the protein-protein interaction (PPI) network. Results: Using the Walktrap algorithm, highly interconnected modules, including module 1 (202 nodes), module 2 (126 nodes) and module 3 (121 nodes), were retrieved from the PPI network. MYC, HDAC9, NCOA3, CEBPB, VEGFA, BCL3, SMAD3, SMURF1, KLHL12, CBL, ERBB4, and CRKL were identified as potential drug targets (PDTs), which are highly expressed in the human respiratory system after CoV infection. The functional terms growth factor receptor binding, C-type lectin receptor signaling, interleukin-1-mediated signaling, TAP-dependent antigen processing and presentation of peptide antigen via MHC class I, stimulatory T cell receptor signaling, and innate immune response signaling pathways, along with signal transduction and cytokine immune signaling pathways, were enriched in the modules. Protein-protein docking results demonstrated the strong binding affinity (-314.57 kcal/mol) of the ERBB4-3CLpro complex, which was selected as a drug target. In addition, molecular dynamics simulations indicated the structural stability and flexibility of the ERBB4-3CLpro complex. Further, Wortmannin was proposed as a candidate drug targeting ERBB4 to control SARS-CoV-2 pathogenesis by inhibiting receptor tyrosine kinase-dependent macropinocytosis, MAPK signaling, and NF-κB signaling pathways that regulate host cell entry, replication, and modulation of the host immune system. Conclusion: We conclude that the CoV drug target "ERBB4" and the candidate drug "Wortmannin" provide insights into possible personalized therapeutics for emerging COVID-19.

  • Article type: Journal Article
    In recent years, there has been significant criticism of functional magnetic resonance imaging (fMRI) studies with small sample sizes. The argument is that such studies have low statistical power, as well as a reduced likelihood that statistically significant results are true effects. The prevalence of these studies has led to a situation where a large number of published results are not replicable and are likely false. Despite this growing body of evidence, small-sample fMRI studies continue to be performed regularly, likely because of the high cost of scanning. In this report we investigate the use of a moderated t-statistic for group-level fMRI analysis to help alleviate problems related to small sample sizes. The proposed approach, implemented in the popular R package LIMMA (linear models for microarray data), has found wide usage in the genomics literature for dealing with similar issues. Using task-based fMRI data from the Human Connectome Project (HCP), we compare the performance of the moderated t-statistic with that of the standard t-statistic, as well as the pseudo t-statistic commonly used in non-parametric fMRI analysis. We find that the moderated t-test significantly outperforms both alternative approaches for studies with fewer than 40 subjects. Further, we find that the results are consistent when using either voxel-based or cluster-based thresholding. We also introduce an R package, LIMMI (linear models for medical images), that provides a quick and convenient way to apply the method to fMRI data.
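The moderated t-statistic discussed in this abstract replaces each feature's sample variance with a weighted average of that variance and a prior variance shared across features. The sketch below shows only this shrinkage formula. It assumes the prior variance and prior degrees of freedom are already known, whereas limma estimates them from all features by empirical Bayes. The function names and the numbers are illustrative assumptions.

```python
from math import sqrt


def ordinary_t(effect, sample_var, unscaled_var):
    """Standard t-statistic; unscaled_var is 1/n for a one-sample mean."""
    return effect / sqrt(sample_var * unscaled_var)


def moderated_t(effect, sample_var, df, prior_var, prior_df, unscaled_var):
    """Moderated t-statistic with the variance shrunk toward prior_var."""
    # Posterior variance: degrees-of-freedom-weighted average of prior and sample.
    post_var = (prior_df * prior_var + df * sample_var) / (prior_df + df)
    return effect / sqrt(post_var * unscaled_var)


# Made-up voxel from a hypothetical 8-subject study (df = 7) whose sample
# variance happens to be very small.
t_std = ordinary_t(effect=0.5, sample_var=0.01, unscaled_var=1 / 8)
t_mod = moderated_t(effect=0.5, sample_var=0.01, df=7,
                    prior_var=1.0, prior_df=4, unscaled_var=1 / 8)
print(round(t_std, 2), round(t_mod, 2))
```

In this made-up case the ordinary statistic is inflated by a variance estimate that is small by chance, and shrinking toward the prior pulls it back. The moderated statistic is referred to a t-distribution with `df + prior_df` degrees of freedom, which is where the gain in power for small samples comes from.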

  • Article type: Journal Article
    Nipah virus (NiV) is an enveloped ssRNA paramyxovirus in the genus Henipavirus with a case fatality rate >70%. We analyzed NGS RNA-Seq gene expression data for NiV to detect differentially expressed genes (DEGs) using the statistical R package limma. We used the Cytoscape, Ensembl, and STRING tools to construct the gene-gene interaction tree, phylogenetic gene tree and protein-protein interaction networks for functional annotation. We identified 2707 DEGs (p-value <0.05) among 54359 NiV genes. The top up- and down-regulated DEGs were EPST1, MX1, IFIT3, RSAD2, OAS1, OASL, CMPK2 and SLFN13, SPAC977.17, using a log2FC criterion with an optimum threshold of 1.0. The top 20 up-regulated gene-gene interaction trees showed no significant association between Nipah and Tularemia virus. Similarly, the top 20 down-regulated genes of neither Ebola nor Tularemia virus showed an association with the Nipah virus. Hence, we document the top up- and down-regulated DEGs for further consideration as biomarkers and as candidates for vaccine or drug design against Nipah virus infection.
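The selection criteria quoted in this abstract (p-value < 0.05 and a log2FC threshold of 1.0) can be expressed as a small filter. This is a sketch under assumptions: the abstract does not say whether the fold-change cutoff is inclusive, so the code treats |log2FC| ≥ 1.0 as passing, and the function name and the example values are hypothetical.

```python
def classify_deg(log2fc, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene as "up", "down" or "not significant"."""
    # Assumption: the fold-change cutoff is inclusive, the p-value cutoff strict.
    if p_value >= p_cutoff or abs(log2fc) < lfc_cutoff:
        return "not significant"
    return "up" if log2fc > 0 else "down"


# Made-up (log2FC, p-value) pairs for placeholder genes.
results = {"gene_1": (2.3, 0.001), "gene_2": (-1.4, 0.02), "gene_3": (0.4, 0.0004)}
labels = {gene: classify_deg(lfc, p) for gene, (lfc, p) in results.items()}
print(labels)
```

In practice the p-values would come from limma's moderated tests. Whether raw or multiplicity-adjusted p-values are thresholded changes the number of DEGs considerably.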

  • Article type: Journal Article
    BACKGROUND: Different human responses to the same vaccine have frequently been observed. For example, independent studies identified overlapping but different transcriptomic gene expression profiles in human subjects immunized with Yellow Fever vaccine 17D (YF-17D). Different experimental and analysis conditions likely contributed to the observed differences. To investigate this issue, we developed a Vaccine Investigation Ontology (VIO) and applied it to systematically classify the different variables and the relations among them. We then evaluated whether ontological VIO modeling and VIO-based statistical analysis would contribute to enhanced vaccine investigation studies and a better understanding of vaccine response mechanisms.
    RESULTS: Our VIO modeling identified many variables related to data processing and analysis, such as the normalization method, cut-off criteria, and software settings including the software version. The datasets from two previous studies on human responses to the YF-17D vaccine, reported by Gaucher et al. (2008) and Querec et al. (2009), were re-analyzed. We first applied the same LIMMA statistical method to re-analyze the Gaucher data set and found a large difference in the lists of significantly differentially expressed genes compared with the original study. The different results were likely due to differences in the LIMMA version and software packages. Our second study re-analyzed both the Gaucher and Querec data sets with the same data processing and analysis pipeline. Significant differences in the differential gene lists were also identified. In both studies, we found that Gene Ontology (GO) enrichment results overlapped more than the gene lists and enriched pathway lists did. Visualizing the GO hierarchical structures among the enriched GO terms and their associated ancestor terms using GOfox allowed us to find more associations among enriched but often different GO terms, demonstrating that the use of GO hierarchical relations enhances data analysis.
    CONCLUSIONS: The ontology-based analysis framework supports standardized representation, integration, and analysis of heterogeneous data on host responses to vaccines. Our study also showed that differences in specific variables might explain the different results drawn from similar studies.

  • Article type: Journal Article
    Rapid advances in single-cell RNA sequencing (scRNA-seq) allow the expression of genes to be measured at single-cell resolution in complex diseases or tissues. Although many methods have been developed to detect cell clusters from scRNA-seq data, this task remains a major challenge. We proposed a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we conducted initial filtering and SCnorm normalization. We considered various case studies by selecting different cluster numbers (cl = 2 to a user-defined number) and applied the fuzzy c-means clustering algorithm to each. For each case, we evaluated the scores of four cluster validity indices: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first measure as a minimization objective (↓) and the remaining three as maximization objectives (↑), and then applied a multi-objective decision-making technique, TOPSIS, to identify the best solution. The solution (case study) with the highest TOPSIS score was selected as the final optimal clustering. Finally, we obtained differentially expressed genes (DEGs) using Limma by comparing the expression of the samples in each resultant cluster with that in the remaining clusters. We applied our approach to a scRNA-seq dataset for the rare intestinal cell type in mice [GEO ID: GSE62270, 23,630 features (genes) and 288 cells]. The optimal clustering result (TOPSIS optimal score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The scores of the four cluster validity indices FSI, PE, PC, and MPC for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2). The top ten gene markers were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob and Rps17. In this list, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful for detecting cell clusters from scRNA-seq data.
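Three of the validity indices named in this abstract, and the TOPSIS ranking applied to them, can be sketched as follows. The index formulas are the commonly used definitions (natural logarithm for PE), which is an assumption about what the authors computed. The TOPSIS function assumes equal criterion weights and vector normalization, which is also an assumption. FSI is omitted, and all function names and example numbers are illustrative.

```python
from math import log, sqrt


def partition_coefficient(u):
    """PC: mean squared membership; u has one row of memberships per cell."""
    return sum(x * x for row in u for x in row) / len(u)


def partition_entropy(u):
    """PE: mean membership entropy (natural log); lower is crisper."""
    return -sum(x * log(x) for row in u for x in row if x > 0) / len(u)


def modified_partition_coefficient(u):
    """MPC: PC rescaled so that uniform memberships score 0."""
    c = len(u[0])
    return 1.0 - c / (c - 1) * (1.0 - partition_coefficient(u))


def topsis(matrix, benefit):
    """Score alternatives (rows) on criteria (columns), equal weights assumed."""
    n_crit = len(benefit)
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    scaled = [[row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*scaled))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(cols)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(cols)]
    scores = []
    for row in scaled:
        d_best = sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_worst = sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        scores.append(d_worst / (d_best + d_worst) if d_best + d_worst else 0.0)
    return scores


# Two hypothetical clusterings scored on (PE, PC): PE is a cost, PC a benefit.
print(topsis([[0.2, 0.9], [0.6, 0.5]], benefit=[False, True]))
```

As a consistency check on the MPC formula, two clusters with PC = 0.607 give MPC = 1 - 2 × (1 - 0.607) ≈ 0.214, close to the 0.215 reported in the abstract.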