Keywords: DNA methylation profile; cancer of unknown primary; gene expression profile; machine learning; tumor tissue origin detection

Source: DOI:10.3389/fgene.2024.1383852

Abstract:
Tumor tissue origin detection is of great importance in determining the appropriate course of treatment for cancer patients. Classifiers based on gene expression and DNA methylation profiles have been shown to predict the primary tumor site feasibly and reliably. However, few studies have compared the performance of classifiers built on these different profiles.
Using gene expression and DNA methylation profiles from The Cancer Genome Atlas (TCGA) project, we applied eight machine learning methods to tumor tissue origin detection. We then compared predictive performance across DNA methylation, mRNA, microRNA (miRNA), and long non-coding RNA (lncRNA) expression profiles. A statistical method was introduced to select the most informative CpG sites.
We found that LASSO was the most predictive model across the various profiles. Further analyses indicated that results derived from DNA methylation (overall accuracy: 97.77%) were better than those derived from mRNA expression (overall accuracy: 88.01%), miRNA expression (overall accuracy: 91.03%), and lncRNA expression (overall accuracy: 95.7%). The results also suggest that an overall accuracy >90% can be achieved using only 1,000 methylated CpG sites for prediction.
In this work, we comprehensively evaluated the performance of classifiers based on different profiles for tumor origin detection. Our findings demonstrate the effectiveness of DNA methylation as a biomarker for tracing tumor tissue origin using LASSO and a neural network.
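
The abstract does not name the statistical method used to select informative CpG sites. As an illustration only, the sketch below ranks sites by a one-way ANOVA F-statistic across tissue classes and keeps the top k. The F-test itself, the function names `anova_f` and `top_k_sites`, the tissue labels, and the synthetic beta values are all assumptions made for this example; none of them come from the paper or from TCGA.

```python
import random
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = fmean(values)
    # Between-class sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_sites(beta, labels, k):
    """Return indices of the k columns (sites) with the largest F-statistic."""
    n_sites = len(beta[0])
    scores = [anova_f([row[j] for row in beta], labels) for j in range(n_sites)]
    return sorted(range(n_sites), key=lambda j: scores[j], reverse=True)[:k]


# Synthetic stand-in for a samples x CpG-sites matrix of beta values (not TCGA data).
rng = random.Random(0)
class_means = {"tissue_A": 0.1, "tissue_B": 0.5, "tissue_C": 0.9}  # hypothetical classes
informative = {3, 7}  # hypothetical sites whose methylation differs by tissue
n_sites, per_class = 20, 10

beta, labels = [], []
for tissue, mean in class_means.items():
    for _ in range(per_class):
        # Informative sites follow the class mean; all other sites share one mean.
        row = [rng.gauss(mean if j in informative else 0.5, 0.02) for j in range(n_sites)]
        beta.append(row)
        labels.append(tissue)

selected = top_k_sites(beta, labels, k=2)
print(sorted(selected))
```

In a setting like the one described, a classifier would then be trained on only the selected columns; the choice of k (for example, the 1,000 sites mentioned above) trades accuracy against panel size.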