Keywords: Gene expression; Lung adenocarcinoma; Network embedding; Somatic mutations; Survival prediction

MeSH: Humans; Adenocarcinoma of Lung / genetics, mortality; Lung Neoplasms / genetics, mortality; Prognosis; Mutation; Protein Interaction Maps / genetics; Survival Analysis; Algorithms; Male; Female; Computational Biology / methods; Gene Regulatory Networks; Gene Expression Regulation, Neoplastic; Gene Expression Profiling; Multiomics

Source: DOI:10.1016/j.cmpb.2024.108192

Abstract:
OBJECTIVE: The incidence of lung adenocarcinoma (LUAD) has been increasing year by year, and its prognosis is poor. This has prompted researchers to study the survival of LUAD patients so that patients can be treated in time and survive after appropriate treatment. However, there is still no fully valid model that can be applied in clinical practice.
METHODS: We introduced struc2vec-based multi-omics data integration (SBMOI), which integrates gene expression, somatic mutation, and clinical data to construct mutation gene vectors that represent LUAD patient features. Based on these patient features, a random survival forest (RSF) model was used to predict the long- and short-term survival of LUAD patients. To further demonstrate the superiority of SBMOI, we replaced the scale-free gene co-expression network (FCN) with a protein-protein interaction (PPI) network and with a significant co-expression network (SCN), and compared the accuracy of LUAD patient survival prediction under the same conditions.
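The abstract does not state how per-gene embeddings are turned into a per-patient feature vector, so the following is only a minimal sketch of one plausible approach: averaging the embedding vectors of the genes that are mutated in a patient. The function name `patient_vector`, the averaging rule, the gene symbols, and the two-dimensional embedding values are all illustrative assumptions, not the authors' method or data. In practice the embeddings would come from running struc2vec on the gene network.

```python
from typing import Dict, List, Sequence


def patient_vector(
    mutated_genes: Sequence[str],
    gene_embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the embeddings of a patient's mutated genes (assumed rule).

    Genes without an embedding (not present in the network) are skipped.
    A patient with no embedded mutated gene gets a zero vector.
    """
    vectors = [gene_embeddings[g] for g in mutated_genes if g in gene_embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) iterates over embedding dimensions (columns).
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy, made-up embeddings standing in for struc2vec output (hypothetical values).
gene_embeddings = {
    "TP53": [0.25, 0.75],
    "KRAS": [0.75, 0.25],
    "EGFR": [1.0, 0.0],
}

# Toy mutation calls per patient (hypothetical).
patients = {
    "P1": ["TP53", "KRAS"],
    "P2": ["EGFR"],
    "P3": ["GENE_NOT_IN_NETWORK"],
}

features = {
    pid: patient_vector(genes, gene_embeddings, dim=2)
    for pid, genes in patients.items()
}
```

The resulting `features` table could then be joined with clinical covariates and passed to a survival model such as a random survival forest.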
RESULTS: Compared with the SCN and the PPI network, the FCN-based SBMOI combined with the RSF model performed better in long- and short-term survival prediction tasks for LUAD patients. The AUCs for 1-year, 5-year, and 10-year survival in the validation dataset were 0.791, 0.825, and 0.917, respectively.
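To make the idea of an AUC at a fixed survival horizon concrete, here is a simplified sketch. It assumes a basic definition: patients with an observed event by the horizon are cases, patients still under follow-up after the horizon are controls, and patients censored before the horizon are dropped. The abstract does not say which time-dependent AUC estimator was used, so this is not necessarily the authors' evaluation procedure. The function name `auc_at_horizon` and all data values are hypothetical.

```python
from typing import List, Optional


def auc_at_horizon(
    times: List[float],
    events: List[int],
    risks: List[float],
    horizon: float,
) -> Optional[float]:
    """Simplified AUC at a time horizon (no censoring weights).

    times  : follow-up time per patient (same unit as horizon)
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk score, higher means worse expected survival
    Returns None when there are no cases or no controls.
    """
    rows = list(zip(times, events, risks))
    cases = [r for t, e, r in rows if e == 1 and t <= horizon]
    controls = [r for t, e, r in rows if t > horizon]
    if not cases or not controls:
        return None
    # Count case/control pairs where the case has the higher risk; ties count half.
    score = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return score / (len(cases) * len(controls))


# Toy cohort (made-up values): follow-up in years, event flag, predicted risk.
times = [0.5, 2.0, 4.0, 6.0, 8.0, 0.8]
events = [1, 1, 1, 0, 1, 0]
risks = [0.9, 0.7, 0.3, 0.4, 0.2, 0.5]

auc_1y = auc_at_horizon(times, events, risks, horizon=1.0)
auc_5y = auc_at_horizon(times, events, risks, horizon=5.0)
auc_10y = auc_at_horizon(times, events, risks, horizon=10.0)
```

In this toy cohort nobody is followed beyond 10 years, so `auc_10y` is `None`. A real evaluation would typically use an estimator that accounts for censoring rather than discarding early-censored patients.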
CONCLUSIONS: This study provides a powerful network-based method for multi-omics data integration. SBMOI combined with RSF successfully predicted the long- and short-term survival of LUAD patients, with especially high accuracy for long-term survival. In addition, the SBMOI algorithm has the potential to be combined with other machine learning models to perform clustering or stratification tasks, and to be applied to other diseases.