Keywords: ANNs, artificial neural networks; ANOVA, analysis of variance; AUC, area under the ROC curve; CART, classification and regression tree; CNV, copy number variation; DTs, decision trees; Decision tree; FFNN, feedforward neural networks; LS-SVM, least-squares support vector machine; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; Lung cancer; ML, machine learning; Machine learning; NSCLC, non-small cell lung cancer; Personalized diagnosis and prognosis; ROC, receiver operating characteristic; SVMs, support vector machines; TCGA, The Cancer Genome Atlas; TNM, a common cancer staging system in which T, N and M refer to tumour, node and metastasis

Source: DOI:10.1016/j.csbj.2022.03.035

Abstract:
Machine learning is an important artificial intelligence technique that is widely applied in cancer diagnosis and detection. More recently, with the rise of personalised and precision medicine, there is a growing trend towards machine learning applications for prognosis prediction. However, building reliable prediction models of cancer outcomes for everyday clinical practice remains a hurdle. In this work, we integrate genomic, clinical and demographic data of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) patients from The Cancer Genome Atlas (TCGA) and introduce copy number variation (CNV) and mutation information of 15 selected genes to generate predictive models for recurrence and survivability. We compare the accuracy and benefits of three well-established machine learning algorithms: decision tree methods, neural networks and support vector machines. Although the decision tree models show no significant advantage in accuracy, they reveal the most important predictors among genomic information (e.g. KRAS, EGFR, TP53), clinical status (e.g. TNM stage and radiotherapy) and demographics (e.g. age and gender), and how these predictors influence the prediction of recurrence and survivability for both early-stage LUAD and LUSC. The machine learning models have the potential to help clinicians make personalised decisions on aspects such as follow-up timeline and to assist with personalised planning of future social care needs.
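The point that tree models expose the most important predictors can be illustrated with a small sketch. It ranks binary features by Gini impurity reduction, the splitting criterion commonly used by CART for classification. It is not the authors' pipeline: the feature names (`kras_mutated`, `late_stage`, `age_over_65`), the eight toy records and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(rows, labels, feature):
    """Impurity reduction obtained by splitting on one binary feature."""
    left = [y for x, y in zip(rows, labels) if x[feature] == 0]
    right = [y for x, y in zip(rows, labels) if x[feature] == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def rank_features(rows, labels, features):
    """Return (gain, feature) pairs, best split first."""
    return sorted(((split_gain(rows, labels, f), f) for f in features),
                  reverse=True)


# Made-up toy records, NOT TCGA data. Each row is one hypothetical patient.
FEATURES = ["kras_mutated", "late_stage", "age_over_65"]
rows = [
    {"kras_mutated": 1, "late_stage": 1, "age_over_65": 0},
    {"kras_mutated": 1, "late_stage": 1, "age_over_65": 1},
    {"kras_mutated": 0, "late_stage": 1, "age_over_65": 0},
    {"kras_mutated": 1, "late_stage": 1, "age_over_65": 1},
    {"kras_mutated": 0, "late_stage": 0, "age_over_65": 0},
    {"kras_mutated": 0, "late_stage": 0, "age_over_65": 1},
    {"kras_mutated": 1, "late_stage": 0, "age_over_65": 0},
    {"kras_mutated": 0, "late_stage": 0, "age_over_65": 1},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = recurrence, 0 = no recurrence (toy)

ranking = rank_features(rows, labels, FEATURES)
for gain, name in ranking:
    print(f"{name}: impurity reduction {gain:.3f}")
```

In this toy set the root split would fall on `late_stage`, because it separates the two classes completely, while `age_over_65` carries no information. A real tree repeats the same ranking recursively inside each branch, which is what makes the resulting model readable as a sequence of clinical decisions.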