Keywords: CP: Cancer biology; CP: Systems biology; CT scan; attention; deep learning; gene expression; imaging; lung cancer; multi-modal model; radiogenomics; small samples; survival

MeSH: Humans; Lung Neoplasms / genetics, diagnostic imaging, pathology; Deep Learning; Carcinoma, Non-Small-Cell Lung / genetics, diagnostic imaging, pathology; Tomography, X-Ray Computed / methods; Biomarkers, Tumor / genetics; Prognosis; Male; Female; Gene Expression Regulation, Neoplastic; Transcriptome

Source: DOI:10.1016/j.crmeth.2024.100817

Abstract:
Deep-learning tools that extract prognostic factors from multi-omics data have recently contributed to individualized predictions of survival outcomes. However, the limited size of integrated omics-imaging-clinical datasets poses challenges. Here, we propose two biologically interpretable and robust deep-learning architectures for survival prediction of non-small cell lung cancer (NSCLC) patients, learning simultaneously from computed tomography (CT) scan images, gene expression data, and clinical information. The proposed models integrate patient-specific clinical, transcriptomic, and imaging data and incorporate Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway information, adding biological knowledge within the learning process to extract prognostic gene biomarkers and molecular pathways. While both models accurately stratify patients into high- and low-risk groups when trained on a dataset of only 130 patients, introducing a cross-attention mechanism in a sparse autoencoder significantly improves performance, highlighting tumor regions and NSCLC-related genes as potential biomarkers and thus offering a significant methodological advancement when learning from small imaging-omics-clinical samples.
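As a rough illustration of the cross-attention idea mentioned in the abstract, the following is a minimal NumPy sketch of scaled dot-product cross-attention in which one modality queries another. It is not the authors' architecture: the token layout (pathway-level gene embeddings as queries, CT patch embeddings as keys and values), the dimensions, the random projection weights, and the names `cross_attention`, `gene_tokens`, and `image_patches` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: `queries` attend over `context`.

    Returns the attended features and the attention weights.
    """
    q = queries @ w_q                          # (n_queries, d)
    k = context @ w_k                          # (n_context, d)
    v = context @ w_v                          # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_queries, n_context)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
n_pathways, n_patches = 5, 8      # gene-side tokens, CT image patches
d_gene, d_img, d = 16, 32, 4      # input widths and shared attention width

gene_tokens = rng.normal(size=(n_pathways, d_gene))    # assumed pathway embeddings
image_patches = rng.normal(size=(n_patches, d_img))    # assumed CT patch embeddings

# Random projections stand in for learned weights.
w_q = rng.normal(size=(d_gene, d))
w_k = rng.normal(size=(d_img, d))
w_v = rng.normal(size=(d_img, d))

fused, attn = cross_attention(gene_tokens, image_patches, w_q, w_k, w_v)
```

Each row of `attn` is a distribution over image patches for one gene-side token. In a trained model, weights of this kind are one plausible way that image regions could be ranked by relevance, which is the sort of interpretability the abstract attributes to the cross-attention variant.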