Keywords: Cancer unknown primary; DNA methylation; Deep learning; Molecular diagnosis

MeSH terms: DNA Methylation; Humans; Deep Learning; Neoplasms, Unknown Primary / genetics, pathology; CpG Islands; Algorithms; Biomarkers, Tumor / genetics; Image Processing, Computer-Assisted / methods

Source: DOI:10.1016/j.neo.2024.101021

Abstract:
Cancer of unknown primary (CUP) is a rare type of metastatic cancer in which the origin of the tumor is unknown. Because the treatment strategy for patients with metastatic tumors depends on the primary site, accurate identification of the site of origin is important. Here, we developed an image-based deep-learning model that uses a vision transformer algorithm to predict the origin of CUP. Using a DNA methylation dataset of 8,233 primary tumors from The Cancer Genome Atlas (TCGA), we categorized 29 cancer types into 18 organ classes and extracted 2,312 differentially methylated CpG sites (DMCs) from the non-squamous cancer group and 420 DMCs from the squamous cell cancer group. Using these DMCs, we created organ-specific DNA methylation images and used them for model training and testing. Model performance was evaluated using 394 metastatic cancer samples from TCGA (TCGA-meta) and 995 samples (693 primary and 302 metastatic cancers) obtained from 20 independent external studies. We found that the DNA methylation images show distinct patterns that depend on the origin of the cancer. Our model achieved an overall accuracy of 96.95% in the TCGA-meta dataset. In the external validation datasets, our classifier achieved overall accuracies of 96.39% and 94.37% in primary and metastatic tumors, respectively. In particular, the overall accuracies for primary and metastatic samples of non-squamous cell cancer were exceptionally high, at 96.79% and 96.85%, respectively.
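The abstract does not say how the DMC values are arranged into an image, so the following is only a minimal sketch of one plausible layout, not the authors' method. It assumes each sample is a 1-D vector of methylation beta values in [0, 1], one per DMC, and pads the vector to the smallest square that holds it. The function name `betas_to_image` and the zero fill value are hypothetical choices made for illustration.

```python
import math

import numpy as np


def betas_to_image(betas, fill=0.0):
    """Arrange a 1-D vector of beta values as a square 2-D array.

    Assumption: row-major placement with padding at the end. The paper's
    actual image construction may differ.
    """
    betas = np.asarray(betas, dtype=np.float32)
    if betas.ndim != 1 or betas.size == 0:
        raise ValueError("expected a non-empty 1-D vector of beta values")

    # Missing probes (NaN) are replaced by the fill value.
    betas = np.nan_to_num(betas, nan=fill)

    # Smallest side length such that side * side >= number of DMCs.
    side = math.isqrt(betas.size - 1) + 1

    padded = np.full(side * side, fill, dtype=np.float32)
    padded[: betas.size] = betas
    return padded.reshape(side, side)


# Example with the DMC counts reported in the abstract (random values,
# used here only as stand-ins for real beta values).
rng = np.random.default_rng(0)
non_squamous = betas_to_image(rng.random(2312))
squamous = betas_to_image(rng.random(420))
print(non_squamous.shape, squamous.shape)
```

Under this layout, the 2,312 non-squamous DMCs fit in a 49 x 49 array and the 420 squamous DMCs in a 21 x 21 array. A vision transformer would then split such an array into fixed-size patches, which is why a 2-D arrangement is needed at all.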