Keywords: Breast cancer; Machine learning; Magnetic resonance images; Radiogenomics; cGANs

MeSH: Humans; Female; Breast Neoplasms / diagnostic imaging, genetics; Radiomics; DNA Copy Number Variations; Bayes Theorem; Magnetic Resonance Imaging / methods; Mutation / genetics

Source: DOI:10.1186/s12967-024-05018-9

Abstract:
BACKGROUND: Breast cancer (BC) is a highly heterogeneous and complex disease. Personalized treatment options require the integration of multi-omic data and consideration of phenotypic variability. Radiogenomics aims to merge medical images with genomic measurements but encounters challenges due to unpaired data, in which imaging, genomic, or clinical outcome data are missing for some patients. In this study, we propose using a well-trained conditional generative adversarial network (cGAN) to address the unpaired data issue in radiogenomic analysis of BC. The generated images are then used to predict the mutation status of key driver genes and BC subtypes.
METHODS: We integrated the paired MRI and multi-omic (mRNA gene expression, DNA methylation, and copy number variation) profiles of 61 BC patients from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA). To facilitate this integration, we employed a Bayesian Tensor Factorization approach to factorize the multi-omic data into 17 latent features. Subsequently, a cGAN model was trained on the matched side-view patient MRIs and their corresponding latent features to predict MRIs for BC patients who lack MRIs. Model performance was evaluated by calculating the distance between real and generated images using the Fréchet Inception Distance (FID) metric. BC subtype and mutation status of driver genes were obtained from the cBioPortal platform, where 3 genes were selected based on the number of mutated patients. A convolutional neural network (CNN) was constructed and trained using the generated MRIs for mutation status prediction. The area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC) were used to evaluate the performance of the CNN models for mutation status prediction. Precision, recall, and F1 score were used to evaluate the performance of the CNN model in subtype classification.
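For reference, the FID named above is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network feature vectors of the real and generated images. The abstract does not state which Inception layer or implementation was used, so the formula below is the standard definition and not a description of the authors' exact pipeline. The symbols are introduced here for illustration: $\mu_r, \Sigma_r$ are the mean and covariance of the features of real images, and $\mu_g, \Sigma_g$ those of generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated images are statistically closer to the real ones; the distance is 0 when both means and both covariances coincide.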
RESULTS: The FID of the images generated by the well-trained cGAN model on the test set was 1.31. The CNN for TP53, PIK3CA, and CDH1 mutation prediction yielded ROC-AUC values of 0.9508, 0.7515, and 0.8136 and PR-AUC values of 0.9009, 0.7184, and 0.5007, respectively. Multi-class subtype prediction achieved precision, recall, and F1 scores of 0.8444, 0.8435, and 0.8336, respectively. The source code and related data used to implement the algorithms are available on GitHub (https://github.com/mattthuang/BC_RadiogenomicGAN).
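The evaluation metrics reported above can be illustrated with a small, dependency-free Python sketch. This is not the authors' code (their implementation is in the linked repository); the function names, toy labels, and toy scores are hypothetical. The sketch assumes that PR-AUC is estimated as average precision and that the multi-class precision, recall, and F1 are macro-averaged, neither of which the abstract specifies.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a
    random negative (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision values at each rank where
    a positive is retrieved (assumes distinct scores; ties are not handled)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy binary example: 1 = mutated, 0 = wild type (made-up values).
mutated = [1, 0, 1, 1, 0, 0]
cnn_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(roc_auc(mutated, cnn_scores), average_precision(mutated, cnn_scores))

# Toy multi-class example with placeholder subtype labels.
print(macro_prf(["A", "A", "B", "C"], ["A", "B", "B", "C"]))
```

Reporting PR-AUC alongside ROC-AUC is useful when mutated cases are rare, because PR-AUC is sensitive to class imbalance; this may explain why CDH1 shows a high ROC-AUC but a much lower PR-AUC.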
CONCLUSIONS: Our study establishes cGAN as a viable tool for generating synthetic BC MRIs for mutation status prediction and subtype classification to better characterize the heterogeneity of BC in patients. The synthetic images also have the potential to significantly augment existing MRI data and circumvent issues surrounding data sharing and patient privacy for future BC machine learning studies.