Keywords: Differential gene expression; Distant-metastasis free survival; POU2AF1; Prognostic gene signatures; S100B; TNBC
Abbreviations: AUC, Area under the ROC curve; BrCa, Breast cancer; COSMIC, The Catalogue of Somatic Mutations in Cancer; CX-25, Complete XGBoost top 25; DE, Differential expression; DMFS, Distant metastasis free survival; DX-20, Driver XGBoost top 20; EMT, Epithelial to mesenchymal transition; ER, Oestrogen receptor; FDR, False discovery rate; GEO, Gene Expression Omnibus; HER2, Human epidermal growth factor receptor 2; KM, Kaplan Meier; ML, Machine learning; NSCLC, Non small cell lung carcinoma; OS, Overall survival; PCA, Principal component analysis; PR, Progesterone receptor; RF, Random forest; RFE, Recursive feature elimination; ROC, Receiver operating characteristics curve; SVM, Support vector machine; TNBC, Triple negative breast cancer; kNN, k Nearest neighbors

Source: DOI:10.1016/j.csbj.2022.03.019

Abstract:
Tumor heterogeneity and unclear metastasis mechanisms are the leading causes of the lack of effective targeted therapy for triple-negative breast cancer (TNBC), a breast cancer (BrCa) subtype characterized by high mortality and a high frequency of distant metastasis. Identifying prognostic biomarkers can improve prognosis and personalized treatment regimens. Herein, we collected gene expression datasets representing TNBC and non-TNBC BrCa. From the complete dataset, a subset containing only known cancer driver genes was also constructed. Recursive Feature Elimination (RFE) was employed to identify the top 20, 25, 30, 35, 40, 45, and 50 gene signatures that differentiate TNBC from the other BrCa subtypes. Five machine learning algorithms were trained on these selected features. Based on model performance evaluation, XGBoost performed best on both datasets: with a subset of 25 genes for the complete dataset and 20 genes for the driver dataset. Of these 45 genes from the two datasets, 34 were found to be differentially regulated. Kaplan-Meier (KM) analysis of Distant Metastasis Free Survival (DMFS) for these 34 differentially regulated genes revealed four potential prognostic genes, two of which are novel (POU2AF1 and S100B). Finally, interactome and pathway enrichment analyses were carried out to investigate the functional role of the identified potential prognostic genes in TNBC. These genes are associated with MAPK, PI3K-Akt, Wnt, TGF-β, and other signal transduction pathways that are pivotal in the metastasis cascade. These gene signatures can provide novel molecular-level insights into metastasis.
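The feature-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a synthetic matrix in place of real expression data, ranks features with logistic regression inside RFE, and swaps in scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost. The abstract does not state which estimator drove RFE or how models were validated, so the function name evaluate_signature_sizes, the 5-fold cross-validation, and the ROC AUC scoring are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (rows: samples, columns: genes).
# Labels play the role of TNBC vs. non-TNBC; no real data is used here.
X, y = make_classification(
    n_samples=200, n_features=60, n_informative=10,
    n_redundant=0, random_state=0,
)


def evaluate_signature_sizes(X, y, sizes=(20, 25, 30)):
    """Return the mean cross-validated ROC AUC for each signature size.

    Hypothetical helper: RFE sits inside the pipeline so that feature
    selection is refit on every training fold and never sees the
    held-out samples.
    """
    results = {}
    for k in sizes:
        pipe = Pipeline([
            ("scale", StandardScaler()),
            # RFE repeatedly fits the ranking model and drops the 5
            # lowest-weight features until k remain.
            ("rfe", RFE(LogisticRegression(max_iter=1000),
                        n_features_to_select=k, step=5)),
            # Gradient boosting stand-in for the XGBoost classifier.
            ("clf", GradientBoostingClassifier(random_state=0)),
        ])
        scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
        results[k] = scores.mean()
    return results


results = evaluate_signature_sizes(X, y)
best_k = max(results, key=results.get)
print(results, best_k)
```

Placing RFE inside the pipeline is a design choice made for this sketch: selecting genes on the full dataset before cross-validation would leak information from the test folds and inflate the reported AUC.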