Keywords: BRCA; Bioinformatics; Breast cancer; Genetic risk factor; Machine learning; Ovarian cancer; WES

Source: DOI:10.7717/peerj-cs.1942

Abstract:
Breast and ovarian cancers are prevalent worldwide, and genetic factors such as BRCA1 and BRCA2 mutations play a significant role. However, not all patients carry these mutations, which makes it challenging to identify risk factors. Researchers have turned to whole exome sequencing (WES) as a tool to identify genetic risk factors in BRCA-negative women. WES sequences all protein-coding regions of an individual's genome, providing a more comprehensive analysis than traditional gene-by-gene sequencing methods. The technology is efficient and cost-effective, and it can reveal new genetic variants that contribute to disease susceptibility. Interpreting WES data to find disease-causing variants is challenging, however, because the data are complex. Machine learning techniques can uncover hidden genetic-variant patterns associated with cancer susceptibility. In this study, we used the extreme gradient boosting (XGBoost) and random forest (RF) algorithms to identify high-risk genes for BRCA-related cancers, specifically in the Saudi population. The experimental results showed that the RF method performed better, with an accuracy of 88.16% and an area under the receiver operating characteristic curve (AUC) of 0.95. Using bioinformatics analysis tools, we explored the top features of this high-accuracy model to improve our understanding of genetic interactions and to find complex genetic patterns connected to the development of BRCA-related cancers. We identified the significance of HLA gene variations in these WES datasets for patients with BRCA-related cancers. We found that immune response mechanisms play a major role in the development of BRCA-related cancer. The analysis specifically highlights genes associated with antigen processing and presentation, such as HLA-B, HLA-A and HLA-DRB1, and their possible effects on tumour progression and immune evasion. In summary, machine learning approaches have the potential to aid the development of precision medicine for early detection and personalized treatment.
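The two metrics reported above, accuracy and area under the ROC curve, can be illustrated with a minimal Python sketch. This is not the study's pipeline: the labels and scores are toy values invented for illustration, and the function names `accuracy` and `roc_auc` are hypothetical helpers. The AUC is computed here with the pairwise (Mann-Whitney) interpretation, that is, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]                    # 1 = case, 0 = control
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]    # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # threshold at 0.5 (an assumption)

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.3f}, AUC = {auc:.4f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two numbers are usually reported together.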