Keywords: differential network; essential genes; network representation learning; pan-cancer

MeSH: Algorithms; Gene Regulatory Networks; Humans; Neoplasms/genetics; Neoplasms/pathology; Oncogenes

Source: DOI: 10.1093/bfgp/elac012

Abstract:
Identification of cancer-related genes helps in understanding the pathogenesis of cancer, developing targeted drugs and creating new diagnostic and therapeutic methods. Because biological laboratory methods are complex, and because high-throughput data are increasingly available, many network-based methods have been proposed to identify cancer-related genes from a global perspective. Some studies have focused on tissue-specific cancer networks. However, cancers from different tissues may share common features, and such methods may ignore the differences and similarities across cancers when the models are built. In this work, to make full use of the global information in the network, we first establish a pan-cancer network via a differential network algorithm; this network contains heterogeneous data not only across multiple cancer types but also between tumor samples and normal samples. Second, node representation vectors are learned by network embedding. In contrast to methods based on ranking analysis, and with the help of integrative network analysis, we transform cancer-related gene identification into a binary classification problem. The final results are obtained via ensemble classification. We further applied the method to the most commonly used gene expression data, covering six tissue-specific cancer types. As a result, an integrative pan-cancer network and several biologically meaningful results were obtained. For example, nine genes were ultimately identified as potential pan-cancer-related genes. Most of these genes have been reported in published studies, which shows the method's potential for identifying driver gene candidates for further biological experimental verification.
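The four stages summarized in the abstract (differential network, pan-cancer merge, node embedding, ensemble binary classification) can be illustrated with the minimal, standard-library Python sketch below. It is not the authors' implementation. The correlation-difference edge rule, the `threshold` value, using the number of cancer types as edge weight, two-hop adjacency averaging in place of a learned network embedding, and bagged nearest-centroid classifiers with majority voting are all assumptions chosen for brevity, and every function name here is hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def differential_edges(tumor, normal, threshold):
    """Assumed rule: link two genes when their co-expression changes by more than
    `threshold` between tumor and normal samples. tumor/normal map gene -> values."""
    edges = set()
    for g, h in combinations(sorted(tumor), 2):
        change = abs(pearson(tumor[g], tumor[h]) - pearson(normal[g], normal[h]))
        if change > threshold:
            edges.add((g, h))
    return edges


def pan_cancer_network(datasets, threshold):
    """Hypothetical merge: edge weight = number of cancer types (one (tumor, normal)
    pair per type) in which the gene pair is differential."""
    weights = Counter()
    for tumor, normal in datasets:
        weights.update(differential_edges(tumor, normal, threshold))
    return dict(weights)


def embed(genes, weights):
    """Stand-in for network embedding (assumption): each gene's vector is the weighted
    average of the adjacency rows of itself (self-loop weight 1) and its neighbours."""
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (g, h), w in weights.items():
        adj[index[g]][index[h]] = adj[index[h]][index[g]] = float(w)
    emb = {}
    for g in genes:
        i = index[g]
        total = sum(adj[i])
        emb[g] = [sum(adj[i][j] * adj[j][k] for j in range(n)) / total for k in range(n)]
    return emb


def fit_centroids(X, y):
    """Nearest-centroid classifier: mean vector per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def ensemble_predict(X, y, x_new, n_models=11, seed=0):
    """Bagging: nearest-centroid models on bootstrap samples, combined by majority vote.
    y is assumed to be 1 for known cancer-related genes and 0 otherwise."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]
        if len({y[i] for i in sample}) < 2:
            continue  # skip bootstrap samples that contain only one class
        model = fit_centroids([X[i] for i in sample], [y[i] for i in sample])
        votes[predict_centroid(model, x_new)] += 1
    if not votes:  # fall back to one model trained on all labelled genes
        return predict_centroid(fit_centroids(X, y), x_new)
    return votes.most_common(1)[0][0]
```

In a full run under these assumptions, the `embed` vectors of labelled genes would form `X`, and each unlabelled gene would be scored with `ensemble_predict`; a learned embedding and stronger base classifiers would replace the simple stand-ins used here.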