Keywords: essential genes; lncRNA-protein interaction network; meta-path-guided random walk; optimal negative samples

MeSH terms: Humans; Animals; Mice; Protein Interaction Maps; RNA, Long Noncoding / genetics / metabolism; MicroRNAs / metabolism; Neoplasms; Neural Networks, Computer

Source: DOI: 10.1093/bib/bbad097

Abstract:
Gene essentiality is defined as the extent to which a gene is required for the survival and reproductive success of a living system. It can vary between genetic backgrounds and environments. Essential protein-coding genes have been well studied, but the essentiality of non-coding regions is rarely reported, even though most regions of the human genome do not encode proteins. Determining the essentiality of non-coding genes is therefore needed. We developed the iEssLnc models, which assign essentiality scores to lncRNA genes. As far as we know, this is the first direct quantitative estimation of the essentiality of lncRNA genes. By taking advantage of a graph neural network with meta-path-guided random walks on the lncRNA-protein interaction network, iEssLnc models can perform genome-wide screening for essential lncRNA genes in a quantitative manner. We carried out validation and whole-genome screening in the context of human cancer cell lines and the mouse genome. Compared with other methods, which were transferred from protein-coding genes, iEssLnc achieved better performance. Enrichment analysis indicated that essential lncRNA genes cluster at high ranks of the iEssLnc essentiality scores. Using the screening results of the iEssLnc models, we estimated the number of essential lncRNA genes in human and mouse. Functional analysis showed that essential lncRNA genes interact significantly with microRNAs and cytoskeletal proteins, which may be of interest in experimental life sciences. All datasets and code for the iEssLnc models have been deposited on GitHub (https://github.com/yyZhang14/iEssLnc).
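For readers unfamiliar with the term, a meta-path-guided random walk on a two-type network can be sketched as follows. This is a generic illustration only, not the iEssLnc implementation: the toy graph, the node names, the single-letter type codes (`L` for lncRNA, `P` for protein), the meta-paths `LPL` and `LPPL`, and the function `metapath_walk` are all assumptions made for the example.

```python
import random

# Hypothetical toy network: three lncRNAs and two proteins (names are made up).
EDGES = [
    ("lnc1", "protA"), ("lnc2", "protA"), ("lnc2", "protB"),
    ("lnc3", "protB"), ("protA", "protB"),
]
NODE_TYPE = {"lnc1": "L", "lnc2": "L", "lnc3": "L", "protA": "P", "protB": "P"}

# Undirected adjacency lists built from the edge list.
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk that only steps to neighbors whose type matches the next
    symbol of `metapath`, with the meta-path repeated cyclically."""
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the meta-path head")
    cycle = metapath[:-1]  # "LPL" repeats as L, P, L, P, ...
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))  # uniform choice among candidates
    return walk


if __name__ == "__main__":
    rng = random.Random(0)
    print(metapath_walk(ADJ, NODE_TYPE, "lnc1", "LPL", 7, rng))
```

Walks of this kind are commonly used to sample type-aware node contexts from which embeddings are learned; how iEssLnc combines them with its graph neural network is described in the paper and repository, not here.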