Gene regulatory networks define the interactions between DNA products and other substances in cells. A better understanding of these networks allows the processes that trigger different diseases to be described in greater detail and fosters the development of new therapeutic targets. These networks are usually represented as graphs, and the primary source for constructing them correctly is usually time series of differential expression data. The literature has approached network inference from this type of data in different ways. Most approaches rely on computational learning techniques, which have ended up specializing in specific datasets. For this reason, new and more robust strategies are needed that reach a consensus based on previous results and thereby gain some capacity for generalization. This paper presents GENECI (GEne NEtwork Consensus Inference), an evolutionary machine learning approach that acts as an organizer for constructing ensembles: it processes the results of the main inference techniques reported in the literature and optimizes the consensus network derived from them according to their confidence levels and topological characteristics. After its design, the proposal was tested on datasets collected from academic benchmarks (DREAM challenges and IRMA network) to quantify its accuracy. Subsequently, it was applied to a real-world biological network of melanoma patients, where its results could be contrasted with medical research reported in the literature. Finally, the results show that its ability to optimize the consensus of several networks leads to outstanding robustness and accuracy, and that it gains a certain generalization capacity after facing the inference of multiple datasets. The source code is hosted in a public GitHub repository under the MIT license: https://github.com/AdrianSeguraOrtiz/GENECI. Moreover, to facilitate installation and use, the software associated with this implementation has been packaged as a Python package available on PyPI: https://pypi.org/project/geneci/.
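To make the idea of a consensus network concrete, the following minimal sketch combines the edge-confidence matrices produced by several inference techniques into a single matrix by weighted averaging. It is only an illustration under stated assumptions and not GENECI's actual implementation: the function name `weighted_consensus`, the example matrices, and the weights are all hypothetical, and the evolutionary search that would tune the weights against confidence and topology criteria is omitted.

```python
import numpy as np


def weighted_consensus(confidences, weights):
    """Combine per-technique edge-confidence matrices into one consensus matrix.

    Hypothetical helper for illustration; not part of the GENECI package.
    """
    stacked = np.stack(confidences)          # shape: (techniques, genes, genes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize so the weights sum to 1
    return np.tensordot(w, stacked, axes=1)  # weighted average over techniques


# Made-up confidence matrices from two hypothetical techniques, three genes each.
# Entry [i, j] is the confidence that gene i regulates gene j.
a = np.array([[0.0, 0.9, 0.1],
              [0.2, 0.0, 0.8],
              [0.1, 0.3, 0.0]])
b = np.array([[0.0, 0.5, 0.3],
              [0.6, 0.0, 0.4],
              [0.5, 0.1, 0.0]])

# Assumed weights; in an evolutionary setting these would be the values
# an optimizer adjusts from one generation to the next.
consensus = weighted_consensus([a, b], [0.75, 0.25])
```

In this framing, each candidate solution of an evolutionary algorithm would be one weight vector, and its fitness would be computed from the resulting consensus matrix.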