Keywords: Biology, Integration, Omics, System

MeSH: Genomics, Genome, Plants

Source: DOI: 10.1186/s12864-023-09833-0

Abstract:
BACKGROUND: The ongoing evolution of Next-Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. Although tools for genomic data integration and analysis are increasingly available, conceptual and analytical complexity remains a major challenge in many biological contexts.
RESULTS: To address this issue, we describe a six-step tutorial on best practices for genomic data integration, consisting of (1) designing a data matrix; (2) formulating a specific biological question oriented toward data description, selection and prediction; (3) selecting a tool adapted to the targeted questions; (4) preprocessing the data; (5) conducting a preliminary analysis; and finally (6) executing genomic data integration.
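Steps (1) and (4) can be illustrated with a small sketch. The Python below assumes each omics block is stored as a dictionary mapping sample IDs to feature values. The function names `build_data_matrix` and `preprocess`, the toy sample and feature IDs, and the choice of dropping near-constant features and z-scoring each feature are hypothetical illustrations, not the tutorial's own procedure. It keeps only the samples shared by every block, so all blocks end up with identical, identically ordered rows, and then standardizes one block.

```python
from statistics import mean, pstdev


def build_data_matrix(blocks):
    """Step (1), hypothetical sketch: align omics blocks on shared samples.

    `blocks` maps a block name to {sample_id: {feature_name: value}}.
    Each returned block is a samples x features table whose rows are the
    samples present in every block, in the same sorted order.
    """
    shared = sorted(set.intersection(*(set(b) for b in blocks.values())))
    matrix = {}
    for name, block in blocks.items():
        # Keep only features measured in every shared sample.
        features = (
            sorted(set.intersection(*(set(block[s]) for s in shared)))
            if shared else []
        )
        matrix[name] = {
            "samples": shared,
            "features": features,
            "values": [[block[s][f] for f in features] for s in shared],
        }
    return matrix


def preprocess(block, min_sd=1e-8):
    """Step (4), assumed choice: drop near-constant features, z-score the rest."""
    cols = list(zip(*block["values"]))  # one tuple per feature
    kept_features, kept_cols = [], []
    for feature, col in zip(block["features"], cols):
        sd = pstdev(col)
        if sd > min_sd:
            mu = mean(col)
            kept_features.append(feature)
            kept_cols.append([(x - mu) / sd for x in col])
    n = len(block["samples"])
    return {
        "samples": block["samples"],
        "features": kept_features,
        "values": [[col[i] for col in kept_cols] for i in range(n)],
    }


# Toy usage with made-up IDs: P3 and P4 are not in both blocks, so they are dropped.
blocks = {
    "transcriptome": {
        "P1": {"geneA": 2.0, "geneB": 5.0},
        "P2": {"geneA": 4.0, "geneB": 5.0},
        "P3": {"geneA": 6.0, "geneB": 5.0},
    },
    "methylome": {
        "P1": {"cpg1": 0.1},
        "P2": {"cpg1": 0.3},
        "P4": {"cpg1": 0.9},
    },
}
m = build_data_matrix(blocks)
t = preprocess(m["transcriptome"])
print(m["transcriptome"]["samples"])  # ['P1', 'P2']
print(t["features"])  # ['geneA'] (geneB is constant and is removed)
print(t["values"])  # [[-1.0], [1.0]]
```

Filtering before scaling avoids dividing by a zero standard deviation; whether to scale at all, and how, depends on the integration tool chosen in step (3).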
CONCLUSIONS: The tutorial has been tested and demonstrated on publicly available genomic data generated from poplar (Populus L.), a model woody plant. We also developed a new graphical output for unsupervised multi-block analysis, cimDiablo_v2, which is available at https://forgemia.inra.fr/umr-gdec/omics-integration-on-poplar and allows the selection of master drivers of genomic data variation and interplay.