Keywords: evolution; gene expression prediction; genotype to phenotype map; low frequency variants; machine learning; regulatory elements

Source: DOI:10.1101/2024.04.25.591174

Abstract:
The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype to phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, not to mention how these factors respond to the environment. We tested whether flexible machine learning models could learn some of the underlying cis-regulatory genotype to phenotype map, using cold-responsive transcriptome profiles in 5 diverse Arabidopsis thaliana accessions. We first tested for evidence that cis regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the upstream and downstream regions, respectively, of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn de novo cis-regulatory motifs in DNA sequences, to predict expression response to environment. We found that CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by the biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted based on variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.
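As a rough illustration of how a CNN can read motifs from DNA sequence, the sketch below one-hot encodes a sequence, slides a single convolutional filter along it, and turns the strongest match into a score. It is a minimal sketch and not the authors' model: the function names, the 5-mer filter, and the bias value are all assumptions chosen for the example, and a real network would learn many filters from training data instead of using a fixed one.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases such as N stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def scan_filter(x, kernel):
    """Slide a (width, 4) filter along a one-hot sequence and return one score per position."""
    width = kernel.shape[0]
    n_positions = x.shape[0] - width + 1
    return np.array([np.sum(x[i:i + width] * kernel) for i in range(n_positions)])

def predict_score(seq, kernel, bias=0.0):
    """ReLU activation, global max pooling, then a logistic output in (0, 1)."""
    activations = np.maximum(scan_filter(one_hot(seq), kernel) + bias, 0.0)
    return 1.0 / (1.0 + np.exp(-activations.max()))

# Hypothetical filter: a perfect match to an arbitrary 5-mer (illustration only).
motif_filter = one_hot("CCGAC")

with_motif = predict_score("TTTCCGACTTT", motif_filter, bias=-4.0)
without_motif = predict_score("TTTTTTTTTTT", motif_filter, bias=-4.0)
print(with_motif > without_motif)
```

In a trained CNN the filter weights play the role of learned position weight matrices, which is why such filters can be inspected afterwards as candidate de novo motifs.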