MeSH: Arabidopsis / chemistry, genetics, metabolism; Arabidopsis Proteins / chemistry, classification, metabolism; Conserved Sequence / genetics; Datasets as Topic; Gene Expression Regulation, Plant / genetics; Indoleacetic Acids / metabolism; Intrinsically Disordered Proteins; Molecular Sequence Annotation; Neural Networks, Computer; Protein Domains; Proteome / chemistry, metabolism; Transcription Factors / chemistry, classification, metabolism; Transcriptional Activation / genetics

Source: DOI:10.1038/s41586-024-07707-3

Abstract:
Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs [1]. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.