Keywords: AlphaFold; BEN domain; DUF4806; domain annotation; homeodomain; lin-14; structure prediction; transcription factor

MeSH terms: Animals; Humans; Protein Domains; Caenorhabditis elegans / genetics, metabolism; Phylogeny; Sequence Alignment; DNA-Binding Proteins / metabolism; DNA / genetics

Source: DOI:10.1016/j.cub.2023.05.011

Abstract:
Organization of protein sequences into domain families is a foundation for cataloging and investigating protein functions. However, long-standing strategies based on primary amino acid sequences are blind to the possibility that proteins with dissimilar sequences could have comparable tertiary structures. Building on our recent findings that in silico structural predictions of BEN family DNA-binding domains closely resemble their experimentally determined crystal structures, we exploited the AlphaFold2 database for comprehensive identification of BEN domains. Indeed, we identified numerous novel BEN domains, including members of new subfamilies. For example, while no BEN domain factors had previously been annotated in C. elegans, this species actually encodes multiple BEN proteins. These include key developmental timing genes of orphan domain status, sel-7 and lin-14, the latter being the central target of the founding miRNA lin-4. We also reveal that the domain of unknown function 4806 (DUF4806), which is widely distributed across metazoans, is structurally similar to BEN and comprises a new subtype. Surprisingly, we find that BEN domains resemble both metazoan and non-metazoan homeodomains in 3D conformation and preserve characteristic residues, indicating that despite their inability to be aligned by conventional methods, these DNA-binding modules are probably evolutionarily related. Finally, we broaden the application of structural homology searches by revealing novel human members of DUF3504, which exists on diverse proteins with presumed or known nuclear functions. Overall, our work strongly expands this recently identified family of transcription factors and illustrates the value of 3D structural predictions to annotate protein domains and interpret their functions.