Plant genomes tolerate small- and large-scale duplications more successfully than those of any other kingdom in the tree of life. This has led to the emergence and evolution of gene families, often with more than a hundred members. These gene families, in turn, undergo subfunctionalization or neofunctionalization, giving rise to protein domains that perform unique or grouped functions in the context of the original activity. Because such cases are so numerous in the plant kingdom, investigating a specific gene family of interest has become a routine task for plant biologists. In this chapter, we provide a simple and standard pipeline for this task, using the steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domains in rice as a reference example. We describe the extraction, processing, and downstream analysis of the Oryza sativa var. japonica proteome for the identification and comparative exploration of START domains. To do this, we trained profile hidden Markov models (HMMs) on 35 reported START domains from Arabidopsis and used them to search for potential homologs in rice. Downstream investigations included domain structure analysis, visualization of exon-intron patterns, chromosomal localization of START genes, and phylogenetic studies, followed by identification of cis-regulatory elements and construction of a gene regulatory network. We also highlight alternative tools and techniques that can be used to perform similar analyses, along with their salient features.
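The identification step described above (building a profile HMM from known START domains and searching the rice proteome with it) can be sketched as follows. This is a minimal illustration, not the chapter's actual protocol. It assumes the HMMER suite (`hmmbuild`, `hmmsearch`) is the profile-HMM tool, which the abstract does not name. All file names, the `1e-5` E-value cutoff, and the helper functions `hmmer_commands` and `parse_domtblout` are hypothetical. The commands are only assembled, not executed, so the sketch runs without HMMER installed.

```python
def hmmer_commands(alignment, hmm_out, proteome, domtbl_out, evalue=1e-5):
    """Assemble (but do not run) the two HMMER command lines.

    File names and the E-value cutoff are illustrative assumptions.
    """
    # hmmbuild <hmmfile_out> <msafile>: build a profile HMM from an alignment
    build = ["hmmbuild", hmm_out, alignment]
    # hmmsearch --domtblout <file> -E <cutoff> <hmmfile> <seqdb>
    search = ["hmmsearch", "--domtblout", domtbl_out,
              "-E", str(evalue), hmm_out, proteome]
    return build, search


def parse_domtblout(lines, max_ievalue=1e-5):
    """Extract per-domain hits from hmmsearch --domtblout text.

    Columns are whitespace-separated; the 0-based indices used here are
    0 = target name, 12 = independent E-value, 19/20 = envelope start/end.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 22:
            continue  # not a complete domain record
        i_evalue = float(fields[12])
        if i_evalue <= max_ievalue:
            hits.append({
                "protein": fields[0],
                "i_evalue": i_evalue,
                "env_from": int(fields[19]),
                "env_to": int(fields[20]),
            })
    return hits


if __name__ == "__main__":
    build, search = hmmer_commands(
        "arabidopsis_START.sto",   # hypothetical alignment of the 35 domains
        "START.hmm",
        "rice_japonica_proteome.fa",
        "rice_START.domtbl",
    )
    print(" ".join(build))
    print(" ".join(search))
```

In practice the two argument lists could be passed to `subprocess.run`, and the resulting table read with `parse_domtblout(open("rice_START.domtbl"))`. Filtering on the per-domain independent E-value rather than the full-sequence E-value is one possible design choice for a domain-centred survey; the appropriate threshold depends on the family being studied.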