Keywords: Eukaryotes; Gene function; Gene structure; Genome; Longest intron; Ribosome biogenesis; Spliceosome

MeSH: Introns / genetics; Animals; Humans; Arabidopsis / genetics; Spliceosomes / genetics, metabolism

Source: DOI:10.1186/s12864-024-10558-x

Abstract:
Although introns impose an energy and time burden on eukaryotic cells, they play an irreplaceable role in the diversification and regulation of protein production. A commonly reported feature of eukaryotic genomes is that, in protein-coding genes, the longest intron is usually one of the first introns. The goal of our work was to find possible differences in biological function between genes that share this feature and genes that do not. Data on the lengths of all introns in genes were extracted from the genomes of six vertebrates (human, mouse, koala, chicken, zebrafish and fugu) and two other model organisms (nematode worm and Arabidopsis). We showed that in more than 40% of protein-coding genes, the longest intron lies, by relative position, in the second or third tertile of all introns. Genes grouped according to the relative position of the longest intron were found to be significantly enriched in different KEGG pathways. Genes with the longest intron in the first tertile predominate in a range of pathways for amino acid and lipid metabolism, various signaling pathways, cell junctions or ABC transporters. Genes with the longest intron in the second or third tertile show increased representation in pathways associated with the formation and function of the spliceosome and ribosomes. Between the two groups of genes defined in this way, we further demonstrated differences in the length of the longest introns and in the distribution of their absolute positions. We also identified other characteristics, namely a positive correlation between the length of the longest intron and the summed length of all other introns in the gene, and the preservation of the exact same absolute and relative position of the longest intron between orthologous genes.
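
To make the grouping by relative position concrete, here is a minimal Python sketch of how a gene could be assigned to a tertile from its ordered intron lengths. It is an illustration only, not the authors' pipeline: the function name `longest_intron_tertile`, the rule for tertile boundaries, the tie-breaking rule, and the requirement of at least three introns are all assumptions, since the abstract does not specify them.

```python
def longest_intron_tertile(intron_lengths):
    """Return (absolute_position, tertile) of a gene's longest intron.

    intron_lengths: intron lengths in 5'-to-3' order.
    absolute_position is 1-based; tertile is 1, 2 or 3.

    Assumptions (not taken from the paper):
    - genes with fewer than 3 introns are rejected, because tertiles
      are not meaningful for them;
    - ties are resolved in favour of the most 5' intron;
    - tertile = ceil(3 * position / n), computed with integers.
    """
    n = len(intron_lengths)
    if n < 3:
        raise ValueError("need at least 3 introns to define tertiles")

    # max() returns the first maximal element, so ties go to the 5' end.
    index = max(range(n), key=lambda i: intron_lengths[i])
    position = index + 1

    # Integer form of ceil(3 * position / n); avoids float rounding.
    tertile = (3 * position + n - 1) // n
    return position, tertile


if __name__ == "__main__":
    # Hypothetical intron lengths (bp) for two made-up genes.
    print(longest_intron_tertile([100, 5000, 300, 200, 150, 120]))
    print(longest_intron_tertile([100, 200, 300, 9000]))
```

Under these assumptions, the first hypothetical gene falls into the first-tertile group and the second into the "second or third tertile" group that the abstract contrasts with it.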