Novel genes

  • Article type: Journal Article
    New protein-coding genes can evolve from previously noncoding genomic regions through a process known as de novo gene emergence. Evidence suggests that this process has likely occurred throughout evolution and across the tree of life. Yet confidently identifying de novo emerged genes remains challenging. Ancestral sequence reconstruction is a promising approach for inferring whether a gene has emerged de novo, as it allows us to inspect whether a given genomic locus ancestrally harbored protein-coding capacity. However, the use of ancestral sequence reconstruction in the context of de novo emergence is still in its infancy, and its capabilities, limitations, and overall potential are largely unknown. Notably, it is difficult to formally evaluate the protein-coding capacity of ancestral sequences, particularly when new gene candidates are short. How well suited is ancestral sequence reconstruction as a tool for the detection and study of de novo genes? Here, we address this question by designing an ancestral sequence reconstruction workflow incorporating different tools and sets of parameters, and by introducing a formal criterion that allows us to estimate, within a desired level of confidence, when protein-coding capacity originated at a particular locus. Applying this workflow to ∼2,600 short, annotated budding yeast genes (<1,000 nucleotides), we found that ancestral sequence reconstruction robustly predicts an ancient origin for the most widely conserved genes, which constitute "easy" cases. For less robust cases, we calculated a randomization-based empirical P-value estimating whether the observed conservation between the extant and ancestral reading frame could be attributed to chance. This formal criterion allowed us to pinpoint a branch of origin for most of the less robust cases, identifying 49 genes that can unequivocally be considered de novo originated since the split of the Saccharomyces genus, including 37 Saccharomyces cerevisiae-specific genes. We find that for the remaining equivocal cases we cannot rule out different evolutionary scenarios, including rapid evolution, multiple gene losses, or a recent de novo origin. Overall, our findings suggest that ancestral sequence reconstruction is a valuable tool to study de novo gene emergence but should be applied with caution and awareness of its limitations.
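The abstract does not spell out how the randomization-based empirical P-value is computed, so the following is only a minimal sketch of the general idea. It assumes, purely for illustration, that conservation is scored as the fraction of identical aligned residues and that the null distribution comes from shuffling the ancestral sequence; the function names `identity` and `empirical_p_value` are hypothetical and are not taken from the study.

```python
import random


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length strings match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def empirical_p_value(extant: str, ancestral: str,
                      n_random: int = 999, seed: int = 0) -> float:
    """Estimate how often a randomized ancestor is at least as similar
    to the extant sequence as the reconstructed ancestor is.

    Assumption: the null model is a simple residue shuffle, which keeps
    composition but destroys order. The actual study may use a different
    statistic and a different randomization scheme.
    """
    rng = random.Random(seed)
    observed = identity(extant, ancestral)
    residues = list(ancestral)
    at_least_as_extreme = 0
    for _ in range(n_random):
        rng.shuffle(residues)
        if identity(extant, "".join(residues)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_random + 1)
```

With 999 randomizations the smallest attainable value is 0.001, so the number of randomizations bounds the confidence level that such a criterion can express.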

  • Article type: Journal Article
    Leukocyte telomere length is believed to measure cellular aging in humans, and short leukocyte telomere length is associated with increased risks of late-onset diseases such as cardiovascular disease and dementia. Many studies have shown that leukocyte telomere length is a heritable trait, and several candidate genes have been identified, including TERT, TERC, OBFC1, and CTC1. Unlike most studies, which have focused on genetic causes of chronic diseases such as heart disease and diabetes in relation to leukocyte telomere length, the present study examined the genome to identify variants that may contribute to variation in leukocyte telomere length among families with exceptional longevity. From the genome-wide association analysis in 4,289 LLFS participants, we identified a novel intergenic SNP, rs7680468, located near PAPSS1 and DKK2 on 4q25 (p = 4.7E-8). From our linkage analysis, we identified two additional novel loci with HLOD scores exceeding three: 4.77 for 17q23.2 and 4.36 for 10q11.21. These two loci harbor a number of novel candidate genes with SNPs, and our gene-wise association analysis identified multiple genes, including DCAF7, POLG2, CEP95, and SMURF2 at 17q23.2, and RASGEF1A, HNRNPF, ANF487, CSTF2T, and PRKG1 at 10q11.21. Among these genes, multiple SNPs were associated with leukocyte telomere length, but the strongest association was observed with one contiguous haplotype in CEP95 and SMURF2. We also show that three previously reported genes (TERC, MYNN, and OBFC1) were significantly associated with leukocyte telomere length at empirical p < 0.05.

  • Article type: Journal Article
    The use of high-throughput array technology is omnipresent in diverse areas, specifically early diagnosis of disease, discovery of infectious agents, search for biological markers, and screening of potential drug candidates. Here, we integrated gene expression data with a network-based approach to identify novel genes that play a central role in the network by interconnecting a number of differentially expressed breast cancer genes. The 62 cancerous genes retrieved from the Breast Cancer Gene Database (BCGD) were mapped onto the normalized data accessed from the Stanford Microarray Database (SMD) to analyze their pattern. Interaction networks for each gene were constructed to understand the biology of metastasis at the systems level. The individual networks were fused together for the detection of interacting hubs; 38 novel genes were found to be deeply intermingled with the central hub node. Gene Ontology studies were made to depict the biology of the hub nodes not through gene ranking alone but also by applying the hypergeometric test with the Benjamini-Hochberg false discovery rate (FDR) correction method at a significance level of 0.05. Analyzing p-values from the statistical test indicated that most of the novel genes were involved in the same biological functions as the disordered genes, such as signal transducer, transcription regulator, enzyme binding, molecular transducer, and receptor signaling protein activity, and in the same pathways, such as MAPK signaling, apoptosis, Wnt signaling, ErbB signaling, and cell cycle. Lastly, we identified 3 novel genes, CHUK, INSR, and CREBBP, showing high connectivity with the 12 novel genes reported in the literature as well as with the perturbed genes. As a result, these genes can be considered significant findings in revealing the basis and pathways responsible for breast cancer.
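The enrichment statistics named in this abstract (a hypergeometric test followed by Benjamini-Hochberg FDR correction) can be illustrated with a short standard-library sketch. The function names and the parameter convention (k annotated genes in a list of n, drawn from a background of N genes of which K carry the annotation) are assumptions made for this example; the abstract does not state which software or background set the authors used.

```python
from math import comb


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation being tested."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts for three annotation terms, for illustration only:
    # (k in gene list, K in background, list size n, background size N).
    terms = [(8, 40, 38, 1000), (3, 120, 38, 1000), (5, 25, 38, 1000)]
    raw = [hypergeom_upper_tail(*t) for t in terms]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3g}, BH-adjusted = {q:.3g}, significant = {q < 0.05}")
```

A term would be called enriched at the 0.05 level stated in the abstract when its adjusted value falls below 0.05.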