genome annotations

  • Article type: Journal Article
    Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.

  • Article type: Journal Article
    Thlaspi arvense (field pennycress) is being domesticated as a winter annual oilseed crop capable of improving ecosystems and intensifying agricultural productivity without increasing land use. It is a selfing diploid with a short life cycle and is amenable to genetic manipulations, making it an accessible field-based model species for genetics and epigenetics. The availability of a high-quality reference genome is vital for understanding pennycress physiology and for clarifying its evolutionary history within the Brassicaceae. Here, we present a chromosome-level genome assembly of var. MN106-Ref with improved gene annotation and use it to investigate gene structure differences between two accessions (MN108 and Spring32-10) that are highly amenable to genetic transformation. We describe non-coding RNAs, pseudogenes and transposable elements, and highlight tissue-specific expression and methylation patterns. Resequencing of forty wild accessions provided insights into genome-wide genetic variation, and QTL regions were identified for a seedling colour phenotype. Altogether, these data will serve as a tool for pennycress improvement in general and for translational research across the Brassicaceae.

  • Article type: Journal Article
    Combining integrative genomics and systems biology approaches has revealed new and conserved features in the genome of human herpesvirus 6.

  • Article type: Journal Article
    Human herpesvirus-6 (HHV-6) A and B are ubiquitous betaherpesviruses, infecting the majority of the human population. They encompass large genomes and our understanding of their protein coding potential is far from complete. Here, we employ ribosome-profiling and systematic transcript-analysis to experimentally define HHV-6 translation products. We identify hundreds of new open reading frames (ORFs), including upstream ORFs (uORFs) and internal ORFs (iORFs), generating a complete unbiased atlas of HHV-6 proteome. By integrating systematic data from the prototypic betaherpesvirus, human cytomegalovirus, we uncover numerous uORFs and iORFs conserved across betaherpesviruses and we show uORFs are enriched in late viral genes. We identified three highly abundant HHV-6 encoded long non-coding RNAs, one of which generates a non-polyadenylated stable intron appearing to be a conserved feature of betaherpesviruses. Overall, our work reveals the complexity of HHV-6 genomes and highlights novel features conserved between betaherpesviruses, providing a rich resource for future functional studies.

  • Article type: Evaluation Study
    Genome annotation remains a fundamental effort in modern biology. With reducing costs and new forms of sequencing technologies, annotations specific to tissue type and experimental conditions are continually being generated (e.g., histone methylation marks). Computing the statistical significance of overlap between two different annotations is key to many biological findings but has not been systematically addressed previously. We formalize the problem as follows: let I and I_f each describe a collection of n and m intervals of a genome with particular annotation. Under the null hypothesis that genomic intervals in I are randomly arranged with respect to I_f, what is the significance of k of m intervals of I_f intersecting with intervals in I? We describe a tool iSTAT that implements a combinatorial algorithm to accurately compute p values. We applied iSTAT to simulated and real datasets to obtain precise estimates and contrasted them against previous results using permutation or parametric tests.
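
    The overlap question posed in this abstract can be illustrated with a minimal permutation-test sketch in Python. This is not iSTAT's combinatorial algorithm, which the abstract does not spell out; it is the kind of permutation baseline the authors say they compared against. The function names (`count_overlaps`, `permutation_p_value`), the half-open interval convention, the single linear genome of length `genome_length`, and the choice to let shuffled intervals overlap one another are all assumptions made for illustration. `I` and `I_f` stand for the abstract's two interval sets.

```python
import random
from bisect import bisect_left


def count_overlaps(ref, query):
    """Count query intervals that intersect at least one ref interval.

    Intervals are half-open (start, end) tuples on one linear genome.
    """
    if not ref:
        return 0
    ref = sorted(ref)
    starts = [s for s, _ in ref]
    # Running maximum of interval ends, in start-sorted order.
    max_end, cur = [], float("-inf")
    for _, e in ref:
        cur = max(cur, e)
        max_end.append(cur)
    k = 0
    for qs, qe in query:
        # Ref intervals that start before the query ends are indices [0, i).
        i = bisect_left(starts, qe)
        # One of them overlaps iff the largest end among them exceeds qs.
        if i > 0 and max_end[i - 1] > qs:
            k += 1
    return k


def permutation_p_value(I, I_f, genome_length, n_perm=1000, seed=0):
    """Estimate P(overlap count >= observed) when I is placed at random.

    Assumed null model: each interval of I keeps its length and is placed
    uniformly at random, independently of the others.
    """
    rng = random.Random(seed)
    observed = count_overlaps(I, I_f)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in I:
            length = e - s
            new_s = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_s, new_s + length))
        if count_overlaps(shuffled, I_f) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    I = [(100, 200), (500, 600)]
    I_f = [(150, 160), (550, 560), (900, 910)]
    k, p = permutation_p_value(I, I_f, genome_length=1000)
    print(f"k = {k} of m = {len(I_f)} intervals overlap; permutation p ~ {p:.3f}")
```

    A permutation estimate of this kind cannot resolve p values below roughly 1/(n_perm + 1), which is one reason an exact combinatorial computation can be preferable for very small p values.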