Gene annotation

  • Article type: Journal Article
    JBrowse 2 is a modular genome browser that can visualize many common genomic file formats. While JBrowse 2 supports a variety of different usages, it is particularly suited for deployment on websites, such as model organism databases or other web-based genomic data resources. This protocol provides detailed instructions for setting up JBrowse 2 on an Ubuntu Linux web server, loading a reference genome from a FASTA format file, and adding a gene annotation track from a GFF3 format file. By the end of the protocol, users will have a working JBrowse 2 instance that is accessible via the web. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Setting up JBrowse 2 on your web server.
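Before a GFF3 annotation file is loaded as a track, it can help to check that the file parses cleanly and contains the expected feature types. The following is a minimal sketch of such a check, not part of the published protocol; the embedded three-line GFF3 snippet and the function names `parse_gff3` and `count_feature_types` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical GFF3 snippet for illustration; a real file would come from
# the annotation provider for the reference genome being loaded.
EXAMPLE_GFF3 = """##gff-version 3
chr1\texample\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=demo
chr1\texample\tmRNA\t1000\t5000\t.\t+\t.\tID=mrna1;Parent=gene1
chr1\texample\texon\t1000\t1500\t.\t+\t.\tID=exon1;Parent=mrna1
"""


def parse_gff3(text):
    """Parse GFF3 text into a list of dicts, skipping comments and blank lines."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            raise ValueError(f"line {line_number}: expected 9 columns, got {len(columns)}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = columns
        # Column 9 holds semicolon-separated key=value pairs.
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        records.append(
            {
                "seqid": seqid,
                "type": ftype,
                "start": int(start),
                "end": int(end),
                "strand": strand,
                "attributes": attributes,
            }
        )
    return records


def count_feature_types(records):
    """Count how many records exist for each feature type (gene, mRNA, exon, ...)."""
    return Counter(record["type"] for record in records)


if __name__ == "__main__":
    parsed = parse_gff3(EXAMPLE_GFF3)
    for feature_type, count in count_feature_types(parsed).items():
        print(feature_type, count)
```

A file that raises `ValueError` here, or that shows no `gene` features, is worth inspecting before it is added to the browser.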

  • Article type: Journal Article
    BACKGROUND: Appropriate regulation of genes expressed in oocytes and embryos is essential for acquisition of developmental competence in mammals. Here, we hypothesized that several genes expressed in oocytes and pre-implantation embryos remain unknown. Our goal was to reconstruct the transcriptome of oocytes (germinal vesicle and metaphase II) and pre-implantation cattle embryos (blastocysts) using short-read and long-read sequences to identify putative new genes.
    RESULTS: We identified 274,342 transcript sequences, and 3,033 of those loci do not match a gene present in official annotations and thus are potential new genes. Notably, 63.67% (1,931/3,033) of potential novel genes exhibited coding potential. Also noteworthy, 97.92% of the putative novel genes overlapped with annotated transposable elements. Comparative analysis of transcript abundance identified that 1,840 novel genes (recently added to the annotation) or potential new genes were differentially expressed between developmental stages (FDR < 0.01). We also determined that 522 novel or potential new genes (448 and 34, respectively) were upregulated in eight-cell embryos compared to oocytes (FDR < 0.01). In eight-cell embryos, 102 novel or putative new genes were co-expressed (|r| > 0.85, P < 1 × 10⁻⁸) with several genes annotated with gene ontology biological processes related to pluripotency maintenance and embryo development. CRISPR-Cas9 genome editing confirmed that the disruption of one of the novel genes highly expressed in eight-cell embryos reduced blastocyst development (ENSBTAG00000068261, P = 1.55 × 10⁻⁷).
    CONCLUSIONS: Our results revealed several putative new genes that need careful annotation. Many of the putative new genes have dynamic regulation during pre-implantation development and are important components of gene regulatory networks involved in pluripotency and blastocyst formation.

  • Article type: Journal Article
    In recent years, the increase in genome sequencing across diverse plant species has provided a significant advantage for phylogenomics studies, allowing the analysis of one of the most diverse gene families in plants: nucleotide-binding leucine-rich repeat receptors (NLRs). However, due to the sequence diversity of the NLR gene family, identifying key molecular features and functionally conserved sequence patterns is challenging through multiple sequence alignment. Here, we present a step-by-step protocol for a computational pipeline designed to identify evolutionarily conserved motifs in plant NLR proteins. In this protocol, we use a large-scale NLR dataset, including 1,862 NLR genes annotated from monocot and dicot species, to predict conserved sequence motifs, such as the MADA and EDVID motifs, within the coiled-coil (CC)-NLR subfamily. Our pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species. Key features • Phylogenomics analysis of plant NLR immune receptor family. • Identification of functionally conserved sequence patterns among plant NLRs.

  • Article type: Journal Article
    Genomic resource development for non-model organisms is progressing rapidly, seeking to uncover the molecular mechanisms and evolutionary adaptations that enable organisms to thrive in diverse environments. Limited genomic data for bat species hinder insights into their evolutionary processes, particularly within the diverse Myotis genus of the Vespertilionidae family. In Mexico, 15 Myotis species exist, with three (M. vivesi, M. findleyi, and M. planiceps) being endemic and of conservation concern.
    We obtained samples of Myotis vivesi, M. findleyi, and M. planiceps for genomic analysis. Genomic DNA from each of the three species was extracted, sequenced, and assembled. Scaffolding was carried out against the M. yumanensis genome using the reference-guided approach of the ntJoin program. GapCloser was employed to fill gaps. Repeat elements were characterized, and gene prediction was done via ab initio and homology methods with the MAKER pipeline. Functional annotation involved InterProScan, BLASTp, and KEGG. Non-coding RNAs were annotated with INFERNAL and tRNAscan-SE. Orthologous genes were clustered using OrthoFinder, and a phylogenomic tree was reconstructed using IQ-TREE.
    We present genome assemblies of these endemic species using Illumina NovaSeq 6000, each exceeding 2.0 Gb, with over 90% representing single-copy genes according to BUSCO analyses. Transposable elements, including LINEs and SINEs, constitute over 30% of each genome. Helitrons, consistent with Vespertilionids, were identified. Gene annotation yielded around 20,000 genes for each of the three assemblies, together with their associated functions. Comparative analysis of orthologs among eight Myotis species revealed 20,820 groups, with 4,789 being single-copy orthogroups. Non-coding RNA elements were annotated. Phylogenomic tree analysis supported the evolutionary relationships among chiropterans. These resources contribute significantly to understanding gene evolution and diversification patterns, and to aiding conservation efforts for these endangered bat species.

  • Article type: Journal Article
    The Deinococcus genus is renowned for its remarkable resilience against environmental stresses, including ionizing radiation, desiccation, and oxidative damage. This resilience is attributed to its sophisticated DNA repair mechanisms and robust defense systems, enabling it to recover from extensive damage and thrive under extreme conditions. Central to Deinococcus research, the D. radiodurans strains ATCC BAA-816 and ATCC 13939 facilitate extensive studies into this remarkably resilient genus. This study focused on delineating genetic discrepancies between these strains by sequencing our laboratory's ATCC 13939 specimen (ATCC 13939K) and juxtaposing it with ATCC BAA-816. We uncovered 436 DNA sequence differences within ATCC 13939K, including 100 single nucleotide variations, 278 insertions, and 58 deletions, which could induce frameshifts altering protein-coding genes. Gene annotation revisions accounting for gene fusions and the reconciliation of gene lengths uncovered novel protein-coding genes and refined the functional categorizations of established ones. Additionally, the analysis pointed out genome structural variations due to insertion sequence (IS) elements, underscoring the D. radiodurans genome's plasticity. Notably, ATCC 13939K exhibited a loss of six ISDra2 elements relative to BAA-816, restoring genes fragmented by ISDra2, such as those encoding α/β hydrolase and serine protease, and revealing new open reading frames, including genes imperative for acetoin decomposition. This comparative genomic study offers vital insights into the metabolic capabilities and resilience strategies of D. radiodurans.

  • Article type: Journal Article
    Oecanthus is a genus of cricket known for its distinctive chirping and distributed across major zoogeographical regions worldwide. This study focuses on Oecanthus rufescens and conducts a comprehensive examination of its genome through genome sequencing technologies and bioinformatic analysis. A high-quality chromosome-level genome of O. rufescens was successfully obtained, revealing significant features of its genome structure. The genome size is 877.9 Mb, comprising ten pseudo-chromosomes and 70 other sequences, with a GC content of 41.38% and an N50 value of 157,110,771 bp, indicating a high level of continuity. BUSCO assessment results demonstrate that the genome's integrity and quality are high (of which 96.8% are single-copy and 1.6% are duplicated). Comprehensive genome annotation was also performed, identifying approximately 310 Mb of repetitive sequences, accounting for 35.3% of the total genome sequence, and discovering 15,481 tRNA genes, 4,082 rRNA genes, and 1,212 other noncoding genes. Furthermore, 15,031 protein-coding genes were identified, with BUSCO assessment results showing that 98.4% (of which 96.3% are single-copy and 1.6% are duplicated) of the genes were annotated.
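The N50 figure quoted above is a contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below illustrates that standard definition on made-up lengths; it is not the authors' pipeline, and the function name `n50` and the example values are assumptions for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Hypothetical scaffold lengths in bp, for illustration only.
    example_lengths = [80, 70, 50, 40, 30, 20]
    print(n50(example_lengths))
```

With only ten pseudo-chromosomes holding most of an 877.9 Mb assembly, an N50 in the range of a single chromosome length is what this definition would be expected to give.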

  • Article type: Journal Article
    As a noteworthy biocontrol fungus, Clonostachys chloroleuca currently lacks a high-quality reference genome. Here, we present the first high-quality genome assembly of C. chloroleuca strain Cc878, obtained through Oxford Nanopore long-read sequencing. The nuclear genome of Cc878 was assembled into four contigs, totaling 59.38 Mb.

  • Article type: Journal Article
    Kohlrabi is an important swollen-stem cabbage variety belonging to the Brassicaceae family. However, few complete chloroplast genome sequences of this genus have been reported. Here, a complete chloroplast genome of 153,364 bp with a circular quadripartite structure was obtained. A total of 132 genes were identified, including 87 protein-coding genes, 37 transfer RNA genes and eight ribosomal RNA genes. The base composition analysis showed that the overall GC content of the complete chloroplast genome sequence was 36.36%. Relative synonymous codon usage (RSCU) analysis showed that most codons with values greater than 1 ended with A or U, while most codons with values less than 1 ended with C or G. Thirty-five scattered repeats were identified, and most of them were distributed in the large single-copy (LSC) region. A total of 290 simple sequence repeats (SSRs) were found, and 188 of them were distributed in the LSC region. Phylogenetic relationship analysis showed that five Brassica oleracea subspecies were clustered into one group and the kohlrabi chloroplast genome was closely related to that of B. oleracea var. botrytis. Our results provide a basis for studies of chloroplast-dependent metabolism and new insight into the polyploidization of Brassicaceae species.
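RSCU compares how often a codon is observed with how often it would appear if all synonymous codons for the same amino acid were used equally, so a value above 1 marks a preferred codon and a value below 1 an avoided one. The sketch below shows that standard calculation under the standard genetic code; it is an illustration rather than the authors' analysis, and the function name `rscu` and the example sequence are assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Compute RSCU for each sense codon of amino acids present in a coding sequence.

    RSCU = observed count * number of synonymous codons / total count for
    that amino acid. Stop codons and amino acids absent from the sequence
    are omitted; a trailing partial codon is ignored.
    """
    sequence = cds.upper().replace("U", "T")
    counts = Counter(sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))

    # Group sense codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)

    values = {}
    for codons in families.values():
        family_total = sum(counts[codon] for codon in codons)
        if family_total == 0:
            continue
        for codon in codons:
            values[codon] = counts[codon] * len(codons) / family_total
    return values


if __name__ == "__main__":
    # Hypothetical sequence: four alanine codons, three GCT and one GCC.
    for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
        print(codon, value)
```

In practice the counts would be pooled over all protein-coding genes of the chloroplast genome before the ratio is taken.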

  • Article type: Journal Article
    OBJECTIVE: Ottelia Pers. is in the Hydrocharitaceae family. Species in the genus are aquatic, and China is their centre of origin in Asia. Ottelia alismoides (L.) Pers., which is distributed worldwide, is a distinguishing element in China, while other species of this genus are endemic to China. However, O. alismoides is also considered endangered due to habitat loss and pollution in some Asian countries. Ottelia alismoides is the only submerged macrophyte that contains three carbon dioxide-concentrating mechanisms, i.e. bicarbonate (HCO₃⁻) use, crassulacean acid metabolism and the C4 pathway. In this study, we present its first genome assembly to help illustrate the various carbon metabolism mechanisms and to enable genetic conservation in the future.
    METHODS: Using DNA and RNA extracted from one O. alismoides leaf, this work produced ~73.4 Gb HiFi reads, ~126.4 Gb whole genome sequencing short reads and ~21.9 Gb RNA-seq reads. The de novo genome assembly was 6,455,939,835 bp in length, with 11,923 scaffolds/contigs and an N50 of 790,733 bp. Genome assembly completeness assessment with Benchmarking Universal Single-Copy Orthologs revealed a score of 94.4%. The repetitive sequence in the assembly was 4,875,817,144 bp (75.5%). A total of 116,176 genes were predicted. The protein sequences were functionally annotated against multiple databases, facilitating comparative genomic analysis.

  • Article type: Preprint
    Dendroctonus frontalis, also known as southern pine beetle (SPB), represents the most damaging forest pest in the southeastern United States. Strategies to predict, monitor and suppress SPB outbreaks have had limited success. Genomic data are critical to inform on pest biology and to identify molecular targets to develop improved management approaches. Here, we produced a chromosome-level genome assembly of SPB using long-read sequencing data. Synteny analyses confirmed the conservation of the core coleopteran Stevens elements and validated the bona fide SPB X chromosome. Transcriptomic data were used to obtain 39,588 transcripts corresponding to 13,354 putative protein-coding loci. Comparative analyses of gene content across 14 beetle and 3 other insect species revealed several losses of conserved genes in the Dendroctonus clade and gene gains in SPB and Dendroctonus that were enriched for loci encoding membrane proteins and extracellular matrix proteins. While lineage-specific gene losses contributed to the gene content reduction observed in Dendroctonus, we also showed that widespread misannotation of transposable elements represents a major cause of the apparent gene expansion in several non-Dendroctonus species. Our findings uncovered distinctive features of the SPB gene complement and disentangled the role of biological and annotation-related factors contributing to gene content variation across beetles.