reference genome

  • Article type: Journal Article
    In Mycobacterium tuberculosis (MTB) control, whole-genome sequencing-based molecular drug susceptibility testing (molDST-WGS) has emerged as a pivotal tool. However, the current reliance on a single-strain reference limits the true potential of molDST-WGS. To address this, we introduce a new pan-lineage reference genome, "MtbRf". We assembled "unmapped" reads from 3,614 MTB genomes (751 L1; 881 L2; 1,700 L3; and 282 L4) into 35 shared, annotated contigs (54 CDSs). We constructed MtbRf by: 1) searching genome databases for contig homologs, which yielded hits unique to the genus Mycobacterium; 2) comparing genomes with H37Rv ("lift-over") to define 18 insertions; and 3) filling gaps in H37Rv with these insertions. MtbRf adds 1.18% more sequence to H37Rv, salvaging >60% of previously unmapped reads. Transcriptomics confirmed expression of the new CDSs. The new variants provided moderate DST predictive value (AUROC 0.60-0.75). MtbRf thus unveils previously hidden genomic information and lays the foundation for lineage-specific molDST-WGS.
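The AUROC range quoted above measures how well the new variants rank drug-resistant isolates above susceptible ones. A minimal sketch, assuming each isolate has a numeric score from some variant-based predictor and a binary phenotypic DST label (the function name, scores, and labels are hypothetical and not taken from the MtbRf study), computes AUROC as the Mann-Whitney probability that a randomly chosen resistant isolate outscores a randomly chosen susceptible one:

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg).

    labels: 1 = resistant (positive class), 0 = susceptible (negative class).
    A tie between a positive and a negative score counts as half a win.
    This pairwise version is O(n_pos * n_neg); rank-based formulas scale better.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one resistant and one susceptible isolate")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For hypothetical scores [0.9, 0.8, 0.4, 0.3] with labels [1, 0, 1, 0], three of the four resistant/susceptible pairs are ranked correctly, so `auroc` returns 0.75; 0.5 corresponds to random ranking and 1.0 to perfect separation.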

  • Article type: Journal Article
    Pineapple is the third most important tropical fruit worldwide and comprises five varieties. Genomes of several pineapple varieties have been released to date; however, none is complete: all exhibit substantial gaps, and together they represent only two of the five varieties. This significantly hinders pineapple breeding. In this study, we sequenced the genomes of three varieties: a wild pineapple variety, a fiber pineapple variety, and a globally cultivated edible pineapple variety. We constructed the first gap-free reference genome (Ref) for pineapple. By integrating multiple sources of evidence and manually curating every gene structure annotation, we identified 26,656 protein-coding genes. BUSCO evaluation indicated 99.2% completeness, demonstrating the high quality of the gene structure annotations in this genome. Using these resources, we identified 7,209 structural variations across the three varieties. Approximately 30.8% of pineapple genes were located within ±5 kb of structural variations, including 30 genes associated with anthocyanin synthesis. Further analysis and functional experiments showed that high expression of AcMYB528 coincides with anthocyanin accumulation in the leaves, and that both may be affected by a 1.9-kb insertion. In addition, we developed the Ananas Genome Database, which offers data browsing, retrieval, analysis, and download functions and addresses the lack of a genome resource database for pineapple. In summary, we obtained a gap-free pineapple reference genome with high-quality gene structure annotations, providing a solid foundation for pineapple genomics and a valuable reference for pineapple breeding.
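The "within ±5 kb of structural variations" criterion above is an interval-proximity test. A minimal sketch, assuming genes and SVs are given as 0-based, half-open chromosome intervals (the `Interval` type, `genes_near_svs` function, and coordinates are hypothetical, and this is not the authors' pipeline), pads each gene by 5 kb on both sides and checks for overlap with any SV:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """A genomic interval in 0-based, half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int


def genes_near_svs(genes: List[Interval], svs: List[Interval],
                   flank: int = 5_000) -> List[Interval]:
    """Return genes whose span, padded by `flank` bp on each side, overlaps any SV.

    Brute-force O(len(genes) * len(svs)); fine for illustration only.
    """
    hits = []
    for gene in genes:
        lo, hi = gene.start - flank, gene.end + flank
        # Half-open intervals [lo, hi) and [s, e) overlap iff s < hi and e > lo.
        if any(sv.chrom == gene.chrom and sv.start < hi and sv.end > lo for sv in svs):
            hits.append(gene)
    return hits
```

A gene at chr1:10,000-12,000 and an SV at chr1:16,000-16,500 are 4 kb apart, so the gene is counted; the reported percentage is then `len(hits) / len(genes)`. Genome-scale data would normally use an interval tree or a tool such as `bedtools window` instead of the nested scan.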

  • Article type: Journal Article
    The viviparous eelpout Zoarces viviparus is a common fish across the North Atlantic and has successfully colonized habitats across environmental gradients. Owing to its wide distribution and predictable phenotypic responses to pollution, Z. viviparus is used as an ideal marine bioindicator organism and has been routinely sampled for decades by several countries to monitor marine environmental health. Additionally, this species is a promising model for studying adaptive processes related to environmental change, specifically global warming. Here, we report the chromosome-level genome assembly of Z. viviparus, which is 663 Mb in size and consists of 607 scaffolds (N50 = 26 Mb). The 24 largest scaffolds represent the 24 chromosomes of the haploid Z. viviparus genome, and the assembly contains 98% of the Benchmarking Universal Single-Copy Orthologs (BUSCOs) defined for ray-finned fish as complete, indicating that it is highly contiguous and complete. Comparative analyses between the Z. viviparus assembly and the chromosome-level genomes of two other eelpout species revealed high synteny, but also an accumulation of repetitive elements in the Z. viviparus genome. Our reference genome will be an important resource enabling future in-depth genomic analyses of the effects of environmental change on this important bioindicator species.
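N50 = 26 Mb above means that scaffolds of at least 26 Mb together hold half of the 663 Mb assembly. The standard definition can be sketched as follows (the function name and toy lengths are illustrative, not from the study):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L together
    contain at least half of all assembled bases: sort lengths in descending
    order and return the one at which the running sum first reaches half
    of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For toy lengths [10, 8, 5, 3, 2] (total 28), the running sum first reaches 14 at the second sequence, so `n50` returns 8.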

  • Article type: Journal Article
    Artemisia argyi, a perennial herb of the genus Artemisia in the family Asteraceae known as "Aicao", holds significant importance in traditional Chinese medicine. Here, we report a high-quality reference genome of Artemisia argyi L. cv. beiai, with a genome size of 4.15 Gb and a contig N50 of 508.96 kb, produced with third-generation Nanopore sequencing technology. We predicted 147,248 protein-coding genes; approximately 68.86% of the assembled sequence consists of repetitive elements, primarily long terminal repeat retrotransposons (LTRs). Comparative genomics analysis shows that A. argyi has the highest number of specific gene families (5,121) and many more families with four or more members than the other six plant species, consistent with its more expanded and fewer contracted gene families. Furthermore, through transcriptome sequencing of A. argyi in response to exogenous MeJA treatment, we gained regulatory insights into the effects of MeJA on the phenylpropanoid, flavonoid, and terpenoid biosynthesis pathways of A. argyi. The whole-genome information obtained in this study serves as a valuable resource for in-depth research on the cultivation and molecular breeding of A. argyi. Moreover, it holds promise for improving genome assemblies of other members of the Asteraceae family. The identification of key genes lays a solid foundation for developing new Artemisia varieties with elevated concentrations of active compounds.

  • Article type: Journal Article
    The Finnhorse is Finland's native and national horse breed and has genetic affinities to northern European and Asian horses. It has historically been important for agriculture, forestry work and transport, and as a war horse. The Finnhorse studbook has four breeding sections, and the breed is the subject of ongoing conservation and characterisation efforts. We sequenced and annotated the genome of a Finnhorse mare from the working horse section using PacBio and Omni-C data. This genome can complement the existing Thoroughbred reference genome (EquCab 3.0) and facilitate genetic studies of horses from northern Eurasia. We assembled 2.4 Gb of the genome with a scaffold N50 of 83.8 Mb, and genome annotation yielded a total of 19,748 protein-coding genes, of which 1,200 were Finnhorse-specific. The assembly is of high quality and shows synteny with the current horse reference genome. We manually curated five genes of interest and deposited the final assembly in the European Nucleotide Archive under accession no. PRJEB71364.

  • Article type: Journal Article
    Honeybees are indispensable pollinators of pivotal ecological, economic, and scientific value. However, a full-length transcriptome of Apis mellifera generated with advanced third-generation nanopore sequencing technology has yet to be reported. Here, nanopore sequencing of the midgut tissues of uninoculated and Nosema ceranae-inoculated A. mellifera workers was conducted, and the full-length transcriptome was then constructed and annotated based on high-quality long reads. The sequences and annotations of the current A. mellifera reference genome were then improved. A total of 5,942,745 and 6,664,923 raw reads were produced from the midguts of workers at 7 and 10 days post-inoculation (dpi) with N. ceranae, respectively, while 7,100,161 and 6,506,665 raw reads were generated from the midguts of the corresponding uninoculated workers. After strict quality control, 6,928,170, 6,353,066, 5,745,048, and 6,416,987 clean reads were obtained, with lengths ranging from 1 kb to 10 kb. Additionally, 16,824, 17,708, 15,744, and 18,246 full-length transcripts were detected, respectively, including 28,019 nonredundant ones. Among these, 43,666, 30,945, 41,771, 26,442, and 24,532 full-length transcripts could be annotated to the Nr, KOG, eggNOG, GO, and KEGG databases, respectively. Additionally, 501 novel genes (20,326 novel transcripts) were identified for the first time, of which 401 (20,255), 193 (13,365), 414 (19,186), 228 (12,093), and 202 (11,703) were annotated to each of the five databases above, respectively. The expression and sequences of three randomly selected novel transcripts were confirmed by RT-PCR and Sanger sequencing. The 5' UTRs of 2,082 genes, the 3' UTRs of 2,029 genes, and both the 5' and 3' UTRs of 730 genes were extended. Moreover, 17,345 SSRs, 14,789 complete ORFs, 1,224 long non-coding RNAs (lncRNAs), and 650 transcription factors (TFs) from 37 families were detected. Findings from this work not only refine the annotation of the A. mellifera reference genome but also provide a valuable resource and basis for related molecular and -omics studies.

  • Article type: Journal Article
    Vaccinium floribundum Kunth, known as "mortiño", is an endemic shrub species of the Andean region adapted to harsh conditions in high-altitude ecosystems. It plays an important ecological role as a pioneer species after deforestation and human-induced fires in páramo ecosystems, underscoring its conservation value. While previous studies have offered insights into the genetic diversity of mortiño, comprehensive genomic studies are still needed to fully understand the unique adaptations and population status of this species, highlighting the importance of generating a reference genome for this plant. ONT and Illumina sequencing were used to establish a reference genome for this species. Three different de novo genome assemblies were generated and compared for quality, contiguity, and completeness. The Flye assembly was selected as the best and was refined by filtering out short ONT reads, screening for contaminants, and scaffolding. The final assembly has a genome size of 529 Mb, contains 1,317 contigs, and includes 97% complete BUSCOs, indicating a high level of genome completeness. Additionally, an LTR Assembly Index (LAI) of 12.93 places this assembly in the reference-genome category. The V. floribundum genome reported in this study is the first reference genome generated for this species and provides a valuable tool for further studies. Based on the quality and completeness parameters obtained, this high-quality genome will not only help uncover the genetic mechanisms underlying the species' unique traits and adaptations to high-altitude ecosystems but will also contribute to conservation strategies for a species endemic to the Andes.
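The claim that an LAI of 12.93 qualifies the assembly as a reference genome relies on threshold categories. A minimal sketch, assuming the commonly cited cut-offs (draft below 10, reference from 10 up to 20, gold at 20 or above) and a raw index defined as the percentage of LTR-retrotransposon sequence that belongs to intact elements; the published index additionally adjusts for genome-wide LTR sequence identity, which this sketch omits, and the function names are hypothetical:

```python
def raw_lai(intact_ltr_bp: int, total_ltr_bp: int) -> float:
    """Raw LTR Assembly Index: percentage of LTR-RT bases in intact LTR-RTs.

    Note: the published LAI also corrects for genome-wide LTR identity;
    that adjustment is omitted here.
    """
    if total_ltr_bp <= 0:
        raise ValueError("total LTR-RT length must be positive")
    if not 0 <= intact_ltr_bp <= total_ltr_bp:
        raise ValueError("intact length must lie between 0 and the total")
    return 100.0 * intact_ltr_bp / total_ltr_bp


def lai_category(lai: float) -> str:
    """Map an LAI value onto the assumed draft / reference / gold thresholds."""
    if lai >= 20:
        return "gold"
    if lai >= 10:
        return "reference"
    return "draft"
```

Under these assumed thresholds, `lai_category(12.93)` returns "reference", matching the classification reported for V. floribundum.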

  • Article type: Journal Article
    RNA-seq data in public repositories enable efficient reanalysis of transcript abundances from existing experiments. Graphical user interfaces usually allow only the visual inspection of a single gene and of predefined experiments. Here, we describe how to select experiments from the Sequence Read Archive or the European Nucleotide Archive, how to efficiently map the data onto a reference transcriptome, and how to inspect global transcript abundances and patterns. As an example, we apply this analysis pipeline to study the expression of photorespiration-related genes in photosynthetic organisms, such as cyanobacteria, and to identify conditions under which photorespiratory transcript abundances are enhanced.
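Inspecting global transcript abundances requires values that are comparable across transcripts and samples. Pseudo-aligners often used to map reads onto a reference transcriptome, such as kallisto or Salmon, report transcripts per million (TPM); the abstract does not name the tool the authors use. A minimal sketch of the TPM normalization, with hypothetical transcript IDs, counts, and effective lengths:

```python
from typing import Dict


def tpm(counts: Dict[str, float], eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert per-transcript read counts to transcripts per million (TPM).

    Each count is divided by the transcript's effective length (a per-base
    rate), and the rates are rescaled so that they sum to one million.
    """
    rates = {tx: counts[tx] / eff_lengths[tx] for tx in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads were assigned to any transcript")
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}
```

With equal counts, a transcript with half the effective length receives twice the TPM (here about 666,667 versus 333,333 for `tpm({"txA": 100, "txB": 100}, {"txA": 1000, "txB": 2000})`), because TPM reflects reads per base rather than raw read counts.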

  • Article type: Journal Article
    The first chromosome-scale reference genome of the rare, narrowly endemic African moss Physcomitrellopsis africana (P. africana) is presented here. Assembled from 73× Oxford Nanopore Technologies (ONT) long reads and 163× Beijing Genomics Institute (BGI)-seq short reads, the 414 Mb reference comprises 26 chromosomes and 22,925 protein-coding genes [Benchmarking Universal Single-Copy Ortholog (BUSCO) scores: C:94.8% (D:13.9%)]. This genome contains 2 genes that withstood rigorous filtering of microbial contaminants and have no homolog in other land plants; they are thus interpreted as resulting from 2 unique horizontal gene transfers (HGTs) from microbes. Further, P. africana shares 176 of the 273 published HGT candidates identified in Physcomitrium patens (P. patens) but lacks 98 of these, suggesting that as many as 91 genes may have been acquired by P. patens in the last 40 million years following its divergence from its common ancestor with P. africana. These observations suggest rather continuous gene gains via HGT, followed by potential losses, during the diversification of the Funariaceae. Our findings showcase both the dynamic flux of plant HGTs over evolutionarily "short" timescales and the enduring impact of successful integrations, such as those still functionally maintained in extant P. africana. Furthermore, this study describes the informatic processes used to distinguish contaminants from candidate HGT events.
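In BUSCO notation such as C:94.8% (D:13.9%), C is the share of lineage-specific orthologs found complete, which BUSCO subdivides into single-copy (S) and duplicated (D) hits, alongside fragmented (F) and missing (M) ones. The sketch below, with hypothetical counts and a hypothetical function name, reproduces that one-line summary format; it does not reconstruct the P. africana figures:

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO counts in the one-line C[S,D],F,M,n notation.

    Complete (C) = single-copy (S) + duplicated (D); n is the number of
    BUSCOs in the lineage dataset, i.e. S + D + F + M.
    """
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("empty BUSCO dataset")

    def pct(count: int) -> str:
        # Percentage of the lineage dataset, rounded to one decimal place.
        return f"{100.0 * count / n:.1f}%"

    return (f"C:{pct(single + duplicated)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")
```

For example, 80 single-copy, 10 duplicated, 5 fragmented, and 5 missing BUSCOs give "C:90.0%[S:80.0%,D:10.0%],F:5.0%,M:5.0%,n:100"; a high D value, as in the moss assembly, means many orthologs were found in more than one copy.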

  • Article type: Journal Article
    With a history of over 400 million years, scorpions represent an ancient group of arachnids and were among the first animals to adapt to life on land. Presently, the scarcity of available scorpion genomes hinders research on their evolution. This study leverages ultralong nanopore sequencing and Pore-C to generate the first chromosome-level assembly and annotation for the desert hairy scorpion, Hadrurus arizonensis. The assembled genome is 2.23 Gb in size with an N50 of 280 Mb. Pore-C scaffolding reoriented 99.6% of bases into nine chromosomes, and BUSCO identified 998 (98.6%) complete arthropod single-copy orthologs. Repetitive elements represent 54.69% of the assembled bases, including 872,874 LINE elements (29.39%). A total of 18,996 protein-coding genes and 75,256 transcripts were predicted, and the extracted protein sequences yielded a BUSCO score of 97.2%. This is the first genome assembled and annotated within the family Hadruridae, representing a crucial resource for closing gaps in genomic knowledge of scorpions, resolving arachnid phylogeny, and advancing studies in comparative and functional genomics.