Transcriptome annotation

  • Article type: Journal Article
    Onchidoris muricata is a widespread shell-less nudibranch mollusc with skeletal elements that are unique among Gastropoda: subepidermal calcite spicules. The general and fine morphology of the spicules, as well as their maturation during ontogenesis, has been studied in detail by the authors. The uniqueness of the spicules lies in their intracellular formation and their location under the ectodermal epithelium, which is more typical of deuterostomes. We present O. muricata as a potential new model species for studying calcification of intracellular protein structures. A total of 96 individuals were collected in Kandalaksha Bay of the White Sea, both by hand and by scuba diving. All individuals were divided into three groups based on morphological characteristics such as specimen size and spicule condition. This division suggests the existence of three stages in the postembryonic ontogenesis of O. muricata, reflecting the maturation of the spicule complex. Total RNA samples were isolated from the three size groups of molluscs in three biological replicates. Libraries were prepared from the polyadenylated RNA fraction and sequenced on a NovaSeq 6000 (Illumina), yielding a total of 112.8 Gb of 150 bp paired-end reads, corresponding to almost 1,000-fold coverage of the transcriptome. A representative transcriptome was assembled de novo with Trinity. In addition to obtaining the transcriptome sequences of O. muricata, differential expression analysis was performed for the three size groups. This allows us to trace the dynamics of molecular and biological processes over the life of the mollusc. The data obtained can be used as a reference transcriptome for closely related species, to study specific expressed genes, to identify various unique sequences, including protein-coding ones, and to understand biological processes, including biomineralization.
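The yield and coverage figures in this abstract can be related with simple arithmetic. The sketch below is only an illustration: it assumes that "Gb" means 10^9 bases and rounds "almost 1,000-fold" to exactly 1,000, and the function names are hypothetical rather than taken from the study.

```python
def implied_target_size_bp(total_bases: float, fold_coverage: float) -> float:
    """Size of the sequenced target implied by a yield and a coverage depth."""
    return total_bases / fold_coverage


def implied_read_count(total_bases: float, read_length_bp: int) -> float:
    """Number of individual reads implied by a yield and a fixed read length."""
    return total_bases / read_length_bp


# Figures quoted in the abstract (assumption: 1 Gb = 1e9 bases).
total_bases = 112.8e9
read_length_bp = 150
fold_coverage = 1000  # assumption: "almost 1,000-fold" treated as 1,000

transcriptome_bp = implied_target_size_bp(total_bases, fold_coverage)
read_count = implied_read_count(total_bases, read_length_bp)

print(f"Implied transcriptome size: {transcriptome_bp / 1e6:.1f} Mb")
print(f"Implied number of reads: {read_count / 1e6:.0f} million")
```

Under these assumptions the quoted numbers imply a transcriptome of roughly 113 Mb and about 752 million individual reads.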

  • Article type: Journal Article
    Cnidarians thriving in biofouling communities on aquaculture net pens represent a significant health risk for farmed finfish due to their stinging cells. The toxins coming into contact with the fish, during net cleaning, can adversely affect their behavior, welfare, and survival, with a particularly serious health risk for the gills, causing direct tissue damage such as formation of thrombi and increasing risks of secondary infections. The hydroid Ectopleura larynx is one of the most common fouling organisms in Northern Europe. However, despite its significant economic, environmental, and operational impact on finfish aquaculture, biological information on this species is scarce and its venom composition has never been investigated. In this study, we generated a whole transcriptome of E. larynx, and identified its putative expressed venom toxin proteins (predicted toxin proteins, not functionally characterized) based on in silico transcriptome annotation mining and protein sequence analysis. The results uncovered a broad and diverse repertoire of putative toxin proteins for this hydroid species. Its toxic arsenal appears to include a wide and complex selection of toxin proteins, covering a large panel of potential biological functions that play important roles in envenomation. The putative toxins identified in this species, such as neurotoxins, GTPase toxins, metalloprotease toxins, ion channel impairing toxins, hemorrhagic toxins, serine protease toxins, phospholipase toxins, pore-forming toxins, and multifunction toxins may cause various major deleterious effects in prey, predators, and competitors. These results provide valuable new insights into the venom composition of cnidarians, and venomous marine organisms in general, and offer new opportunities for further research into novel and valuable bioactive molecules for medicine, agronomics and biotechnology.

  • Article type: Journal Article
    Spider venoms have evolved over thousands of years, optimizing feeding and defense mechanisms. Venom components show pharmacological and biotechnological potential, which has raised interest in their study. However, the isolation of spider toxins for experimental evaluation poses significant challenges. To address this, transcriptomic analysis combined with computational tools has emerged as an appealing approach to characterizing spider venoms. However, many sequences remain unidentified after automatic annotation. In this study, we manually curated a subset of previously unannotated sequences from the Phoneutria nigriventer transcriptome and identified new putative venom components. Our manual analysis revealed that 29% of the analyzed sequences were potential venom components, 29% were hypothetical/uncharacterized proteins, and 17% were cellular function proteins. Only 25% of the originally unannotated dataset remained without any identification. Most reclassified components were cysteine-rich peptides, including 23 novel putative toxins. We also found glycine-rich peptides (GRPs), corroborating the previous description of GRPs in Phoneutria pertyi venom glands. Furthermore, to emphasize how often spider venom gland transcripts lack annotation, we provide a survey of the percentage of unidentified sequences in several published spider venom transcriptomics studies. In conclusion, our study highlights the importance of manual curation in uncovering novel venom components and underscores the need for improved annotation strategies to fully exploit the medical and biotechnological potential of spider venoms.

  • Article type: Journal Article
    Long noncoding RNAs (lncRNAs) are increasingly recognized as modulators of various biological processes. However, because of their low expression, they are difficult to characterize systematically. Here, we performed transcript annotation with a newly developed computational pipeline, termed the RNA-seq and small RNA-seq combined strategy (RSCS), in a wide variety of cellular contexts. The RSCS identified thousands of high-confidence potential novel transcripts, and the reliability of the transcriptome was verified by analysis of transcript structure, base composition, and sequence complexity. As evidenced by the length comparison, the frequency of the core promoter and polyadenylation signal motifs, and the locations of transcription start and end sites, the transcripts appear to be full length. Furthermore, taking advantage of our strategy, we identified a large number of endogenous retrovirus-associated lncRNAs (ERV-lncRNAs) and identified a novel ERV-lncRNA that is functionally involved in the control of Yap1 expression and essential for early embryogenesis. In summary, the RSCS can generate a more complete and precise transcriptome, and our findings greatly expand the transcriptome annotation for the mammalian community.

  • Article type: Journal Article
    BACKGROUND: RNA-seq followed by de novo transcriptome assembly has been a transformative technique in biological research of non-model organisms, but the computational processing of RNA-seq data entails many different software tools. The complexity of these de novo transcriptomics workflows therefore presents a major barrier for researchers to adopt best-practice methods and up-to-date versions of software.
    RESULTS: Here we present a streamlined and universal de novo transcriptome assembly and annotation pipeline, transXpress, implemented in Snakemake. transXpress supports two popular assembly programs, Trinity and rnaSPAdes, and allows parallel execution on heterogeneous cluster computing hardware.
    CONCLUSIONS: transXpress simplifies the use of best-practice methods and up-to-date software for de novo transcriptome assembly, and produces standardized output files that can be mined using SequenceServer to facilitate rapid discovery of new genes and proteins in non-model organisms.

  • Article type: Journal Article
    BACKGROUND: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms.
    RESULTS: We developed an easy-to-use genome-guided transcriptome annotation pipeline that uses assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integrating several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons from the chicken SCO-spondin gene (containing more than 105 exons), including the identification of missing genes in the chicken reference annotations by homology assignments.
    CONCLUSIONS: Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker, is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas, helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes.

  • Article type: Journal Article
    The annotation of animal genomes plays an important role in elucidating molecular mechanisms behind the genetic control of economically important traits. Here, we employed long-read sequencing technology, Oxford Nanopore Technology, to annotate the pig transcriptome across 17 tissues from two Yorkshire littermate pigs. More than 9.8 million reads were obtained from a single flow cell, and 69 781 unique transcripts at 50 108 loci were identified. Of these transcripts, 16 255 were found to be novel isoforms, and 22 344 were found at loci that were novel and unannotated in the Ensembl (release 102) and NCBI (release 106) annotations. Novel transcripts were mostly expressed in cerebellum, followed by lung, liver, spleen, and hypothalamus. By comparing the unannotated transcripts to existing databases, there were 21 285 (95.3%) transcripts matched to the NT database (v5) and 13 676 (61.2%) matched to the NR database (v5). Moreover, there were 4324 (19.4%) transcripts matched to the SwissProt database (v5), corresponding to 11 356 proteins. Tissue-specific gene expression analyses showed that 9749 transcripts were highly tissue-specific, and cerebellum contained the most tissue-specific transcripts. As the same samples were used for the annotation of cis-regulatory elements in the pig genome, the transcriptome annotation generated by this study provides an additional and complementary annotation resource for the Functional Annotation of Animal Genomes effort to comprehensively annotate the pig genome.
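The database-match percentages in this abstract can be reproduced from the raw counts. The sketch below assumes that the denominator is the 22,344 transcripts found at novel, unannotated loci. The abstract does not state this explicitly, but all three quoted percentages are consistent with it. The function and variable names are illustrative only.

```python
def percent(matched: int, total: int) -> float:
    """Share of `total` covered by `matched`, as a percentage rounded to one decimal."""
    return round(100 * matched / total, 1)


# Counts quoted in the abstract.
unannotated_transcripts = 22_344  # assumption: denominator for all three percentages
database_matches = {
    "NT": 21_285,
    "NR": 13_676,
    "SwissProt": 4_324,
}

match_percentages = {
    database: percent(count, unannotated_transcripts)
    for database, count in database_matches.items()
}

for database, value in match_percentages.items():
    print(f"{database}: {value}% of unannotated transcripts matched")
```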

  • Article type: Journal Article
    Egg-laying rate is mainly determined by ovarian function and regulated by the hypothalamic-pituitary-gonadal axis; however, the mechanism by which the ovary regulates the egg-laying rate is still poorly understood. The purpose of this study was to compare the ovarian transcriptomes of Lingyun black-bone chickens with relatively high and low egg-laying rates and to screen candidate genes related to the egg-laying rate. RNA sequencing (RNA-Seq) was conducted to explore the chicken transcriptome in the ovarian tissue of six Lingyun black-bone chickens with high (group G, n = 3) and low (group D, n = 3) egg-laying rates. The results showed that 235 differentially expressed genes (DEGs) were identified between the chickens with high and low egg-laying rates; among them, 209 DEGs were up-regulated and 26 DEGs were down-regulated. Gene Ontology analysis showed that the 209 up-regulated DEGs were enriched in 50 GO terms and the 26 down-regulated DEGs were enriched in 40 GO terms. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that up-regulated DEGs were significantly enriched in 25 pathways and down-regulated DEGs were significantly enriched in three pathways. Among these pathways, the longevity regulating pathway (multiple species), the estrogen signalling pathway, and the PPAR signalling pathway may have an essential function in regulating the egg-laying rate. The results highlighted DEGs in the ovarian tissues of Lingyun black-bone chickens with relatively high and low egg-laying rates and identified essential candidate genes related to the egg-laying rate, thereby providing a theoretical basis for improving the egg-laying rate of Lingyun black-bone chickens.
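Splitting DEGs into up- and down-regulated sets, as reported above, is commonly done by thresholding a log2 fold change and an adjusted p-value. The abstract does not give the cutoffs or software used, so the sketch below is a generic illustration only: the cutoffs (|log2FC| >= 1, adjusted p < 0.05), the gene identifiers, and all names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class GeneResult(NamedTuple):
    gene_id: str
    log2_fold_change: float  # high-rate group relative to low-rate group
    adjusted_p: float


def classify_deg(result: GeneResult,
                 lfc_cutoff: float = 1.0,      # hypothetical cutoff
                 padj_cutoff: float = 0.05) -> str:  # hypothetical cutoff
    """Label a gene as 'up', 'down', or 'not_significant'."""
    if result.adjusted_p >= padj_cutoff or abs(result.log2_fold_change) < lfc_cutoff:
        return "not_significant"
    return "up" if result.log2_fold_change > 0 else "down"


def summarise(results: List[GeneResult]) -> Dict[str, int]:
    """Count genes in each category."""
    counts = {"up": 0, "down": 0, "not_significant": 0}
    for result in results:
        counts[classify_deg(result)] += 1
    return counts


# Toy data with made-up gene identifiers, not values from the study.
toy_results = [
    GeneResult("geneA", 2.3, 0.001),
    GeneResult("geneB", -1.7, 0.010),
    GeneResult("geneC", 0.4, 0.001),   # significant p-value but small effect
    GeneResult("geneD", 3.1, 0.200),   # large effect but not significant
]
toy_summary = summarise(toy_results)
print(toy_summary)
```

In the study itself, the two significant categories summed to the reported total: 209 up-regulated plus 26 down-regulated genes gives 235 DEGs.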

  • Article type: Journal Article
    The Asian citrus psyllid Diaphorina citri Kuwayama is an economically important pest of citrus, as it transmits Candidatus Liberibacter asiaticus, the causative agent of huanglongbing. In this study, we used RNA-seq to identify novel genes and provide the first high-resolution view of the D. citri transcriptome throughout development. The transcriptomes of D. citri during eight developmental stages, including the egg, five instars, and male and female adults, were sequenced. In total, 115 million clean reads were obtained and assembled into 354,726 unigenes with an average length of 925.65 bp and an N50 length of 1733 bp. Clusters of Orthologous Groups, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes analyses were conducted to functionally annotate the genes. Differential expression analysis highlighted developmental stage-specific expression patterns. Furthermore, two trehalase genes were characterized with lower expression in adults compared to that in the other stages. The RNA interference (RNAi)-mediated suppression of the two trehalase genes resulted in significantly high D. citri mortality. This study enriched the genomic information regarding D. citri. Importantly, these data represent the most comprehensive transcriptomic resource currently available for D. citri and will facilitate functional genomics studies of this notorious pest.
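The assembly statistics quoted above (average length and N50) follow standard definitions: N50 is the length of the sequence at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The sketch below shows that definition on toy lengths. It is not the authors' code, and the example lengths are made up.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


def mean_length(lengths: Iterable[int]) -> float:
    """Arithmetic mean of the sequence lengths."""
    values = list(lengths)
    return sum(values) / len(values)


# Toy unigene lengths in bp (hypothetical, for illustration only).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", n50(toy_lengths))
print("Mean length:", mean_length(toy_lengths))
```

Because N50 weights long sequences more heavily than the mean does, an assembly with many short unigenes typically shows an N50 well above its average length, as in the figures reported here (1733 bp versus 925.65 bp).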

  • Article type: Journal Article
    Henosepilachna vigintioctopunctata (Coleoptera: Coccinellidae) is a major pest affecting Solanaceae plants in Asian countries. In this study, we sequenced the ovary and testis transcriptomes of H. vigintioctopunctata to identify gonad-related genes. Comparison of the unigene sequences in ovary and testis libraries identified 1,421 and 5,315 ovary- and testis-specific genes, respectively. Among the ovary-specific genes, we selected the RC2-like and PSHS-like genes to investigate the effects of gene silencing on the mortality, percentage infertility, pre-oviposition period, fecundity, daily number of eggs laid, and hatching rate in female adults. Although the percentage mortality and infertility of females did not differ significantly among dsRNA treatments, fecundity was significantly reduced in the dsRC2-like and dsPSHS-like treatment groups. Moreover, the pre-oviposition period was markedly prolonged in response to dsPSHS-like treatment. This is the first reported RNA sequencing of H. vigintioctopunctata. The transcriptome sequences and gene expression profiles of the ovary and testis libraries will provide useful information for the identification of gonad-related genes in H. vigintioctopunctata and facilitate further research on the reproductive biology of this species. Moreover, the gonad-specific genes identified may represent candidate target genes for inhibiting the population growth of H. vigintioctopunctata.
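Comparing two libraries to find tissue-specific genes can be pictured as a set comparison of unigene identifiers. The abstract does not describe the exact criteria the authors used (for example, expression thresholds), so the sketch below shows only the simplest presence/absence version, with made-up identifiers and a hypothetical function name.

```python
from typing import Dict, Iterable, Set


def tissue_specific(ovary_ids: Iterable[str],
                    testis_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Split unigene identifiers into ovary-only, testis-only, and shared sets."""
    ovary = set(ovary_ids)
    testis = set(testis_ids)
    return {
        "ovary_specific": ovary - testis,   # detected only in the ovary library
        "testis_specific": testis - ovary,  # detected only in the testis library
        "shared": ovary & testis,           # detected in both libraries
    }


# Toy identifiers, not real unigenes from the study.
toy_ovary = ["u1", "u2", "u3", "u4"]
toy_testis = ["u3", "u4", "u5", "u6", "u7"]
toy_split = tissue_specific(toy_ovary, toy_testis)

for label, ids in toy_split.items():
    print(label, sorted(ids))
```

A presence/absence comparison like this is sensitive to sequencing depth, so real analyses usually add a minimum expression threshold before calling a gene tissue-specific.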