De novo assembly

  • Article type: Journal Article
    There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome, which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly, generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations.






  • Article type: Journal Article
    While the air microbiome and its diversity are essential for human health and ecosystem resilience, comprehensive air microbial diversity monitoring has remained rare, so that little is known about the air microbiome's composition, distribution, or functionality. Here we show that nanopore sequencing-based metagenomics can robustly assess the air microbiome in combination with active air sampling through liquid impingement and tailored computational analysis. We provide fast and portable laboratory and computational approaches for air microbiome profiling, which we leverage to robustly assess the taxonomic composition of the core air microbiome of a controlled greenhouse environment and of a natural outdoor environment. We show that long-read sequencing can resolve species-level annotations and specific ecosystem functions through de novo metagenomic assemblies despite the low amount of fragmented DNA used as an input for nanopore sequencing. We then apply our pipeline to assess the diversity and variability of an urban air microbiome, using Barcelona, Spain, as an example; this randomized experiment gives first insights into the presence of highly stable location-specific air microbiomes within the city's boundaries, and showcases the robust microbial assessments that can be achieved through automatable, fast, and portable nanopore sequencing technology.






  • Article type: Journal Article
    The shape of rice grains not only determines the thousand-grain weight but also correlates closely with the grain quality. Here we identified an ultra-large grain accession (ULG) with a thousand-grain weight exceeding 60 g. An integrated analysis combining QTL mapping, BSA, de novo genome assembly, transcriptome sequencing, and gene editing was conducted to dissect the molecular basis of ULG formation. The ULG pyramided advantageous alleles from at least four known grain-shaping genes, OsLG3, OsMADS1, GS3, and GL3.1, and one novel locus, qULG2-b, which encoded a leucine-rich repeat receptor-like kinase. The collective impacts of OsLG3, OsMADS1, GS3, and GL3.1 on grain size were confirmed in transgenic plants and near-isogenic lines. The transcriptome analysis identified 112 genes cooperatively regulated by these four genes that were prominently involved in photosynthesis and carbon metabolism. By leveraging the pleiotropy of these genes, we enhanced the grain yield, appearance, and stress tolerance of rice var. SN265. Beyond showcasing that pyramiding multiple grain size regulatory genes can produce ULG, our study provides a theoretical framework and valuable genomic resources for improving rice varieties by leveraging the pleiotropy of grain size regulatory genes.






  • Article type: Journal Article
    A pioneering pink cultivar of Auricularia cornea, first commercially cultivated in 2022, lacks genomic data, hindering research in genetic breeding, gene discovery, and product development. Here, we report the de novo assembly of the pink A. cornea Fen-A1 genome and provide a detailed functional annotation. The genome is 73.17 Mb in size, contains 86 scaffolds (N50 ∼ 5.49 Mb), 59.09% GC content and encodes 19,120 predicted genes with a BUSCO completeness of 92.60%. Comparative genomic analysis reveals the phylogenetic relatedness of Fen-A1 and remarkable gene family dynamics. Putative genes were found mapped to 3 antibiotic-related, 36 light-dependent and 25 terpene metabolites. In addition, 789 CAZymes genes were classified, revealing the dynamics of quality loss due to postharvest refrigeration. Overall, our work is the first report on a pink A. cornea genome and provides a comprehensive insight into its complex functions.
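    The scaffold N50 quoted in assembly summaries such as this one is the length L for which scaffolds of length L or longer together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name and the toy scaffold lengths are hypothetical and are not taken from any of the studies listed here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb, for illustration only.
toy_scaffolds = [8, 6, 5, 3, 2, 1]
print(n50(toy_scaffolds))  # 6: the two longest scaffolds (8 + 6) cover 14 of 25 Mb
```

    In practice the lengths would be read from the assembly FASTA or its index; a higher N50 indicates a more contiguous assembly but says nothing about correctness, which is why it is usually reported alongside a completeness score such as BUSCO.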






  • Article type: Journal Article
    Many questions in biology benefit greatly from the use of a variety of model systems. High-throughput sequencing methods have been a triumph in the democratization of diverse model systems. They allow for the economical sequencing of an entire genome or transcriptome of interest, and with technical variations can even provide insight into genome organization and the expression and regulation of genes. The analysis and biological interpretation of such large datasets can present significant challenges that depend on the 'scientific status' of the model system. While high-quality genome and transcriptome references are readily available for well-established model systems, the establishment of such references for an emerging model system often requires extensive resources such as finances, expertise and computational capabilities. The de novo assembly of a transcriptome represents an excellent entry point for genetic and molecular studies in emerging model systems as it can efficiently assess gene content while also serving as a reference for differential gene expression studies. However, the process of de novo transcriptome assembly is non-trivial, and as a rule must be empirically optimized for every dataset. For the researcher working with an emerging model system, and with little to no experience with assembling and quantifying short-read data from the Illumina platform, these processes can be daunting. In this guide we outline the major challenges faced when establishing a reference transcriptome de novo and we provide advice on how to approach such an endeavor. We describe the major experimental and bioinformatic steps, and provide some broad recommendations and cautions for the newcomer to de novo transcriptome assembly and differential gene expression analyses. Moreover, we provide an initial selection of tools that can assist in the journey from raw short-read data to assembled transcriptome and lists of differentially expressed genes.






  • Article type: Journal Article
    Genome-wide information has so far been unavailable for ribbon worms of the clade Hoplonemertea, the most species-rich class within the phylum Nemertea. While species within Pilidiophora, the sister clade of Hoplonemertea, possess a pilidium larval stage and lack stylets on their proboscis, Hoplonemertea species have a planuliform larva and are armed with stylets employed for the injection of toxins into their prey. To further compare these developmental, physiological, and behavioral differences from a genomic perspective, the availability of a reference genome for a Hoplonemertea species is crucial. Such data will be highly useful for future investigations toward a better understanding of molecular ecology, venom evolution, and regeneration not only in Nemertea but also in other marine invertebrate phyla. To this end, we herein present the annotated chromosome-level genome assembly for Emplectonema gracile (Nemertea; Hoplonemertea; Monostilifera; Emplectonematidae), an easily collected nemertean well suited for laboratory experimentation. The genome has an assembly size of 157.9 Mb. Hi-C scaffolding yielded chromosome-level scaffolds, with a scaffold N50 of 10.0 Mb and a score of 95.1% for complete BUSCO genes found as a single copy. Annotation predicted 20,684 protein-coding genes. The high-quality reference genome reaches an Earth BioGenome standard level of 7.C.Q50.






  • Article type: Journal Article
    Panax japonicus Meyer, a perennial herb of the dicotyledonous family Araliaceae, is a rare traditional Chinese folk medicine, known as "the king of herbal medicine" in China. To understand the genes involved in secondary pathways under drought and salt stress, transcriptomic analysis of P. japonicus is of vital importance. Transcriptome sequencing of underground rhizomes, stems, and leaves of P. japonicus under drought and salt stress was performed using the Illumina HiSeq platform. After de novo assembly of transcripts, expression profiling was performed and differentially expressed genes (DEGs) were identified. Furthermore, putative functions of identified DEGs correlated with ginsenoside in P. japonicus were explored using Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. A total of 221,804 unigenes were obtained from the transcriptome of P. japonicus. Further analysis revealed that 10,839 unigenes were mapped to 91 KEGG pathways. In addition, two metabolic pathways of P. japonicus related to triterpene saponin synthesis in response to drought and salt stress were screened. The sesquiterpene and triterpene metabolic pathways were annotated as putatively involved in ginsenoside content, and correlation analysis of the expression of these genes identified four genes: β-amyrin synthase, isoprene synthase, squalene epoxidase, and 1-deoxy-D-ketose-5-phosphate synthase. Our results pave the way for screening highly expressed genes and mining genes related to triterpenoid saponin synthesis. They also provide valuable references for the study of genes involved in ginsenoside biosynthesis and signaling pathways of P. japonicus.






  • Article type: Journal Article
    The extremely high levels of genetic polymorphism within the human major histocompatibility complex (MHC) limit the usefulness of reference-based alignment methods for sequence assembly. We incorporate a short-read de novo assembly algorithm into a workflow for novel application to the MHC. MHConstructor is a containerized pipeline designed for high-throughput, haplotype-informed, reproducible assembly of both whole-genome sequencing and target-capture short-read data in large population cohorts. To date, no other self-contained tool exists for the generation of de novo MHC assemblies from short-read data. MHConstructor facilitates widespread access to high-quality, alignment-free MHC sequence analysis.






  • Article type: Journal Article
    OBJECTIVE: Ottelia Pers. is in the Hydrocharitaceae family. Species in the genus are aquatic, and China is their centre of origin in Asia. Ottelia alismoides (L.) Pers., which is distributed worldwide, is a distinguishing element in China, while other species of this genus are endemic to China. However, O. alismoides is also considered endangered due to habitat loss and pollution in some Asian countries. Ottelia alismoides is the only submerged macrophyte that contains three carbon dioxide-concentrating mechanisms, i.e. bicarbonate (HCO3-) use, crassulacean acid metabolism and the C4 pathway. In this study, we present its first genome assembly to help illustrate the various carbon metabolism mechanisms and to enable genetic conservation in the future.
    METHODS: Using DNA and RNA extracted from one O. alismoides leaf, this work produced ∼ 73.4 Gb HiFi reads, ∼ 126.4 Gb whole genome sequencing short reads and ∼ 21.9 Gb RNA-seq reads. The de novo genome assembly was 6,455,939,835 bp in length, with 11,923 scaffolds/contigs and an N50 of 790,733 bp. Genome assembly completeness assessment with Benchmarking Universal Single-Copy Orthologs revealed a score of 94.4%. The repetitive sequence in the assembly was 4,875,817,144 bp (75.5%). A total of 116,176 genes were predicted. The protein sequences were functionally annotated against multiple databases, facilitating comparative genomic analysis.
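    The reported repeat fraction can be checked directly from the two base-pair counts given in the abstract. The short sketch below does that arithmetic; the helper name is hypothetical, and only the two counts come from the text.

```python
def percent_of(part_bp, whole_bp):
    """Return part_bp as a percentage of whole_bp."""
    if whole_bp <= 0:
        raise ValueError("whole_bp must be positive")
    return 100 * part_bp / whole_bp


# Figures quoted in the abstract above.
assembly_bp = 6_455_939_835
repeat_bp = 4_875_817_144

repeat_pct = percent_of(repeat_bp, assembly_bp)
print(f"{repeat_pct:.1f}%")  # 75.5%
```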






  • Article type: Journal Article
    In the realm of food nutritional security, the development of mineral-rich grains assumes a pivotal role in combating malnutrition. Within the scope of the current investigation, we endeavoured to discern the transcripts accountable for the improved accumulation of grain-Fe within Indian barnyard millet. This pursuit entailed transcriptome sequencing of genotypes BAR-1433 (with high Fe content) and BAR-1423 (with low Fe content) during two distinct stages of spike development: spike emergence and the milking stage. In the context of spike emergence, we identified a cohort of 895 up-regulated transcripts and 126 down-regulated transcripts that delineated the difference between the high and low grain-Fe genotypes. In contrast, during the milking stage, the tally of up-regulated transcripts reached 436, while down-regulated transcripts numbered 285. The transcripts that were consistently up-regulated in both developmental stages underwent functional annotation, aligning their roles with nucleolar proteins, metal-nicotianamine transporters, ribonucleoprotein complexes, vinorine synthases, cellulose synthases, auxin response factors, embryogenesis abundant proteins, cytochrome c oxidases, and zinc finger BED domain-containing proteins. Meanwhile, a heterogeneous spectrum of transcripts exhibited differential expression and upregulation throughout the distinct stages. These transcripts encompassed various facets, such as ABC Transporter family proteins, Calcium-dependent kinase family, Ferritin, Metal ion binding, Iron-sulfur cluster binding, Cytochrome family, Zinc finger transcription factor family, Ferredoxin-NADP reductase type 1 family, Putative laccase, Multicopper oxidase family, and Terpene synthase family. To authenticate the reliability of these transcripts, six contigs representing probable functions, including metal transporters, iron sulfur coordination, metal ion binding, auxin-responsive GH3-like protein 2, and cytochrome P450 71B16, were harnessed for primer design. Subsequently, these primers were utilized in the validation process through qRT-PCR, with the outcomes aligning with the transcriptome results. This study chronicles a constellation of genes linked to elevated iron content within barnyard millet, showcasing a proof of concept for leveraging transcriptome insights in marker-assisted selection to fortify barnyard millet with iron. This marks the inaugural comprehensive transcriptome analysis delineating transcripts associated with varying levels of grain-iron content during the panicle developmental stages within the barnyard millet paradigm.





