Transcription start site

  • Article type: Journal Article
    The sperm epigenome is thought to affect the developmental programming of the resulting embryo, influencing health and disease in later life. Age-related methylation changes in the sperm of old fathers may mediate the increased risks for reproductive and offspring medical problems. The impact of paternal age on sperm methylation has been extensively studied in humans and, to a lesser extent, in rodents and cattle. Here, we performed a comparative analysis of paternal age effects on protein-coding genes in the human and marmoset sperm methylomes. The marmoset has gained growing importance as a non-human primate model of aging and age-related diseases. Using reduced representation bisulfite sequencing, we identified age-related differentially methylated transcription start site (ageTSS) regions in 204 marmoset and 27 human genes. The direction of methylation changes was opposite between the two species, increasing with age in marmosets and decreasing in humans. None of the identified ageTSS was differentially methylated in both species. Although the average methylation levels of all TSS regions were highly correlated between marmosets and humans, with the majority of TSS being hypomethylated in sperm, more than 300 protein-coding genes were endowed with species-specifically (hypo)methylated TSS. Several genes of the glycosphingolipid (GSL) biosynthesis pathway, which plays a role in embryonic stem cell differentiation and regulation of development, were hypomethylated (<5%) in human and fully methylated (>95%) in marmoset sperm. The expression levels and patterns of defined sets of GSL genes differed considerably between human and marmoset pre-implantation embryo stages and blastocyst tissues, respectively.
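The <5% and >95% thresholds quoted above can be illustrated with a minimal Python sketch. It assumes that mean TSS methylation per gene is already available as a percentage for each species. The function names, the gene labels, and the values are hypothetical placeholders, not data from the study.

```python
def methylation_class(percent, low=5.0, high=95.0):
    """Classify a mean TSS methylation level given in percent."""
    if percent < low:
        return "hypomethylated"
    if percent > high:
        return "fully methylated"
    return "intermediate"


def species_specific_tss(human, marmoset):
    """Return genes hypomethylated in human and fully methylated in marmoset.

    Both arguments map gene name -> mean TSS methylation (percent).
    Only genes present in both mappings are compared.
    """
    shared = sorted(set(human) & set(marmoset))
    return [
        gene
        for gene in shared
        if methylation_class(human[gene]) == "hypomethylated"
        and methylation_class(marmoset[gene]) == "fully methylated"
    ]


# Hypothetical placeholder values, for illustration only.
human_tss = {"GENE_A": 2.1, "GENE_B": 3.0, "GENE_C": 50.0}
marmoset_tss = {"GENE_A": 97.5, "GENE_B": 4.0, "GENE_C": 96.0}
print(species_specific_tss(human_tss, marmoset_tss))
```

With these placeholder inputs, only the gene that is below 5% in one mapping and above 95% in the other is reported.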






  • Article type: Journal Article
    BACKGROUND: We recently developed two high-resolution methods for genome-wide mapping of two prominent types of DNA damage, single-strand DNA breaks (SSBs) and abasic (AP) sites, and found highly complex and non-random patterns of these lesions in mammalian genomes. One salient feature of SSB and AP sites was the existence of single-nucleotide hotspots for both lesions.
    RESULTS: In this work, we show that SSB hotspots are enriched in the immediate vicinity of transcriptional start sites (TSSs) in multiple normal mammalian tissues; however, the magnitude of enrichment varies significantly with tissue type and appears to be limited to a subset of genes. SSB hotspots around TSSs are enriched on the template strand and associate with higher expression of the corresponding genes. Interestingly, SSB hotspots appear to be at least in part generated by the base-excision repair (BER) pathway from the AP sites.
    CONCLUSIONS: Our results highlight a complex relationship between DNA damage and regulation of gene expression and suggest an exciting possibility that SSBs at TSSs might function as sensors of DNA damage to activate genes important for DNA damage response.
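As a rough illustration of the template-strand comparison described in the results, the Python sketch below counts hotspots that fall within a fixed window of a TSS and splits them by strand. It is not the authors' pipeline. It assumes a single chromosome, a hypothetical window of 100 bp, and that the template strand is the strand opposite to the annotated gene strand.

```python
def count_hotspots_near_tss(tss_list, hotspots, window=100):
    """Count hotspots within +/- window bp of any TSS, split by strand.

    tss_list: iterable of (position, gene_strand) with strand "+" or "-".
    hotspots: iterable of (position, strand) on the same chromosome.
    A hotspot on the strand opposite to the gene strand is counted as
    lying on the template strand.
    """
    counts = {"template": 0, "non_template": 0}
    for tss_pos, gene_strand in tss_list:
        for hs_pos, hs_strand in hotspots:
            if abs(hs_pos - tss_pos) <= window:
                key = "template" if hs_strand != gene_strand else "non_template"
                counts[key] += 1
    return counts


# Toy coordinates, for illustration only.
toy_tss = [(1000, "+"), (5000, "-")]
toy_hotspots = [(1010, "-"), (990, "+"), (5050, "+"), (9000, "-")]
print(count_hotspots_near_tss(toy_tss, toy_hotspots))
```

The nested loop is quadratic and only suitable for small inputs; a genome-scale analysis would normally use an interval index or a sorted sweep instead.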






  • Article type: Journal Article
    A member of the Retroviridae, human immunodeficiency virus type 1 (HIV-1), uses the RNA genome packaged into nascent virions to transfer genetic information to its progeny. The genome packaging step is a highly regulated and extremely efficient process as a vast majority of virus particles contain two copies of full-length unspliced HIV-1 RNA that form a dimer. Thus, during virus assembly HIV-1 can identify and selectively encapsidate HIV-1 unspliced RNA from an abundant pool of cellular RNAs and various spliced HIV-1 RNAs. Several "G" features facilitate the packaging of a dimeric RNA genome. The viral polyprotein Gag orchestrates virus assembly and mediates RNA genome packaging. During this process, Gag preferentially binds unpaired guanosines within the highly structured 5' untranslated region (UTR) of HIV-1 RNA. In addition, the HIV-1 unspliced RNA provides a scaffold that promotes Gag:Gag interactions and virus assembly, thereby ensuring its packaging. Intriguingly, recent studies have shown that the use of different guanosines at the junction of U3 and R as transcription start sites results in HIV-1 unspliced RNA species with 99.9% identical sequences but dramatically distinct 5' UTR conformations. Consequently, one species of unspliced RNA is preferentially packaged over other nearly identical RNAs. These studies reveal how conformations affect the functions of HIV-1 RNA elements and the complex regulation of HIV-1 replication. In this review, we summarize cis- and trans-acting elements critical for HIV-1 RNA packaging, locations of Gag:RNA interactions that mediate genome encapsidation, and the effects of transcription start sites on the structure and packaging of HIV-1 RNA.






  • Article type: Journal Article
    Mutations and gene expression are the two most studied genomic features in cancer research. In the last decade, the combined advances in genomic technology and computational algorithms have broadened mutation research with the concept of mutation density and expanded the traditional scope of protein-coding RNA to non-coding RNAs. However, mutation density analysis had yet to be integrated with non-coding RNAs. In this study, we examined long non-coding RNA (lncRNA) mutation density patterns of 57 unique cancer types using 80 cancer cohorts. Our analysis revealed that lncRNAs exhibit mutation density patterns reminiscent of those of protein-coding mRNAs. These patterns include a mutation peak and dip around the transcription start sites of lncRNAs. In many cohorts, these patterns showed statistically significant transcription strand bias, and the transcription strand bias was shared between lncRNAs and mRNAs. We further quantified transcription strand biases with a log odds ratio metric and showed that some of these biases are associated with patient prognosis. The prognostic effect may result from strong transcription-coupled repair mechanisms in individual patients. For the first time, our study combined mutational density patterns with lncRNA mutations, and the results demonstrated remarkably comparable patterns between protein-coding mRNA and lncRNA, further illustrating lncRNA's potential roles in cancer research.
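The abstract does not spell out how the log odds ratio was computed. As a generic sketch only, the Python function below computes a log odds ratio from a 2x2 table. It assumes the table holds mutated and unmutated site counts on the transcribed and non-transcribed strands, and it adds a Haldane-Anscombe pseudocount of 0.5 by default to avoid division by zero. The counts in the example are invented.

```python
import math


def log_odds_ratio(mut_t, ref_t, mut_nt, ref_nt, pseudocount=0.5):
    """Natural-log odds ratio for strand bias from a 2x2 table.

    mut_t / ref_t:   mutated / unmutated sites on the transcribed strand
    mut_nt / ref_nt: mutated / unmutated sites on the non-transcribed strand
    A positive value means relatively more mutations on the transcribed strand.
    """
    a, b, c, d = (x + pseudocount for x in (mut_t, ref_t, mut_nt, ref_nt))
    return math.log((a * d) / (b * c))


# Invented counts, for illustration only.
print(log_odds_ratio(30, 70, 10, 90))   # positive: bias toward the transcribed strand
print(log_odds_ratio(10, 90, 10, 90))   # 0.0: no bias
```

A value of zero indicates no strand bias, and the sign indicates which strand carries the excess of mutations.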






  • Article type: Journal Article
    Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have proven exceptionally effective for such tasks, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works do not compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor do they provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied in related problems with remarkable results, we compared their performance in transcription start site prediction and concluded that SVMs are computationally much slower and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited to working with sequences than SVMs. For this purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets including negative instances on any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data.
    We also create a transcription start site (TSS) dataset large enough to be used in deep learning experiments.
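The idea of generating training examples with negative instances can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the method from the article: positives are fixed-width windows centred on annotated TSS positions, and negatives are windows sampled at random from positions at least `min_gap` bases away from every TSS. The parameter names and the toy sequence are hypothetical.

```python
import random


def make_tss_examples(sequence, tss_positions, flank=5, neg_per_pos=2,
                      min_gap=10, seed=0):
    """Build positive and negative sequence windows of length 2 * flank + 1.

    Positives are centred on each TSS that has a full flank on both sides.
    Negatives are centred on randomly sampled positions that lie at least
    min_gap bases from every TSS; neg_per_pos sets the class ratio.
    """
    rng = random.Random(seed)
    positives = [
        sequence[p - flank:p + flank + 1]
        for p in tss_positions
        if flank <= p < len(sequence) - flank
    ]
    candidates = [
        c for c in range(flank, len(sequence) - flank)
        if all(abs(c - p) >= min_gap for p in tss_positions)
    ]
    n_neg = min(len(candidates), neg_per_pos * len(positives))
    negatives = [
        sequence[c - flank:c + flank + 1]
        for c in rng.sample(candidates, n_neg)
    ]
    return positives, negatives


# Toy sequence and TSS coordinates, for illustration only.
toy_sequence = "ACGT" * 25
pos_windows, neg_windows = make_tss_examples(toy_sequence, [20, 60])
print(len(pos_windows), len(neg_windows))
```

The `neg_per_pos` ratio is one simple way to control class imbalance at dataset creation time; class weighting or resampling during training are alternatives.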






  • Article type: Journal Article
    HIV-1 relies on host RNA polymerase II (Pol II) to transcribe its genome and uses multiple transcription start sites (TSS), including three consecutive guanosines located near the U3-R junction, to generate transcripts containing three, two, and one guanosine at the 5' end, referred to as 3G, 2G, and 1G RNA, respectively. The 1G RNA is preferentially selected for packaging, indicating that these 99.9% identical RNAs exhibit functional differences and highlighting the importance of TSS selection. Here, we demonstrate that TSS selection is regulated by sequences between the CATA/TATA box and the beginning of R. Furthermore, we have generated two HIV-1 mutants with distinct 2-nucleotide modifications that predominantly express 3G RNA or 1G RNA. Both mutants can generate infectious viruses and undergo multiple rounds of replication in T cells. However, both mutants exhibit replication defects compared to the wild-type virus. The 3G-RNA-expressing mutant displays an RNA genome-packaging defect and delayed replication kinetics, whereas the 1G-RNA-expressing mutant exhibits reduced Gag expression and a replication fitness defect. Additionally, reversion of the latter mutant is frequently observed, consistent with sequence correction by plus-strand DNA transfer during reverse transcription. These findings demonstrate that HIV-1 maximizes its replication fitness by usurping the TSS heterogeneity of host RNA Pol II to generate unspliced RNAs with different specialized roles in viral replication. The three consecutive guanosines at the junction of U3 and R may also maintain HIV-1 genome integrity during reverse transcription. These studies reveal the intricate regulation of HIV-1 RNA and its complex replication strategy.
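The 3G, 2G, and 1G naming can be made concrete with a small Python sketch that labels a transcript by the number of consecutive guanosines at its 5' end. The function name and the toy reads are illustrative assumptions; they are not sequences or tools from the study.

```python
from collections import Counter


def classify_5prime_g(rna, max_g=3):
    """Label an RNA by its run of 5' guanosines: "1G", "2G", "3G", or "other"."""
    n = 0
    for base in rna.upper():
        if base != "G":
            break
        n += 1
    return f"{n}G" if 1 <= n <= max_g else "other"


# Toy reads, for illustration only.
toy_reads = ["GGGUCUCU", "GGUCUCU", "GUCUCU", "GUCUCU", "UCUCU"]
tally = Counter(classify_5prime_g(read) for read in toy_reads)
print(tally)
```

Reads with no leading guanosine, or with more than `max_g`, are grouped as "other" rather than forced into one of the three classes.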






  • Article type: Journal Article
    Understanding the genomic control of tissue-specific gene expression and regulation can help to inform the application of genomic technologies in farm animal breeding programs. The fine mapping of promoters [transcription start sites (TSS)] and enhancers (divergent amplifying segments of the genome local to TSS) in different populations of cattle across a wide diversity of tissues provides information to locate and understand the genomic drivers of breed- and tissue-specific characteristics. To this end, we used Cap Analysis Gene Expression (CAGE) sequencing of 24 different tissues from 3 populations of cattle to define TSS and their coexpressed short-range enhancers (<1 kb) in the ARS-UCD1.2_Btau5.0.1Y reference genome (1000bulls run9) and analyzed tissue and population specificity of expressed promoters. We identified 51,295 TSS and 2,328 TSS-Enhancer regions shared across the 3 populations (dairy, beef-dairy cross, and Canadian Kinsella composite cattle from 2 individuals, 1 of each sex, per population). Cross-species comparative analysis of CAGE data from 7 other species, including sheep, revealed a set of TSS and TSS-Enhancers that were specific to cattle. The CAGE data set will be combined with other transcriptomic information for the same tissues to create a new high-resolution map of transcript diversity across tissues and populations in cattle for the BovReg project. Here we provide the CAGE data set and annotation tracks for TSS and TSS-Enhancers in the cattle genome. This new annotation information will improve our understanding of the drivers of gene expression and regulation in cattle and help to inform the application of genomic technologies in breeding programs.






  • Article type: Journal Article
    The generation of distinct messenger RNA isoforms through alternative RNA processing modulates the expression and function of genes, often in a cell-type-specific manner. Here, we assess the regulatory relationships between transcription initiation, alternative splicing, and 3' end site selection. Applying long-read sequencing to accurately represent even the longest transcripts from end to end, we quantify mRNA isoforms in Drosophila tissues, including the transcriptionally complex nervous system. We find that in Drosophila heads, as well as in human cerebral organoids, 3' end site choice is globally influenced by the site of transcription initiation (TSS). "Dominant promoters," characterized by specific epigenetic signatures including p300/CBP binding, impose a transcriptional constraint to define splice and polyadenylation variants. In vivo deletion or overexpression of dominant promoters as well as p300/CBP loss disrupted the 3' end expression landscape. Our study demonstrates the crucial impact of TSS choice on the regulation of transcript diversity and tissue identity.






  • Article type: Journal Article
    Leptospirosis is an emerging zoonotic disease caused by bacterial species of the genus Leptospira. However, the regulatory mechanisms and pathways underlying the adaptation of pathogenic and non-pathogenic Leptospira spp. in different environmental conditions remain elusive. Leptospira biflexa is a non-pathogenic species of Leptospira that lives exclusively in a natural environment. It is an ideal model not only for exploring molecular mechanisms underlying the environmental survival of Leptospira species but also for identifying virulence factors unique to Leptospira's pathogenic species. In this study, we aim to establish the transcription start site (TSS) landscape and the small RNA (sRNA) profile of L. biflexa serovar Patoc grown to exponential and stationary phases via differential RNA-seq (dRNA-seq) and small RNA-seq (sRNA-seq) analyses, respectively. Our dRNA-seq analysis uncovered a total of 2726 TSSs, which are also used to identify other elements, e.g., promoters and untranslated regions (UTRs). In addition, our sRNA-seq analysis revealed a total of 603 sRNA candidates, comprising 16 promoter-associated sRNAs, 184 5'UTR-derived sRNAs, 230 true intergenic sRNAs, 136 5'UTR-antisense sRNAs, and 130 open reading frame (ORF)-antisense sRNAs. In summary, these findings reflect the transcriptional complexity of L. biflexa serovar Patoc under different growth conditions and help to facilitate our understanding of regulatory networks in L. biflexa. To the best of our knowledge, this is the first study reporting the TSS landscape of L. biflexa. The TSS and sRNA landscapes of L. biflexa can also be compared with its pathogenic counterparts, e.g., L. borgpetersenii and L. interrogans, to identify features contributing to their environmental survival and virulence.






  • Article type: Journal Article
    Histone proteins play a critical role in the primary organization of nucleosomes, which are the fundamental units of chromatin. Among the five types of histones, histone H3 has multiple variants, and their number differs among species. Among histone H3 variants, centromeric histone H3 (CENH3) is crucial for centromere identification and proper chromosomal segregation during cell division. In the present study, we have identified 17 putative histone H3 genes of Brassica oleracea. Furthermore, we have carried out a detailed characterization of the CENH3 gene of B. oleracea. We showed that a single CENH3 gene exhibits allelic diversity with at least two alleles and an alternative splicing pattern. Also, we have identified a CENH3 gene-specific co-dominant cleaved amplified polymorphic sequence marker SNP34(A/C) to distinguish CENH3 alleles and follow their expression in leaf and flower tissues. The gene structure analysis of the CENH3 gene revealed the conserved 5'-CAGCAG-3' sequence at the intron 3-exon 4 junction in B. oleracea, which serves as an alternative splicing site with one-codon (alanine) addition/deletion. However, this one-codon alternative splicing feature is not conserved in the CENH3 genes of wild allied Brassica species. Our findings suggest that transcriptional complexity and alternative splicing might play a key role in the transcriptional regulation and function of the CENH3 gene in B. oleracea. Altogether, data generated from the present study can serve as a primary information resource and can be used to engineer the CENH3 gene for developing haploid inducer lines in B. oleracea.





