low frequency variants

  • Article type: Preprint
    The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype-to-phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, not to mention how these factors respond to the environment. We tested whether flexible machine learning models could learn some of the underlying cis-regulatory genotype-to-phenotype map, using cold-responsive transcriptome profiles in 5 diverse Arabidopsis thaliana accessions. We first tested for evidence that cis regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the up- and down-stream regions, respectively, of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn cis-regulatory motifs de novo from DNA sequences, to predict expression response to environment. We found that CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by the biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted from variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.
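
As a rough illustration of how a first-layer CNN filter relates to a sequence motif, the sketch below one-hot encodes a DNA sequence, slides a single filter along it, and takes the maximum activation (global max pooling). This is not the model from the preprint. The filter weights are set by hand rather than learned, and the motif `CCGAC` and the example sequence are arbitrary placeholders chosen only for the demonstration.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def motif_filter(motif):
    """Hand-set filter: weight 1 for the motif base at each position.

    A trained CNN would learn real-valued weights instead (assumption:
    this stand-in is used only to make the scan easy to follow).
    """
    return one_hot(motif)

def conv_scan(seq, filt):
    """Slide the filter along the sequence and return one score per offset."""
    x = one_hot(seq)
    k = len(filt)
    return [
        sum(x[i + j][c] * filt[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]

# Hypothetical promoter fragment containing the placeholder motif.
scores = conv_scan("TTACCGACATTT", motif_filter("CCGAC"))
best_score = max(scores)            # global max pooling over positions
best_offset = scores.index(best_score)
print(best_offset, best_score)
```

In a real network, many such filters are learned jointly and their pooled activations feed later layers that predict the expression response.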






  • Article type: Journal Article
    NUS1 has recently been identified as a candidate gene for Parkinson's disease (PD). Few studies have examined the association of NUS1 variants with PD susceptibility and phenotypes. In the first cohort, whole-exome sequencing was performed to identify variants in NUS1 exon-coding and exon-intron regions in 1542 cases and 1625 controls. A total of 13 variants were detected, comprising 10 rare variants and 3 low-frequency variants. Burden analysis showed that rare NUS1 variants were significantly enriched in PD (p = 0.016). We also performed a meta-analysis combining previous studies with our own to correlate NUS1 mutations with PD susceptibility. Integrating our previous cohort (3210 cases and 2807 controls) with the first cohort identified a significant association of rs539668656 with PD risk (odds ratio (OR) = 2.82, p = 0.016). The genotype-phenotype association analysis showed that carrying rare variants or rs539668656 was significantly associated with earlier age at onset, depression, emotional impairment, and more severe disease. Our results support a role for NUS1 rare variants and rs539668656 in PD susceptibility and phenotype.
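
For readers unfamiliar with how an odds ratio such as the one reported above is derived from carrier counts, here is a minimal sketch. The counts are invented for illustration and are not the study's data; the abstract does not report per-group carrier numbers. The confidence interval uses the standard log-odds (Woolf) approximation, which is an assumption about method, not something the abstract states.

```python
from math import exp, log, sqrt

def odds_ratio_ci(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Odds ratio with an approximate 95% CI from a 2x2 carrier table."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if min(a, b, c, d) == 0:
        # A zero cell needs a continuity correction, not handled here.
        raise ValueError("zero cell in 2x2 table")
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)

# Hypothetical counts: 12 carriers among 400 cases, 5 among 500 controls.
estimate, lower, upper = odds_ratio_ci(12, 400, 5, 500)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```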






  • Article type: Journal Article
    BACKGROUND: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away.
    RESULTS: In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correction of such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost effective.
    CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.
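
The idea of reuniting singleton reads with their families can be sketched as follows. This is not Du Novo's actual algorithm, which the abstract does not describe. It is a simplified stand-in that assumes equal-length tags, a Hamming-distance threshold of one mismatch, and merging only when exactly one larger family is within that distance. The function names and tag strings are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))

def reunite_singletons(tag_counts, max_mismatches=1):
    """Merge tags seen once into a unique nearby family.

    tag_counts maps a tag (barcode) string to its read count.
    Singletons with zero or several candidate families are left as-is,
    since the assignment would be ambiguous.
    """
    families = {t: n for t, n in tag_counts.items() if n > 1}
    corrected = Counter(families)
    for tag, n in tag_counts.items():
        if n > 1:
            continue
        candidates = [
            f for f in families
            if len(f) == len(tag) and hamming(f, tag) <= max_mismatches
        ]
        if len(candidates) == 1:
            corrected[candidates[0]] += n   # reunite with its family
        else:
            corrected[tag] += n             # keep as an unresolved singleton
    return corrected

# Toy tag counts: "ACGTACGA" differs from "ACGTACGT" at one position.
counts = Counter({"ACGTACGT": 5, "ACGTACGA": 1, "TTTTCCCC": 3, "GGGGGGGG": 1})
result = reunite_singletons(counts)
print(dict(result))
```

A production tool would also need to scale beyond pairwise comparison and to handle both strands of each duplex, which this sketch ignores.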







  • Article type: Journal Article
    Whole-genome sequence (WGS) data are expected to be optimal for genome-wide association studies and genomic prediction. However, sequencing thousands of individuals of interest is expensive. Imputation from single nucleotide polymorphism panels to WGS data is an attractive approach to obtain highly reliable WGS data at low cost. Here, we conducted a genotype imputation study with a combined reference panel in a yellow-feather dwarf broiler population. The combined reference panel was assembled from 24 sequenced key individuals of a yellow-feather dwarf broiler population (internal reference panel) and WGS data from 311 chickens in public databases (external reference panel). Three scenarios were investigated to determine how different factors affect the accuracy of imputation from 600 K array data to WGS data: genotype imputation with internal, external and combined reference panels; the number of internal reference individuals in the combined reference panel; and different reference sizes and selection strategies for the external reference panel. Results showed that imputation accuracies from 600 K to WGS data were 0.834±0.012, 0.920±0.007 and 0.982±0.003 for the internal, external and combined reference panels, respectively. Increasing the reference size from 50 to 250 improved the accuracy of genotype imputation from 0.848 to 0.974 for the combined reference panel and from 0.647 to 0.917 for the external reference panel. The selection strategies for the external reference panel had no impact on the accuracy of imputation using the combined reference panel. However, if only an external reference panel with reference size >50 was used, the selection strategy of minimizing the average distance to the closest leaf gave the greatest imputation accuracy compared with other methods. Generally, using a combined reference panel provided greater imputation accuracy, especially for low-frequency variants.
    In conclusion, the optimal imputation strategy with a combined reference panel should comprehensively consider genetic diversity of the study population, availability and properties of external reference panels, sequencing and computing costs, and frequency of imputed variants. This work sheds light on how to design and execute genotype imputation with a combined external reference panel in a livestock population.
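
The abstract does not say how imputation accuracy was measured. Two measures commonly used for this purpose are the Pearson correlation between true genotypes and imputed dosages, and the genotype concordance rate. The sketch below computes both on made-up values, under the assumption that genotypes are coded as 0/1/2 alternate-allele counts.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def concordance(true_genotypes, dosages):
    """Share of individuals whose rounded dosage equals the true genotype."""
    calls = [round(d) for d in dosages]
    return sum(t == c for t, c in zip(true_genotypes, calls)) / len(calls)

# Hypothetical data for one variant across six individuals.
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_dosages = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]

r = pearson(true_genotypes, imputed_dosages)
rate = concordance(true_genotypes, imputed_dosages)
print(round(r, 3), rate)
```

Correlation is generally more informative than concordance for low-frequency variants, because a high concordance can be reached there simply by calling every individual homozygous for the major allele.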






  • Article type: Journal Article
    While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
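
One of the simplest forms of aggregate testing is a collapsing burden test: each individual is scored as a carrier if they hold at least one rare allele anywhere in the region, and carrier status is then compared between cases and controls. The sketch below uses a one-sided Fisher's exact test for that comparison. It is one illustrative method among the range the abstract mentions, and the genotype matrices are invented.

```python
from math import comb

def collapse_carriers(genotypes):
    """True per individual if any rare variant in the region is carried.

    genotypes: one row per individual, one 0/1/2 allele count per variant.
    """
    return [any(g > 0 for g in row) for row in genotypes]

def fisher_one_sided(a, b, c, d):
    """P(case carriers >= a) under the hypergeometric null.

    a, b: carrier and non-carrier cases; c, d: the same for controls.
    """
    n_cases = a + b
    n_carriers = a + c
    total = a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, k) * comb(total - n_cases, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, n_carriers)

# Hypothetical region with three rare variants, six cases and six controls.
cases = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

a = sum(collapse_carriers(cases))
c = sum(collapse_carriers(controls))
p_value = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
print(a, c, round(p_value, 4))
```

Collapsing assumes the rare alleles act in the same direction. When a region mixes risk and protective variants, variance-component tests are usually better suited, which is one reason performance depends on genetic architecture.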






  • Article type: Journal Article
    Common SNPs in nicotinic acetylcholine receptor genes (CHRN genes) have been associated with drug behaviors and personality traits, but the influence of rare genetic variants is not well characterized. The goal of this project was to identify novel rare variants in CHRN genes in the Center for Antisocial Drug Dependence (CADD) and Genetics of Antisocial Drug Dependence (GADD) samples and to determine whether low-frequency variants are associated with antisocial drug dependence. Two samples of 114 and 200 individuals were selected using a case/control design including the tails of the phenotypic distribution of antisocial drug dependence. The capture, sequencing, and analysis of all variants in 16 CHRN genes (CHRNA1-7, 9, 10, CHRNB1-4, CHRND, CHRNG, CHRNE) were performed independently for each subject in each sample. Sequencing reads were aligned to the human reference sequence using BWA prior to variant calling with the Genome Analysis Toolkit (GATK). Low-frequency variants (minor allele frequency < 0.05) were analyzed using SKAT-O and C-alpha to examine the distribution of rare variants among cases and controls. In our larger sample, the region containing the CHRNA6/CHRNB3 gene cluster was significantly associated with disease status using both SKAT-O and C-alpha (unadjusted p values < 0.05). More low-frequency variants in the CHRNA6/CHRNB3 gene region were observed in cases than in controls. These data support a role for genetic variants in CHRN genes in antisocial drug behaviors.
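
The minor allele frequency threshold used above can be applied as in the following sketch, which computes MAF per variant from diploid genotype counts and keeps variants below 0.05. The variant names and genotypes are hypothetical. Excluding monomorphic sites is a choice made here for clarity, not something the abstract specifies.

```python
def minor_allele_frequency(genotypes):
    """MAF for one variant from 0/1/2 alternate-allele counts (diploid)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)

def low_frequency_variants(matrix, threshold=0.05):
    """IDs of polymorphic variants with MAF below the threshold."""
    return [
        vid for vid, genotypes in matrix.items()
        if 0 < minor_allele_frequency(genotypes) < threshold
    ]

# Hypothetical genotypes for 20 individuals at four variants.
matrix = {
    "var_a": [1] + [0] * 19,        # one heterozygote, MAF 0.025
    "var_b": [1] * 10 + [0] * 10,   # common, MAF 0.25
    "var_c": [0] * 20,              # monomorphic, excluded
    "var_d": [1] + [2] * 19,        # alternate allele is the major allele
}

selected = low_frequency_variants(matrix)
print(selected)
```

The resulting variant set is what would then be passed to a region-based test such as SKAT-O or C-alpha.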





