low frequency variants

  • Article type: Preprint
    The evolution of gene expression responses is a critical component of adaptation to variable environments. Predicting how DNA sequence influences expression is challenging because the genotype-to-phenotype map is not well resolved for cis-regulatory elements, transcription factor binding, regulatory interactions, and epigenetic features, not to mention how these factors respond to the environment. We tested whether flexible machine learning models could learn some of the underlying cis-regulatory genotype-to-phenotype map. We tested this approach using cold-responsive transcriptome profiles in 5 diverse Arabidopsis thaliana accessions. We first tested for evidence that cis-regulation plays a role in environmental response, finding 14 and 15 motifs that were significantly enriched within the upstream and downstream regions of cold-responsive differentially regulated genes (DEGs). We next applied convolutional neural networks (CNNs), which learn de novo cis-regulatory motifs in DNA sequences, to predict expression response to environment. We found that CNNs predicted differential expression with moderate accuracy, with evidence that predictions were hindered by the biological complexity of regulation and the large potential regulatory code. Overall, DEGs between specific environments can be predicted based on variation in cis-regulatory sequences, although more information needs to be incorporated and better models may be required.
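
As a rough illustration of how a convolutional layer can pick up a motif in DNA, the sketch below one-hot encodes a sequence and slides a single filter along it, followed by global max pooling. It is not the authors' model: the filter weights are set by hand for an illustrative motif (in a trained CNN they would be learned), and the promoter string and all function names are made up for this example.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def motif_filter(motif):
    """Hand-set filter that adds +1 for every matching base (an assumption, not learned weights)."""
    return one_hot(motif)

def conv1d_scan(encoded, filt):
    """Slide the filter along the sequence and return one score per window."""
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * filt[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores

def max_pool(scores):
    """Global max pooling: the best window score and where it occurs."""
    best = max(scores)
    return best, scores.index(best)

promoter = "TTGACCGACATTA"  # hypothetical sequence, for illustration only
scores = conv1d_scan(one_hot(promoter), motif_filter("CCGAC"))
best_score, best_pos = max_pool(scores)
print(best_score, best_pos)
```

A real model would stack many such filters, learn their weights from the DEG labels, and feed the pooled scores into further layers.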

  • Article type: Journal Article
    NUS1 has recently been identified as a candidate gene for Parkinson's disease (PD). Few studies have examined the association of NUS1 variants with PD susceptibility and phenotypes. In the first cohort, whole-exome sequencing was performed to identify variants in NUS1 exon-coding and exon-intron regions in 1542 cases and 1625 controls. In total, 13 variants were detected, of which 10 were rare variants and 3 were low-frequency variants. Burden analysis showed that rare NUS1 variants were significantly enriched in PD (p = 0.016). We also performed a meta-analysis based on previous studies and our own to correlate NUS1 mutations with PD susceptibility. Integrating our previous cohort (3210 cases and 2807 controls) with the first cohort identified a significant association of rs539668656 with PD risk (odds ratio (OR) = 2.82, p = 0.016). The genotype-phenotype association analysis showed that carrying rare variants or rs539668656 was significantly associated with earlier onset age, depression, emotional impairment, and severe disease condition. Our results support a role for NUS1 rare variants and rs539668656 in PD susceptibility and phenotype.
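
For readers unfamiliar with how an odds ratio is derived from carrier counts, the sketch below computes an OR with a Wald 95% confidence interval from a 2x2 case/control table. The carrier counts are hypothetical (the abstract reports only cohort sizes and the final OR), so the result is not expected to reproduce the reported 2.82, and the function name is an assumption of this example.

```python
import math

def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Return (OR, lower, upper) using a Wald 95% confidence interval on log(OR)."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid division by zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper

# Hypothetical carrier counts; only the cohort sizes (1542, 1625) come from the abstract
or_value, lower, upper = odds_ratio(20, 1542, 10, 1625)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```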

  • Article type: Journal Article
    BACKGROUND: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows true nucleotide substitutions to be distinguished from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole-genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away.
    RESULTS: In this paper we demonstrate that a significant fraction of these reads contain PCR or sequencing errors within duplex tags. Correction of such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost-effective.
    CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.
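
The idea of "reuniting" singleton reads with their families can be sketched as follows. This is a minimal illustration of tag error correction by Hamming distance, assuming a singleton is rescued only when exactly one multi-read family lies within one mismatch. It is not Du Novo's actual algorithm, and the tags and function names are invented for the example.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))

def reunite_singletons(tag_counts, max_mismatches=1):
    """Map each singleton tag to the single family tag within max_mismatches, if one exists."""
    families = [tag for tag, n in tag_counts.items() if n > 1]
    corrections = {}
    for tag, n in tag_counts.items():
        if n != 1:
            continue
        candidates = [
            fam for fam in families
            if len(fam) == len(tag) and hamming(tag, fam) <= max_mismatches
        ]
        if len(candidates) == 1:  # ambiguous or unmatched singletons are left alone
            corrections[tag] = candidates[0]
    return corrections

def merge_families(tag_counts, corrections):
    """Recount reads per tag after applying the corrections."""
    merged = Counter()
    for tag, n in tag_counts.items():
        merged[corrections.get(tag, tag)] += n
    return merged

# Hypothetical tags: one family of 3, one singleton with a single-base error,
# one family of 2, and one unrelated singleton
reads = ["ACGTACGT"] * 3 + ["ACGTACGA"] + ["TTTTCCCC"] * 2 + ["GGGGAAAA"]
tag_counts = Counter(reads)
corrections = reunite_singletons(tag_counts)
merged = merge_families(tag_counts, corrections)
print(corrections, dict(merged))
```

Requiring a unique candidate is a conservative design choice: a singleton equally close to two families is left uncorrected, because a wrong merge would contaminate a consensus sequence.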

  • Article type: Journal Article
    Whole-genome sequence (WGS) data are expected to be optimal for genome-wide association studies and genomic predictions. However, sequencing thousands of individuals of interest is expensive. Imputation from single nucleotide polymorphism panels to WGS data is an attractive approach to obtain highly reliable WGS data at low cost. Here, we conducted a genotype imputation study with a combined reference panel in a yellow-feather dwarf broiler population. The combined reference panel was assembled from sequence data of 24 key individuals of a yellow-feather dwarf broiler population (internal reference panel) and WGS data from 311 chickens in public databases (external reference panel). Three scenarios were investigated to determine how different factors affect the accuracy of imputation from 600 K array data to WGS data: genotype imputation with internal, external, and combined reference panels; the number of internal reference individuals in the combined reference panel; and different reference sizes and selection strategies for an external reference panel. Results showed that imputation accuracy from 600 K to WGS data was 0.834±0.012, 0.920±0.007, and 0.982±0.003 for the internal, external, and combined reference panels, respectively. Increasing the reference size from 50 to 250 improved the accuracy of genotype imputation from 0.848 to 0.974 for the combined reference panel and from 0.647 to 0.917 for the external reference panel. The selection strategies for the external reference panel had no impact on the accuracy of imputation using the combined reference panel. However, if only an external reference panel with reference size >50 was used, the selection strategy of minimizing the average distance to the closest leaf had the greatest imputation accuracy compared with other methods. Generally, using a combined reference panel provided greater imputation accuracy, especially for low-frequency variants. In conclusion, the optimal imputation strategy with a combined reference panel should comprehensively consider genetic diversity of the study population, availability and properties of external reference panels, sequencing and computing costs, and frequency of imputed variants. This work sheds light on how to design and execute genotype imputation with a combined external reference panel in a livestock population.
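
The abstract does not state how imputation accuracy was measured. Two common choices are the Pearson correlation between true genotypes and imputed dosages, and the genotype concordance rate. The sketch below computes both for one variant, under the assumption that genotypes are coded as 0, 1, or 2 copies of the alternate allele. The data and function names are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def concordance(true_genotypes, imputed_dosages):
    """Fraction of individuals whose rounded dosage equals the true genotype."""
    calls = [round(d) for d in imputed_dosages]
    matches = sum(t == c for t, c in zip(true_genotypes, calls))
    return matches / len(true_genotypes)

# Hypothetical data for one variant in eight individuals
true_genotypes = [0, 1, 2, 0, 1, 2, 0, 0]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 2.0, 0.7, 0.1]

r = pearson(true_genotypes, imputed_dosages)
rate = concordance(true_genotypes, imputed_dosages)
print(f"correlation = {r:.3f}, concordance = {rate:.3f}")
```

Correlation is often preferred for low-frequency variants, because concordance can look high simply by calling everyone homozygous for the major allele.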

  • Article type: Journal Article
    While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions on access to rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next-generation sequencing makes whole-exome and whole-genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TOPMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods has been developed for this. Model performance depends on the genetic architecture of the region of interest.
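
One of the simplest forms of aggregate testing collapses all rare variants in a region into a single carrier indicator per individual and then compares carrier counts between cases and controls. The sketch below does this with a two-sided Fisher's exact test written from the hypergeometric distribution. It is a bare-bones illustration with hypothetical genotypes and function names, not any specific published method, and it ignores covariates and variant weights.

```python
from math import comb

def carrier_flags(genotypes):
    """genotypes: one row per individual with minor-allele counts for each rare variant.
    Returns 1 for individuals carrying at least one rare allele in the region, else 0."""
    return [1 if sum(row) > 0 else 0 for row in genotypes]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k carriers in the first row
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    p_value = sum(
        prob(k) for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Hypothetical region with three rare variants, six cases and six controls
cases = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 1, 0]]
controls = [[0, 0, 0] for _ in range(6)]

case_carriers = sum(carrier_flags(cases))
control_carriers = sum(carrier_flags(controls))
p_value = fisher_exact_two_sided(
    case_carriers, len(cases) - case_carriers,
    control_carriers, len(controls) - control_carriers,
)
print(case_carriers, control_carriers, p_value)
```

Collapsing works best when most rare variants in the region act in the same direction. Variance-component tests are usually preferred when effects are mixed, which is one reason performance depends on the genetic architecture.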

  • Article type: Journal Article
    Common SNPs in nicotinic acetylcholine receptor genes (CHRN genes) have been associated with drug behaviors and personality traits, but the influence of rare genetic variants is not well characterized. The goal of this project was to identify novel rare variants in CHRN genes in the Center for Antisocial Drug Dependence (CADD) and Genetics of Antisocial Drug Dependence (GADD) samples and to determine whether low-frequency variants are associated with antisocial drug dependence. Two samples of 114 and 200 individuals were selected using a case/control design including the tails of the phenotypic distribution of antisocial drug dependence. The capture, sequencing, and analysis of all variants in 16 CHRN genes (CHRNA1-7, 9, 10, CHRNB1-4, CHRND, CHRNG, CHRNE) were performed independently for each subject in each sample. Sequencing reads were aligned to the human reference sequence using BWA prior to variant calling with the Genome Analysis Toolkit (GATK). Low-frequency variants (minor allele frequency < 0.05) were analyzed using SKAT-O and C-alpha to examine the distribution of rare variants among cases and controls. In our larger sample, the region containing the CHRNA6/CHRNB3 gene cluster was significantly associated with disease status using both SKAT-O and C-alpha (unadjusted p values < 0.05). More low-frequency variants in the CHRNA6/CHRNB3 gene region were observed in cases than in controls. These data support a role for genetic variants in CHRN genes in antisocial drug behaviors.
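
The minor allele frequency cutoff mentioned above can be illustrated with a short filter. The sketch assumes diploid genotypes coded as 0, 1, or 2 alternate alleles, with `None` for missing calls, and keeps only polymorphic variants with MAF below 0.05. Variant identifiers and function names are hypothetical, and this step is separate from the SKAT-O and C-alpha tests, which are not reproduced here.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency, or None if no genotype was called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def low_frequency_variants(variant_genotypes, threshold=0.05):
    """Keep polymorphic variants whose MAF is below the threshold."""
    kept = {}
    for variant_id, genotypes in variant_genotypes.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and 0 < maf < threshold:
            kept[variant_id] = maf
    return kept

# Hypothetical genotypes for 20 individuals at four variants
variant_genotypes = {
    "var1": [1] + [0] * 19,        # one heterozygote: MAF = 1/40
    "var2": [1] * 10 + [0] * 10,   # common variant: MAF = 0.25
    "var3": [0] * 20,              # monomorphic: dropped
    "var4": [1] + [2] * 19,        # reference allele is the minor one
}
kept = low_frequency_variants(variant_genotypes)
print(sorted(kept))
```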