genome-wide association study

  • Article type: Journal Article
    Data harmonization involves combining data from multiple independent sources and processing the data to produce one uniform dataset. Merging separate genotype or whole-genome sequencing datasets has been proposed as a strategy to increase the statistical power of association tests by increasing the effective sample size. However, data harmonization is not a widely adopted strategy due to the difficulties with merging data (including confounding produced by batch effects and population stratification). Detailed data harmonization protocols are scarce and are often conflicting. Moreover, data harmonization protocols that accommodate samples of admixed ancestry are practically non-existent. Existing data harmonization procedures must be modified to ensure the heterogeneous ancestry of admixed individuals is incorporated into additional downstream analyses without confounding results. Here, we propose a set of guidelines for merging multi-platform genetic data from admixed samples that can be adopted by any investigator with elementary bioinformatics experience. We have applied these guidelines to aggregate 1544 tuberculosis (TB) case-control samples from six separate in-house datasets and conducted a genome-wide association study (GWAS) of TB susceptibility. The GWAS performed on the merged dataset had improved power over analyzing the datasets individually and produced summary statistics free from bias introduced by batch effects and population stratification. © 2024 Wiley Periodicals LLC.
    Basic Protocol 1: Processing separate datasets comprising array genotype data
    Alternate Protocol 1: Processing separate datasets comprising array genotype and whole-genome sequencing data
    Alternate Protocol 2: Performing imputation using a local reference panel
    Basic Protocol 2: Merging separate datasets
    Basic Protocol 3: Ancestry inference using ADMIXTURE and RFMix
    Basic Protocol 4: Batch effect correction using pseudo-case-control comparisons
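
    The link between merging and effective sample size can be illustrated with a minimal sketch. It assumes the commonly used case-control approximation N_eff = 4 / (1/N_cases + 1/N_controls); the abstract does not state which definition the authors used, and the cohort counts below are hypothetical, not the study's six datasets.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Approximate effective sample size of a case-control cohort.

    Uses N_eff = 4 / (1/n_cases + 1/n_controls), a common convention
    (assumed here; not taken from the abstract).
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical (cases, controls) counts for three separate cohorts.
cohorts = [(100, 150), (200, 120), (80, 90)]

# Effective sample size when each cohort is analyzed on its own.
per_cohort = [effective_sample_size(c, k) for c, k in cohorts]

# Effective sample size after merging all cohorts into one dataset.
total_cases = sum(c for c, _ in cohorts)
total_controls = sum(k for _, k in cohorts)
merged = effective_sample_size(total_cases, total_controls)

for (c, k), n_eff in zip(cohorts, per_cohort):
    print(f"cohort {c} cases / {k} controls: N_eff = {n_eff:.1f}")
print(f"merged {total_cases} cases / {total_controls} controls: N_eff = {merged:.1f}")
```

    Under these assumptions the merged dataset has a larger effective sample size than any single cohort, which is the motivation for merging; the sketch ignores batch effects and population stratification, which the protocols above are designed to handle.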

  • Article type: Journal Article
    VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.

  • Article type: Journal Article
    Proprietary genetic datasets are valuable for boosting the statistical power of genome-wide association studies (GWASs), but their use can restrict investigators from publicly sharing the resulting summary statistics. Although researchers can resort to sharing down-sampled versions that exclude restricted data, down-sampling reduces power and might change the genetic etiology of the phenotype being studied. These problems are further complicated when using multivariate GWAS methods, such as genomic structural equation modeling (Genomic SEM), that model genetic correlations across multiple traits. Here, we propose a systematic approach to assess the comparability of GWAS summary statistics that include versus exclude restricted data. Illustrating this approach with a multivariate GWAS of an externalizing factor, we assessed the impact of down-sampling on (1) the strength of the genetic signal in univariate GWASs, (2) the factor loadings and model fit in multivariate Genomic SEM, (3) the strength of the genetic signal at the factor level, (4) insights from gene-property analyses, (5) the pattern of genetic correlations with other traits, and (6) polygenic score analyses in independent samples. For the externalizing GWAS, although down-sampling resulted in a loss of genetic signal and fewer genome-wide significant loci, the factor loadings and model fit, gene-property analyses, genetic correlations, and polygenic score analyses were found to be robust. Given the importance of data sharing for the advancement of open science, we recommend that investigators who generate and share down-sampled summary statistics report these analyses as accompanying documentation to support other researchers' use of the summary statistics.

  • Article type: Meta-Analysis
    Grain quality traits are the key factors that determine the economic value of wheat and are largely influenced by genetics and the environment. In this study, using a meta-analysis of quantitative trait loci (QTLs) and a comprehensive in silico transcriptome assessment, we identified key genomic regions and putative candidate genes for the grain quality traits protein content, gluten content, and test weight. A total of 508 original QTLs were collected from 41 articles on QTL mapping for the three quality traits in wheat published from 2003 to 2021. When these original QTLs were projected onto a high-density consensus map consisting of 14,548 markers, 313 QTLs resulted in the identification of 64 MQTLs distributed across 17 of the 21 chromosomes. Most of the meta-QTLs (MQTLs) were distributed on sub-genomes A and B. Compared with the original QTLs, the confidence interval (CI) of the MQTLs was smaller, with an average CI of 4.47 cM, while the projected QTLs CI was 11.13 cM (2.49-fold lower). The corresponding physical length of the MQTL ranged from 0.45 to 239.01 Mb. Thirty-one of these 64 MQTLs were validated in at least one genome-wide association study. In addition, five of the 64 MQTLs were selected and designated as core MQTLs. The 211 quality-related genes from rice were used to identify wheat homologs in MQTLs. In combination with transcriptional and omics analyses, 135 putative candidate genes were identified from 64 MQTL regions. The findings should contribute to a better understanding of the molecular genetic mechanisms underlying grain quality and the improvement of these traits in wheat breeding.
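
    The reported fold reduction in confidence interval can be checked directly from the two averages given in the abstract. The short sketch below only reproduces that arithmetic; the function and variable names are illustrative.

```python
def fold_reduction(original_ci_cm: float, reduced_ci_cm: float) -> float:
    """Return how many times smaller the reduced CI is than the original CI."""
    if reduced_ci_cm <= 0:
        raise ValueError("reduced CI must be positive")
    return original_ci_cm / reduced_ci_cm


projected_qtl_ci_cm = 11.13  # average CI of the projected QTLs (from the abstract)
mqtl_ci_cm = 4.47            # average CI of the MQTLs (from the abstract)

fold = fold_reduction(projected_qtl_ci_cm, mqtl_ci_cm)
print(f"MQTL CI is {fold:.2f}-fold smaller")  # prints "MQTL CI is 2.49-fold smaller"
```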

  • Article type: Journal Article
    Genomics-driven drug discovery is indispensable for accelerating the development of novel therapeutic targets. However, a drug discovery framework based on evidence from genome-wide association studies (GWASs) has not been established, especially for cross-population GWAS meta-analysis. Here, we introduce a practical guideline for genomics-driven drug discovery for cross-population meta-analysis, as lessons from the Global Biobank Meta-analysis Initiative (GBMI). Our drug discovery framework encompassed three methodologies and was applied to the 13 common diseases targeted by GBMI (mean N = 1,329,242). The individual methodologies complementarily prioritized drugs and drug targets, which were systematically validated by referring to previously known drug-disease relationships. Integration of the three methodologies provided a comprehensive catalog of candidate drugs for repositioning, nominating promising drug candidates targeting the genes involved in the coagulation process for venous thromboembolism and the interleukin-4 and interleukin-13 signaling pathway for gout. Our study highlighted key factors for successful genomics-driven drug discovery using cross-population meta-analyses.

  • Article type: Journal Article
    Primary mitochondrial disease describes a diverse group of neuro-metabolic disorders characterised by impaired oxidative phosphorylation. Diagnosis is challenging; >350 genes, both nuclear and mitochondrial DNA (mtDNA) encoded, are known to cause mitochondrial disease, leading to all possible inheritance patterns and further complicated by heteroplasmy of the multicopy mitochondrial genome. Technological advances, particularly next-generation sequencing, have driven a shift in diagnostic practice from 'biopsy first' to genome-wide analyses of blood and/or urine DNA. This has led to the need for a reference framework for laboratories involved in mitochondrial genetic testing to facilitate a consistent high-quality service. In the United Kingdom, consensus guidelines have been prepared by a working group of Clinical Scientists from the NHS Highly Specialised Service followed by national laboratory consultation. These guidelines summarise current recommended technologies and methodologies for the analysis of mtDNA and nuclear-encoded genes in patients with suspected mitochondrial disease. Genetic testing strategies for diagnosis, family testing and reproductive options including prenatal diagnosis are outlined. Importantly, recommendations for the minimum levels of mtDNA testing for the most common referral reasons are included, as well as guidance on appropriate referrals and information on the minimal appropriate gene content of panels when analysing nuclear mitochondrial genes. Finally, variant interpretation and recommendations for reporting of results are discussed, focussing particularly on the challenges of interpreting and reporting mtDNA variants.

  • Article type: Journal Article
    In wheat, a meta-analysis was performed using previously identified QTLs associated with drought stress (DS), heat stress (HS), salinity stress (SS), water-logging stress (WS), pre-harvest sprouting (PHS), and aluminium stress (AS). It predicted a total of 134 meta-QTLs (MQTLs), which involved at least 28 consistent and stable MQTLs conferring tolerance to five or all six abiotic stresses under study. Seventy-six MQTLs out of the 132 physically anchored MQTLs were also verified with genome-wide association studies. Around 43% of MQTLs had genetic and physical confidence intervals of less than 1 cM and 5 Mb, respectively. Consequently, 539 genes were identified in some selected MQTLs providing tolerance to five or all six abiotic stresses. Comparative analysis of genes underlying MQTLs with four RNA-seq-based transcriptomic datasets unravelled a total of 189 differentially expressed genes, which also included at least 11 most promising candidate genes common among different datasets. The promoter analysis showed that the promoters of these genes include many stress-responsive cis-regulatory elements, such as ARE, MBS, TC-rich repeats, As-1 element, STRE, LTR, WRE3, and WUN-motif, among others. Further, some MQTLs also overlapped with as many as 34 known abiotic stress tolerance genes. In addition, numerous ortho-MQTLs among the wheat, maize, and rice genomes were discovered. These findings could help with fine mapping and gene cloning, as well as marker-assisted breeding for multiple abiotic stress tolerances in wheat.

  • Article type: Journal Article
    CONCLUSIONS: Meta-analysis in wheat for three major quality traits identified 110 meta-QTL (MQTL) with reduced confidence interval (CI). Five GWAS-validated MQTL (viz., 1A.1, 1B.2, 3B.4, 5B.2, and 6B.2), each involving more than 20 initial QTL and a reduced CI (95%) (< 2 cM), were selected for quality breeding programmes. Functional characterization including candidate gene mining and expression analysis discovered 44 high-confidence candidate genes associated with quality traits. A meta-analysis of quantitative trait loci (QTL) associated with dough rheology properties, nutritional traits, and processing quality traits was conducted in wheat. For this purpose, as many as 2458 QTL were collected from 50 interval mapping studies published during 2013-2020. Of the total QTL, 1126 QTL were projected onto the consensus map saturated with 249,603 markers, which led to the identification of 110 meta-QTL (MQTL). These MQTL exhibited an 18.84-fold reduction in the average CI compared to the average CI of the initial QTL (ranging from 14.87 to 95.55 cM with an average of 40.35 cM). Of the 110, 108 MQTL were physically anchored to the wheat reference genome, including 51 MQTL verified with marker-trait associations (MTAs) reported from earlier genome-wide association studies. Candidate gene (CG) mining allowed the identification of 2533 unique gene models from the MQTL regions. In silico expression analysis discovered 439 differentially expressed gene models with expression of > 2 transcripts per million in grains and related tissues, which also included 44 high-confidence CGs involved in the various cellular and biochemical processes related to quality traits. Nine functionally characterized wheat genes associated with grain protein content, high-molecular-weight glutenin, and starch synthase enzymes were also found to be co-localized with some of the MQTL. Synteny analysis between wheat and rice MQTL regions identified 23 wheat MQTL syntenic to 16 rice MQTL associated with quality traits. Furthermore, 64 wheat orthologues of 30 known rice genes were detected in 44 MQTL regions. Markers flanking the MQTL identified in the present study can be used for marker-assisted breeding and as fixed effects in the genomic selection models for improving the prediction accuracy during quality breeding. Wheat orthologues of rice genes and other CGs available from MQTLs can be promising targets for further functional validation and to better understand the molecular mechanism underlying the quality traits in wheat.

  • Article type: Journal Article
    Both gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS) are widely used metabolomics approaches to detect and quantify hundreds of thousands of metabolite features. However, the application of these techniques to a large number of samples is subject to more complex interactions, particularly for genome-wide association studies (GWAS). This protocol describes an optimized metabolic workflow, which combines an efficient and fast sample preparation with the analysis of a large number of samples for legume crop species. This slightly modified extraction method was initially developed for the analysis of plant and animal tissues and is based on extraction in methyl tert-butyl ether: methanol solvent to allow the capture of polar and lipid metabolites. In addition, we provide a step-by-step guide for reducing analytical variations, which are essential for the high-throughput evaluation of metabolic variance in GWAS.

  • Article type: Journal Article
    Recent genomic studies have shed light on the biology and inter-tumoral heterogeneity underlying pineal parenchymal tumors, in particular pineoblastomas (PBs) and pineal parenchymal tumors of intermediate differentiation (PPTIDs). Previous reports, however, had modest sample sizes and lacked the power to integrate molecular and clinical findings. The different proposed molecular group structures also highlighted a need to reach consensus on a robust and relevant classification system. We performed a meta-analysis on 221 patients with molecularly characterized PBs and PPTIDs. DNA methylation profiles were analyzed through complementary bioinformatic approaches and molecular subgrouping was harmonized. Demographic, clinical, and genomic features of patients and samples from these pineal tumor groups were annotated. Four clinically and biologically relevant consensus PB groups were defined: PB-miRNA1 (n = 96), PB-miRNA2 (n = 23), PB-MYC/FOXR2 (n = 34), and PB-RB1 (n = 25). A final molecularly distinct group, designated PPTID (n = 43), comprised histological PPTID and PBs. Genomic and transcriptomic profiling allowed the characterization of oncogenic drivers for individual tumor groups, specifically, alterations in the microRNA processing pathway in PB-miRNA1/2, MYC amplification and FOXR2 overexpression in PB-MYC/FOXR2, RB1 alteration in PB-RB1, and KBTBD4 insertion in PPTID. Age at diagnosis, sex predilection, and metastatic status varied significantly among tumor groups. While patients with PB-miRNA2 and PPTID had superior outcome, survival was intermediate for patients with PB-miRNA1, and dismal for those with PB-MYC/FOXR2 or PB-RB1. Reduced-dose CSI was adequate for patients with average-risk, PB-miRNA1/2 disease. We systematically interrogated the clinical and molecular heterogeneity within pineal parenchymal tumors and proposed a consensus nomenclature for disease groups, laying the groundwork for future studies as well as routine use in tumor diagnostic classification and clinical trial stratification.