association studies

  • Article type: Journal Article
    We present a novel approach to genome-wide association studies (GWAS) by leveraging unstructured, spoken phenotypic descriptions to identify genomic regions associated with maize traits. Utilizing the Wisconsin Diversity panel, we collected spoken descriptions of Zea mays ssp. mays traits, converting these qualitative observations into quantitative data amenable to GWAS analysis. First, we determined that visually striking phenotypes could be detected from unstructured spoken phenotypic descriptions. Next, we developed two methods to process the same descriptions to derive the trait plant height, a well-characterized phenotypic feature in maize: (1) a semantic similarity metric that assigns a score based on the resemblance of each observation to the concept of 'tallness' and (2) a manual scoring system that categorizes and assigns values to phrases related to plant height. Our analysis successfully corroborated known genomic associations and uncovered novel candidate genes potentially linked to plant height. Some of these genes are associated with gene ontology terms that suggest a plausible involvement in determining plant stature. This proof-of-concept demonstrates the viability of spoken phenotypic descriptions in GWAS and introduces a scalable framework for incorporating unstructured language data into genetic association studies. This methodology has the potential not only to enrich the phenotypic data used in GWAS and to enhance the discovery of genetic elements linked to complex traits but also to expand the repertoire of phenotype data collection methods available for use in the field environment.
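The manual scoring idea in the abstract above (categorizing height-related phrases and assigning them values) can be illustrated with a minimal Python sketch. The phrase table, the 1 to 5 scale, and the function name `score_height` are all hypothetical; the abstract does not give the study's actual categories or values.

```python
import re

# Hypothetical phrase-to-score table (assumption: NOT the study's real rubric).
HEIGHT_PHRASES = {
    "very short": 1,
    "short": 2,
    "average height": 3,
    "tall": 4,
    "very tall": 5,
}


def score_height(description):
    """Return the mean score of all height phrases found, or None if none occur."""
    text = description.lower()
    scores = []
    # Match longer phrases first so "very tall" is not also counted as "tall".
    for phrase in sorted(HEIGHT_PHRASES, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # Remove each match from the text so shorter phrases cannot re-match it.
        text, n_matches = re.subn(pattern, " ", text)
        scores.extend([HEIGHT_PHRASES[phrase]] * n_matches)
    if not scores:
        return None
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for transcript in [
        "This plant is very tall with wide leaves",
        "Short plant, purple tassel",
        "Green leaves and a large ear",
    ]:
        print(transcript, "->", score_height(transcript))
```

A per-plant score of this kind is the sort of quantitative value that could then be passed to a GWAS as a phenotype; plants whose descriptions never mention height would be treated as missing data.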

  • Article type: Journal Article
    Background: Ovarian cancer is commonly diagnosed symptomatically at an advanced stage. Better survival for early disease suggests that improving diagnostic pathways may increase survival. This study examines literature assessing diagnostic intervals and their association with clinical and psychological outcomes. Methods: Medline, EMBASE, and EmCare databases were searched for studies including quantitative measures of at least one interval, published between January 1, 2000 and August 9, 2022. Interval measures and associations (interval, outcomes, analytic strategy) were synthesized. Risk of bias of association studies was assessed using the Aarhus Checklist and ROBINS-E tool. Results: In total, 65 papers (20 association studies) were included and 26 unique intervals were identified. Interval estimates varied widely and were affected by the summary statistic used (mean or median) and the group focused on. Of Aarhus-defined intervals, patient (symptom to presentation, n = 23; range [median]: 7-168 days) and diagnostic (presentation to diagnosis, n = 22; range [median]: 7-270 days) were most common. Nineteen association studies examined survival or stage outcomes, with most, including five low risk-of-bias studies, finding no association. Conclusions: Studies reporting intervals for ovarian cancer diagnosis are limited by inconsistent definitions and reporting. Greater utilization of the Aarhus statement to define intervals and appropriate analytic methods is needed to strengthen findings from future studies.

  • Article type: Journal Article
    OBJECTIVE: Spontaneous coronary artery dissection (SCAD) has been increasingly recognized as a significant cause of acute myocardial infarction (AMI) in young and middle-aged women and arises through mechanisms independent of atherosclerosis. SCAD has a multifactorial etiology that includes environmental, individual, and genetic factors distinct from those typically associated with coronary artery disease. Here, we summarize the current understanding of the genetic factors contributing to the development of SCAD and highlight those factors that differentiate SCAD from atherosclerotic coronary artery disease.
    RESULTS: Recent studies have revealed several associated variants with varying effect sizes for SCAD, giving rise to a complex genetic architecture. Associated genes highlight an important role for arterial cells and their extracellular matrix in the pathogenesis of SCAD, as well as notable genetic overlap between SCAD and other systemic arteriopathies such as fibromuscular dysplasia and vascular connective tissue diseases. Further investigation of individual variants (including in the associated gene PHACTR1), along with polygenic score analysis, has demonstrated an inverse genetic relationship between SCAD and atherosclerosis as distinct causes of AMI. SCAD represents an increasingly recognized cause of AMI with clinical and genetic risk factors opposite to those of AMI due to atherosclerosis, and it is often associated with complex underlying genetic conditions. Genetic study of SCAD on a larger scale and with more diverse cohorts will not only further our evolving understanding of a newly defined genetic spectrum for AMI, but will also inform the clinical utility of integrating genetic testing into AMI prevention and management.

  • Article type: Journal Article
    Sickle cell anemia (SCA) is the most common cause of stroke in children. As it is a rare disease, studies investigating the association with complications like stroke in sickle cell disease (SCD) have small sample sizes. Here, we performed a systematic review and meta-analysis of studies exploring the association of genetic variants with stroke to get a better indication of the strength of that association. PubMed and Google Scholar were searched to identify studies that had performed an association analysis of genetic variants for the risk of stroke in SCA patients. After screening for eligible studies, summary statistics of association analysis with stroke and other general information were extracted. Meta-analysis was performed using the fixed-effect method in the tool METAL, and forest plots were drawn in R. A random-effects model was used as a sensitivity analysis for loci where significant heterogeneity was observed. The search term identified 407 studies; after screening, 37 studies that cumulatively analyzed 11,373 SCA patients were included. These 37 studies included a total of 2,222 SCA patients with stroke and predominantly included individuals of African ancestry (N = 16). Three of these studies performed whole exome sequencing while 35 performed single nucleotide-based genotyping. Though the studies reported association with 132 loci, meta-analyses could be performed only for 12 loci that had data from two or more studies. After meta-analysis we observed that four loci were significantly associated with risk for stroke: the -α3.7 kb alpha-thalassemia deletion (P = 0.00000027), rs489347-TEK (P = 0.00081), rs2238432-ADCY9 (P = 0.00085), rs11853426-ANXA2 (P = 0.0034), and rs1800629-TNF (P = 0.0003396). Ethnic representation of regions with a high prevalence of SCD, like the Mediterranean basin and India, needs to be improved in genetic studies of associated complications like stroke. Larger genome-wide collaborative studies on SCD and associated complications including stroke need to be performed.
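As an illustration of the fixed-effect pooling step described in the abstract above, here is a minimal Python sketch of a standard inverse-variance-weighted fixed-effect meta-analysis with Cochran's Q as a heterogeneity statistic. It assumes per-study effect sizes (for example, log odds ratios) and standard errors are available; the function name `fixed_effect_meta` and the input numbers are made up for illustration, and this is not METAL itself nor necessarily the exact weighting scheme the authors selected.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effects with inverse-variance weights.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors (all must be > 0)
    Returns (pooled_beta, pooled_se, z, two_sided_p, cochran_q).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    cochran_q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, z, p_value, cochran_q


if __name__ == "__main__":
    # Made-up log odds ratios and standard errors from three hypothetical studies.
    beta, se, z, p, q = fixed_effect_meta([0.35, 0.50, 0.28], [0.15, 0.20, 0.12])
    print(f"pooled beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e} Q={q:.2f}")
```

A large Q relative to its degrees of freedom (number of studies minus one) signals heterogeneity, which is the situation in which a random-effects model is typically run as a sensitivity analysis.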

  • Article type: Journal Article
    Early recognition and treatment of latent tuberculosis infection (LTBI) are key to tuberculosis (TB) prevention. However, the emergence of LTBI is influenced by a combination of factors, among which the role of individual immune cytokines remains controversial. The aim of this study is to explore the factors influencing LTBI and their effects, together with cytokines, on LTBI.
    Close contacts of tuberculosis patients in Urumqi City from 2021 to 2022 were selected for a field survey. A logistic regression model was used to analyse the factors influencing LTBI, principal component analysis to extract composite indicators of cytokines, and structural equation modelling to explore the direct and indirect effects of cytokines and influencing factors on LTBI.
    The LTBI infection rate was 33.3% among 288 TB close contacts. A multifactorial logistic model showed that factors influencing LTBI included education, daily contact hours, eating animal liver, and drinking coffee (P<0.05). After controlling for confounding factors and extracting composite indicators of cytokines using principal component analysis, the CXCL5 and IFN-γ composite was a protective factor for LTBI (OR=0.572, P=0.047), and the IL-10 and TNF-α composite was a risk factor for LTBI (OR=2.119, P=0.010). Structural equation modelling showed that drinking coffee, eating animal liver, daily contact hours, and IL-10 and TNF-α had direct effects on LTBI, and education had an indirect effect on LTBI (P<0.05).
    IL-10 and TNF-α are involved in the immune response and are directly related to LTBI. By monitoring the cytokine levels of TB close contacts and paying attention to their dietary habits and exposure, we can detect and intervene in LTBI at an early stage and control its progression to TB.

  • Article type: Journal Article
    The cold stress susceptibility of tomato (Solanum lycopersicum) curtails its cultivation, with significant impact in temperate regions and on cropping seasons. To unravel genomic regions responsible for cold stress resilience, a diverse set of fifty genotypes encompassing cultivated lines, wild species, and landraces was genotyped using genotyping-by-sequencing. These lines were evaluated over two years in six trials employing both early and late sowing. Illumina-based next-generation sequencing produced up to 3 million reads per sample from individually sequenced library pools. The TASSEL pipeline yielded 10,802 variants, subsequently filtered to 3,854 SNPs for genome-wide association analysis (GWAS). Using clustering methods (population structure) via TASSEL, SNPhylo, and a kinship matrix, the fifty genotypes clustered into four distinct gene pools. The GWAS for cold tolerance in tomato integrated key traits including yield. Using six independent phenotypic datasets representing various environments, the study identified 4,517 significant marker-trait associations for cold tolerance traits. Notably, pivotal variations (> 10%) in cold stress tolerance, particularly proline content, were linked to marker-trait associations. Additionally, 5,727 significant marker-trait associations for yield and yield-related traits were unveiled, shedding light on fruit yield and directly associated attributes. The investigation pinpointed 685 candidate genes across all examined traits, including 60 genes associated with biological processes within these genomic regions. Remarkably, 7 of the 60 genes were directly linked to abiotic stress tolerance, functioning as stress-responsive genes either directly or indirectly. The identified genes, particularly those associated with stress response, could hold the key to enhancing cold tolerance and overall crop productivity in tomato cultivation.

  • Article type: Journal Article
    OBJECTIVE: Phenotyping plants in a field environment can involve a variety of methods including the use of automated instruments and labor-intensive manual measurement and scoring. Researchers also collect language-based phenotypic descriptions and use controlled vocabularies and structures such as ontologies to enable computation on descriptive phenotype data, including methods to determine phenotypic similarities. In this study, spoken descriptions of plants were collected and observers were instructed to use their own vocabulary to describe plant features that were present and visible. Further, these plants were measured and scored manually as part of a larger study to investigate whether spoken plant descriptions can be used to recover known biological phenomena.
    METHODS: Data comprise phenotypic observations of 686 accessions of the maize Wisconsin Diversity panel, and 25 positive control accessions that carry visible, dramatic phenotypes. The data include the list of accessions planted, field layout, data collection procedures, student participants' (whose personal data are protected for ethical reasons) and volunteers' observation transcripts, volunteers' audio data files, terrestrial and aerial images of the plants, Amazon Web Services method selection experimental data, and manually collected phenotypes (e.g., plant height, ear and tassel features, etc.; measurements and scores). Data were collected during the summer of 2021 at Iowa State University's Agricultural Engineering and Agronomy Research Farms.

  • Article type: Journal Article
    OBJECTIVE: Genetic variants at the low end of the penetrance spectrum have historically been challenging to interpret because their high population frequencies exceed the disease prevalence of the associated condition, leading to a lack of clear segregation between the variant and disease. There is currently substantial variation in the classification of these variants, and no formal classification framework has been widely adopted. The Clinical Genome Resource Low Penetrance/Risk Allele Working Group was formed to address these challenges and promote harmonization within the clinical community.
    METHODS: The work presented here is the product of internal and community Likert-scaled surveys in combination with expert consensus within the Working Group.
    RESULTS: We formally recognize risk alleles and low-penetrance variants as variant classes distinct from those causing highly penetrant disease, requiring special considerations regarding their clinical classification and reporting. First, we provide a preferred terminology for these variants. Second, we focus on risk alleles, detail considerations for reviewing relevant studies, and present a framework for the classification of these variants. Finally, we discuss considerations for clinical reporting of risk alleles.
    CONCLUSIONS: These recommendations support harmonized interpretation, classification, and reporting of variants at the low end of the penetrance spectrum.

  • Article type: Journal Article
    OBJECTIVE: Spontaneous coronary artery dissection (SCAD) is a significant cause of acute myocardial infarction that is increasingly recognized in young and middle-aged women. The etiology of SCAD is likely multifactorial and may include the interaction of environmental and individual factors. Here, we summarize the current understanding of the genetic factors contributing to the development of SCAD.
    RESULTS: The molecular findings underlying SCAD have been demonstrated to include a combination of rare DNA sequence variants with large effects, common variants contributing to a complex genetic architecture, and variants with intermediate impact. The genes associated with SCAD highlight the role of arterial cells and their extracellular matrix in the pathogenesis of the disease and shed light on the relationship between SCAD and other disorders, including fibromuscular dysplasia and connective tissue diseases. While up to 10% of affected individuals may harbor a rare variant with large effect, SCAD most often presents as a complex genetic condition. Analyses of larger and more diverse cohorts will continue to improve our understanding of risk susceptibility loci and will also enable consideration of the clinical utility of genetic testing strategies in the management of SCAD.

  • Article type: Journal Article
    BACKGROUND: Identifying variants associated with complex traits is a challenging task in genetic association studies because of linkage disequilibrium (LD) between genetic variants and population stratification unrelated to the disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants.
    RESULTS: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans.
    CONCLUSIONS: CluStrat highlights the advantages of biologically relevant distance metrics, such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance.
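To make the contrast between Mahalanobis and Euclidean distance concrete, here is a minimal two-dimensional Python sketch. It is not the CluStrat implementation: it only illustrates, for two hypothetical correlated variables, how the Mahalanobis distance rescales differences by the covariance structure. The function names and example numbers are assumptions introduced for illustration.

```python
import math


def covariance_2d(points):
    """Sample covariance of 2-D points, returned as (var_x, cov_xy, var_y)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return var_x, cov_xy, var_y


def euclidean(u, v):
    """Ordinary straight-line distance between two 2-D points."""
    return math.hypot(u[0] - v[0], u[1] - v[1])


def mahalanobis_2d(u, v, cov):
    """Mahalanobis distance between 2-D points under covariance (var_x, cov_xy, var_y)."""
    var_x, cov_xy, var_y = cov
    det = var_x * var_y - cov_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = u[0] - v[0], u[1] - v[1]
    # d^T S^{-1} d with S^{-1} = (1/det) * [[var_y, -cov_xy], [-cov_xy, var_x]]
    squared = (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(squared)


if __name__ == "__main__":
    # Hypothetical covariance for two strongly correlated variables.
    cov = (1.0, 0.9, 1.0)
    origin = (0.0, 0.0)
    along = (1.0, 1.0)     # displacement that follows the correlation
    against = (1.0, -1.0)  # displacement that runs against the correlation
    print("Euclidean:", euclidean(origin, along), euclidean(origin, against))
    print("Mahalanobis:", mahalanobis_2d(origin, along, cov),
          mahalanobis_2d(origin, against, cov))
```

The two displacements are equally long in Euclidean terms, but the Mahalanobis distance treats the one that runs against the correlation as much larger. In a clustering setting, a pairwise distance matrix built this way would be the input to the agglomerative hierarchical clustering step mentioned in the abstract.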