Keywords: association studies; genes; maize; phenotyping; plant height; spoken descriptions

MeSH: Zea mays / genetics; Genome-Wide Association Study / methods; Phenotype; Genome, Plant; Quantitative Trait Loci; Quantitative Trait, Heritable; Polymorphism, Single Nucleotide

Source: DOI:10.1093/g3journal/jkae161

Abstract:
We present a novel approach to genome-wide association studies (GWAS) by leveraging unstructured, spoken phenotypic descriptions to identify genomic regions associated with maize traits. Utilizing the Wisconsin Diversity panel, we collected spoken descriptions of Zea mays ssp. mays traits, converting these qualitative observations into quantitative data amenable to GWAS analysis. First, we determined that visually striking phenotypes could be detected from unstructured spoken phenotypic descriptions. Next, we developed two methods to process the same descriptions to derive the trait plant height, a well-characterized phenotypic feature in maize: (1) a semantic similarity metric that assigns a score based on the resemblance of each observation to the concept of 'tallness' and (2) a manual scoring system that categorizes and assigns values to phrases related to plant height. Our analysis successfully corroborated known genomic associations and uncovered novel candidate genes potentially linked to plant height. Some of these genes are associated with gene ontology terms that suggest a plausible involvement in determining plant stature. This proof-of-concept demonstrates the viability of spoken phenotypic descriptions in GWAS and introduces a scalable framework for incorporating unstructured language data into genetic association studies. This methodology has the potential not only to enrich the phenotypic data used in GWAS and to enhance the discovery of genetic elements linked to complex traits but also to expand the repertoire of phenotype data collection methods available for use in the field environment.
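
The two scoring ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity model or phrase list the authors used, so everything below is an assumption for illustration only: the similarity function is a simple token-overlap cosine standing in for a real semantic similarity metric, and the `HEIGHT_PHRASE_SCORES` table, the `TALLNESS_CONCEPT` string, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical phrase-to-value table for the manual scoring idea.
# The actual phrases and values used in the study are not given in the abstract.
HEIGHT_PHRASE_SCORES = {
    "very tall": 2.0,
    "tall": 1.0,
    "average height": 0.0,
    "short": -1.0,
    "very short": -2.0,
}

# Hypothetical reference text standing in for the concept of 'tallness'.
TALLNESS_CONCEPT = "tall plant"


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors.

    This is a stand-in (assumption): a real semantic metric would compare
    embeddings rather than raw token counts.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    dot = sum(counts_a[tok] * counts_b[tok] for tok in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tallness_score(description):
    """Score one observation by its resemblance to the tallness concept."""
    return cosine_similarity(description, TALLNESS_CONCEPT)


def manual_height_score(description):
    """Return the value of the longest matching height phrase, or None."""
    text = " ".join(tokenize(description))
    # Try longer phrases first so "very tall" is preferred over "tall".
    for phrase in sorted(HEIGHT_PHRASE_SCORES, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return HEIGHT_PHRASE_SCORES[phrase]
    return None


if __name__ == "__main__":
    for obs in ["a tall plant with dark leaves",
                "a short plant with dark leaves"]:
        print(obs, tallness_score(obs), manual_height_score(obs))
```

Either function turns a transcribed description into one number per plant, which is the kind of quantitative phenotype a GWAS pipeline expects as input.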