Keywords: 62J05; 62P10; 92B15; Association study; Generalized linear model; Genotype calling; Group testing; Joint significance test; Next-generation sequencing; Rare variant; Score test; Variable collapse test

Source: DOI:10.3390/math11112560

Abstract:
Association testing has been widely used to study the relationship between genetic variants and phenotypes. Most association testing methods are genotype-based, i.e., they first estimate genotypes and then regress the phenotype on the estimated genotypes and other variables. Direct testing methods based on next-generation sequencing (NGS) data, without genotype calling, have been proposed and have shown advantages over genotype-based methods in scenarios where genotype calling is inaccurate. NGS data-based single-variant tests have been proposed, including our previously proposed single-variant testing method, the UNC combo method [1]. We have also proposed NGS data-based group testing methods for continuous phenotypes, using a linear model framework that can handle continuous responses [2]. In this paper, we extend our linear model-based framework to a generalized linear model-based framework so that the methods can handle other types of responses, especially binary responses, which are common in association studies. We conducted extensive simulation studies to evaluate the performance of the different estimators and to compare them with their corresponding genotype-based methods. We found that all methods control the Type I error rate, and that for other types of responses, including binary responses (logistic regression) and count responses (Poisson regression), our NGS data-based testing methods perform better than the corresponding genotype-based methods in the literature, especially when sequencing depth is low. In conclusion, we have extended our previous linear model (LM) framework to a generalized linear model (GLM) framework and derived NGS data-based testing methods for a group of genetic variants. Compared with our previously proposed LM-based methods [2], the new GLM-based methods can handle more complex responses (for example, binary and count responses) in addition to continuous responses.
Our methods fill a gap in the literature and show advantages over their corresponding genotype-based methods.
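The abstract does not give the paper's NGS data-based statistics, so the sketch below instead illustrates the kind of genotype-based comparator it describes: estimated genotypes for a group of variants are collapsed into one burden score per subject, and a GLM score test checks whether that score is associated with a binary (logistic) or count (Poisson) response. This is a minimal sketch under stated assumptions, not the authors' method: the function name `burden_score_test`, the sum-of-minor-alleles collapsing rule, and the intercept-only null model (no covariates) are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def burden_score_test(genotypes: Sequence[Sequence[float]],
                      y: Sequence[float],
                      family: str = "binomial") -> tuple[float, float]:
    """Hypothetical genotype-based variable collapse (burden) score test.

    genotypes[i][j]: estimated minor-allele count (0/1/2, or a dosage)
    of subject i at variant j. family: "binomial" (logistic regression)
    or "poisson" (Poisson regression), both with canonical links.
    Assumption: the null model contains only an intercept (no covariates).
    Returns (chi-square statistic with 1 df, p-value).
    """
    n = len(y)
    if n == 0 or n != len(genotypes):
        raise ValueError("need one genotype row per response")
    # Variable collapse: sum minor-allele counts over the variant group.
    x = [float(sum(row)) for row in genotypes]
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    # Under the intercept-only null, the fitted mean of every subject is y_bar;
    # the GLM variance function evaluated there gives the working weight.
    if family == "binomial":
        var_mu = y_bar * (1.0 - y_bar)
    elif family == "poisson":
        var_mu = y_bar
    else:
        raise ValueError("family must be 'binomial' or 'poisson'")
    # Score for the burden coefficient, evaluated at the null MLE.
    u = sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
    # Information for the burden coefficient after adjusting for the intercept.
    v = var_mu * sum((xi - x_bar) ** 2 for xi in x)
    if v <= 0:
        raise ValueError("degenerate data: no variation in burden or response")
    t = u * u / v
    # Upper tail of chi-square(1): P(Z^2 > t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t / 2.0))
    return t, p
```

For example, six subjects genotyped at two variants, with genotypes `[[1, 0], [0, 1], [1, 1], [0, 0], [0, 0], [0, 0]]` and case status `[1, 1, 1, 0, 0, 0]`, give collapsed scores `[1, 1, 2, 0, 0, 0]` and a score statistic of 4.8 (p about 0.028). A score test is used here because it only requires fitting the null model, which is then shared across every variant group tested; when genotype calls are unreliable at low sequencing depth, the collapsed scores inherit those errors, which is the weakness the abstract's NGS data-based methods are meant to address.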