Keywords: longitudinal analysis methods; longitudinal method comparison; polygenic risk scores; post-traumatic stress disorder; repeated measures; simulation study

Source: DOI:10.3389/fgene.2024.1203577

Abstract:
Cross-sectional data allow the investigation of how genetics influence health at a single time point, but to understand how the genome impacts phenotype development, one must use repeated measures data. Ignoring the dependency inherent in repeated measures can inflate false positives, so such data require methods other than general or generalized linear models. Many methods can accommodate longitudinal data, including the commonly used linear mixed model and generalized estimating equation, as well as the less popular fixed-effects model, cluster-robust standard error adjustment, and aggregate regression. We simulated longitudinal data and applied these five methods alongside naïve linear regression, which ignored the dependency and served as a baseline, to compare their power, false positive rate, estimation accuracy, and precision. The results showed that the naïve linear regression and fixed-effects models incurred high false positive rates when analyzing a predictor that is fixed over time, making them unviable for studying time-invariant genetic effects. The linear mixed models maintained low false positive rates and unbiased estimation. The generalized estimating equation was similar to the linear mixed model in terms of power and estimation, but it had increased false positives when the sample size was small, as did cluster-robust standard error adjustment. Aggregate regression produced biased estimates when predictor effects varied over time. To show how the method choice affects downstream results, we performed longitudinal analyses in an adolescent cohort of African and European ancestry. We examined how developing post-traumatic stress symptoms were predicted by polygenic risk, traumatic events, exposure to sexual abuse, and income using four approaches: linear mixed models, generalized estimating equations, cluster-robust standard error adjustment, and aggregate regression. While the directions of effect were generally consistent, coefficient magnitudes and statistical significance differed across methods. Our in-depth comparison of longitudinal methods showed that linear mixed models and generalized estimating equations were applicable in most scenarios requiring longitudinal modeling, but no two approaches produced identical results even when fit to the same data. Because discrepancies in results can arise from methodological choices, it is crucial that researchers determine their model a priori, refrain from testing multiple approaches to obtain favorable results, and use methods that are as similar as possible when seeking to replicate results.
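The false-positive inflation described for naïve linear regression can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It assumes a random-intercept data-generating model, a time-invariant predictor with a true effect of zero (standing in for a polygenic score), and a normal approximation to the slope test. All parameter values (`n_subjects`, `n_times`, `sd_subject`, `n_reps`) and function names are hypothetical choices made for illustration. "Aggregate regression" is taken here to mean regressing each subject's mean outcome on the predictor.

```python
import math
import random
from statistics import NormalDist


def slope_p_value(x, y):
    """Two-sided p-value for the OLS slope of y on x (normal approximation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error assuming independent rows
    z = slope / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rates(n_subjects=100, n_times=5, sd_subject=1.0,
                         n_reps=400, alpha=0.05, seed=1):
    """Estimate false positive rates under a null effect (assumed setup)."""
    rng = random.Random(seed)
    naive_hits = 0
    aggregate_hits = 0
    for _rep in range(n_reps):
        x_long, y_long, x_subj, y_mean = [], [], [], []
        for _subj in range(n_subjects):
            x_i = rng.gauss(0.0, 1.0)         # time-invariant predictor, true effect = 0
            b_i = rng.gauss(0.0, sd_subject)  # subject-level random intercept
            ys = [b_i + rng.gauss(0.0, 1.0) for _ in range(n_times)]
            x_long.extend([x_i] * n_times)
            y_long.extend(ys)
            x_subj.append(x_i)
            y_mean.append(sum(ys) / n_times)
        # Naive regression treats all n_subjects * n_times rows as independent.
        naive_hits += slope_p_value(x_long, y_long) < alpha
        # Aggregate regression uses one averaged outcome per subject.
        aggregate_hits += slope_p_value(x_subj, y_mean) < alpha
    return naive_hits / n_reps, aggregate_hits / n_reps


naive_fpr, aggregate_fpr = false_positive_rates()
print(f"naive: {naive_fpr:.3f}, aggregate: {aggregate_fpr:.3f}")
```

Under these assumptions the naïve rate should land well above the nominal 0.05 level, while the aggregate rate should stay close to it, because the naïve standard error ignores the within-subject correlation. The sketch covers only the null, time-invariant case. It does not reproduce the bias the abstract reports for aggregate regression when predictor effects vary over time.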