Genomic evaluation relies on the assumption of linkage disequilibrium between dense genome-wide single-nucleotide polymorphism (SNP) markers and quantitative trait loci (QTL). This study evaluated four frequentist methods, namely Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, and Genomic Best Linear Unbiased Prediction (GBLUP), and five Bayesian methods, namely Bayesian Ridge Regression (BRR), Bayes A, Bayesian LASSO, Bayes C, and Bayes B, for genomic selection using simulated data. Pairwise differences in prediction accuracy were assessed for statistical significance (p-values from the t test and the Mann-Whitney U test) and practical significance (Cohen's d effect size). For this purpose, data were simulated under two scenarios differing in marker density (4000 and 8000 SNPs across the whole genome). The simulated genome comprised four chromosomes of 1 Morgan each, carrying 100 randomly distributed QTL and evenly distributed SNPs at two densities (1000 and 2000 per chromosome), with heritability set to 0.4. For the frequentist methods other than GBLUP, the regularization parameter λ was chosen by five-fold cross-validation. In both scenarios, Ridge Regression and GBLUP achieved the highest prediction accuracy among the frequentist methods, with Ridge Regression showing the lowest bias and GBLUP the highest. Among the Bayesian methods, Bayes B showed the highest prediction accuracy and BRR the lowest. Bayesian LASSO had the lowest bias in both scenarios, while the highest bias was shown by BRR in the first scenario and by Bayes B in the second. Across all studied methods in both scenarios, Bayes B showed the highest accuracy, and LASSO and Elastic Net the lowest.
As expected, the greatest similarity in performance was observed between GBLUP and BRR (d = 0.007 in the first scenario and d = 0.003 in the second). The parametric t test and the non-parametric Mann-Whitney U test gave similar results. Of the 36 pairwise t tests between methods in each scenario, 14 comparisons were significant in the first scenario (P < .001) and 2 in the second (P < .05), which indicates that differences in performance between methods decrease as the number of predictors increases. Cohen's d effect sizes supported this: as model complexity increased, effect sizes were no longer very large. The regularization parameters of the frequentist methods should be optimized by cross-validation before these methods are used in genomic evaluation.
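The pairwise comparison described above can be illustrated with a minimal sketch. It assumes Cohen's d is computed with the pooled standard deviation, which the abstract does not specify. The accuracy values are hypothetical placeholders, not results from the study, and the method labels are shorthand for the nine methods named in the text.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation.

    Assumption: pooled-SD form; the abstract does not state which
    variant of the effect size was used.
    """
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# The nine methods named in the abstract (labels are shorthand).
methods = [
    "Ridge", "LASSO", "ElasticNet", "GBLUP",
    "BRR", "BayesA", "BayesianLASSO", "BayesC", "BayesB",
]

# Every unordered pair of methods: 9 choose 2 comparisons per scenario.
pairs = list(combinations(methods, 2))

# Hypothetical per-replicate prediction accuracies for two methods.
acc_a = [0.60, 0.62, 0.61, 0.63, 0.59]
acc_b = [0.58, 0.60, 0.59, 0.61, 0.57]

d = cohens_d(acc_a, acc_b)
print(f"{len(pairs)} pairwise comparisons; example effect size d = {d:.3f}")
```

With nine methods there are 36 unordered pairs, which matches the number of t tests reported per scenario. An effect size near zero, such as the values reported for GBLUP versus BRR, means the two accuracy distributions are practically indistinguishable.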