Genomic prediction

  • Article type: Journal Article
    Cultivated potato, Solanum tuberosum L., is considered an autotetraploid with 12 chromosomes with four homologous phases. However, recent evidence found that, due to frequent large phase deletions in the genome, gene ploidy is not constant across the genome. The elite cultivar "Otava" was found to have an average gene copy number of 3.2 across all loci. Breeding programs for elite potato cultivars rely increasingly on genomic prediction tools for selection breeding and for elucidating the quantitative trait loci underpinning trait genetic variance. These tools are typically based on anonymous single nucleotide polymorphism (SNP) markers, which are usually called from, for example, SNP array or sequencing data using a tetraploid model. In this study, we analyzed the impact of using whole-genome markers genotyped either as tetraploid or as observed allele frequencies from genotype-by-sequencing data on single-trait additive genomic best linear unbiased prediction (GBLUP) genomic prediction (GP) models and single-marker regression genome-wide association studies of potato, to evaluate the implications of capturing varying ploidy for the statistical models employed in genomic breeding. A panel of 762 offspring of a diallel cross of 18 parents of elite breeding material was used for modeling. These were genotyped by sequencing and phenotyped for five key performance traits: chipping quality, length/width ratio, senescence, dry matter content, and yield. We also estimated, from simulated data, the read coverage required to confidently discriminate between a heterozygous triploid and a tetraploid state. Using a tetraploid model neither impaired nor improved genomic predictions compared to using the observed allele frequencies that account for true marker ploidy. In genome-wide association studies (GWAS), only very minor variations in signal amplitude and in the number of SNPs supporting both minor and major quantitative trait loci (QTLs) were observed between the two data sets. However, all major QTLs were reproducible using both data sets.
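The comparison described in this abstract can be illustrated with a minimal sketch, assuming simulated data rather than the study's genotypes. Markers are coded either as observed allele frequencies (alternate copies divided by a per-locus copy number of 3 or 4) or forced onto a tetraploid dosage grid, and each coding feeds a simple additive GBLUP-style predictor. The function names, the shrinkage value `lam`, the simulation settings, and the relationship-matrix scaling are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np

def relationship_matrix(freq):
    """Genomic relationship matrix from allele frequencies in [0, 1] (individuals x markers)."""
    centered = freq - freq.mean(axis=0)            # center each marker column
    return centered @ centered.T / freq.shape[1]   # average cross-product over markers

def gblup_predict(G, y, train, test, lam):
    """Predict test phenotypes: mean + G[test, train] (G[train, train] + lam*I)^-1 (y - mean)."""
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    weights = np.linalg.solve(K, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ weights

rng = np.random.default_rng(1)
n, m = 150, 300                                    # hypothetical panel and marker counts
copies = rng.choice([3, 4], size=m)                # simulated per-locus copy number
alt = rng.binomial(copies, 0.4, size=(n, m))       # alternate-allele copies per individual
observed = alt / copies                            # observed allele frequency
tetraploid = np.round(observed * 4) / 4            # same data forced onto a tetraploid grid

# Simulated additive phenotype driven by the observed frequencies plus noise.
y = observed @ rng.normal(size=m) + rng.normal(size=n)
train, test = np.arange(100), np.arange(100, n)

accuracy = {}
for name, freq in {"observed": observed, "tetraploid": tetraploid}.items():
    G = relationship_matrix(freq)
    pred = gblup_predict(G, y, train, test, lam=0.01)
    accuracy[name] = np.corrcoef(pred, y[test])[0, 1]  # predictive ability on held-out lines
print(accuracy)
```

The two codings differ only at loci with three copies, where 1/3 and 2/3 are rounded to 1/4 and 3/4, which is one plausible reason the two relationship matrices, and hence the predictions, can end up very similar.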

  • Article type: Journal Article
    Predicting the performance (yield or other integrative traits) of cultivated plants is complex because it involves estimating not only the genetic value of the selection candidates and the interactions between genotype and environment (GxE), but also the epistatic interactions between genomic regions for a given trait and the interactions between the traits contributing to the integrative trait. Classical Genomic Prediction (GP) models mostly account for additive effects and are not suitable for estimating non-additive effects such as epistasis. Therefore, machine learning and deep learning methods have previously been proposed to model those non-linear effects.
    In this study, we propose a type of Artificial Neural Network (ANN) called a Convolutional Neural Network (CNN) and compare it to two classical GP regression methods for their ability to predict an integrative trait of sorghum: aboveground fresh weight accumulation. We also suggest that the use of a crop growth model (CGM) can enhance predictions of integrative traits by decomposing them into more heritable intermediate traits.
    The results show that the CNN outperformed both the LASSO and Bayes C methods in accuracy, suggesting that CNNs are better suited to predicting integrative traits. Furthermore, the predictive ability of the combined CGM-GP approach surpassed that of GP without CGM integration, irrespective of the regression method used.
    These results are consistent with recent work aiming to develop Genome-to-Phenotype models, and they support the use of non-linear prediction methods and of combined CGM-GP approaches to enhance the prediction of crop performance.

  • Article type: Journal Article
    Striga hermonthica (Del.) Benth., a parasitic weed, causes substantial yield losses in maize production in sub-Saharan Africa (SSA). Breeding for Striga resistance in maize is constrained by limited genetic diversity for Striga resistance within the elite germplasm and by limited phenotyping capacity under artificial Striga infestation. Genomics-enabled approaches have the potential to accelerate the identification of Striga-resistant lines for hybrid development. The objectives of this study were to evaluate the accuracy of genomic selection for traits associated with Striga resistance and grain yield (GY) and to predict genetic values of tested and untested doubled haploid (DH) maize lines. We genotyped 606 DH lines with 8,439 rAmpSeq markers. A training set of 116 DH lines crossed to two testers was phenotyped under artificial Striga infestation at three locations in Kenya. Heritability for Striga resistance parameters ranged from 0.38 to 0.65, while that for GY was 0.54. The prediction accuracies for Striga resistance-associated traits across locations, as determined by cross-validation (CV), ranged from 0.24 to 0.53 for CV0 and from 0.20 to 0.37 for CV2. For GY, the prediction accuracies were 0.59 and 0.56 for CV0 and CV2, respectively. The results revealed 300 DH lines with desirable genomic estimated breeding values (GEBVs) for reduced number of emerged Striga plants (STR) at 8, 10, and 12 weeks after planting. The GEBVs of DH lines for Striga resistance-associated traits in the training and testing sets were similar in magnitude. These results highlight the potential application of genomic selection in breeding for Striga resistance in maize. The integration of genomic-assisted strategies and DH technology for line development, coupled with forward breeding for major adaptive traits, will enhance genetic gains in breeding for Striga resistance in maize.

  • Article type: Journal Article
    BACKGROUND: Structural genomic variants (SVs) are prevalent in plant genomes and have played an important role in evolution and domestication, as they constitute a significant source of genomic and phenotypic variability. Nevertheless, most quantitative genetics methods focused on crop improvement, such as genomic prediction, consider only Single Nucleotide Polymorphisms (SNPs). Deep Learning (DL) is a promising strategy for genomic prediction, but its performance using SVs and SNPs as genetic markers remains unknown.
    RESULTS: We used rice to investigate whether combining SVs and SNPs can result in better trait prediction than SNPs alone and to examine the potential advantage of DL networks over Bayesian linear models. Specifically, the performances of BayesC (considering additive effects) and a Bayesian Reproducing Kernel Hilbert Space (RKHS) regression (considering both additive and non-additive effects) were compared to those of two different DL architectures, the Multilayer Perceptron and the Convolutional Neural Network, to explore their prediction ability under various marker input strategies. We found that exploiting structural and nucleotide variation slightly improved prediction ability on complex traits in 87% of the cases. DL models outperformed Bayesian models in 75% of the studied cases, considering the four traits and the two validation strategies used. Finally, DL systematically improved the prediction ability for binary traits relative to the Bayesian models.
    CONCLUSIONS: Our study reveals that the use of structural genomic variants can improve trait prediction in rice, independently of the methodology used. Our results also suggest that DL networks can perform better than Bayesian models in the prediction of binary traits, and in quantitative traits when the training and target sets are not closely related. This highlights the potential of DL to enhance crop improvement in specific scenarios and the importance of considering SVs in addition to SNPs in genomic selection.

  • Article type: Journal Article
    Blueberry (Vaccinium spp.) is among the most-consumed soft fruits and has been recognized as an important source of health-promoting compounds. Because the fruit is highly perishable and susceptible to rapid spoilage from softening and decay during postharvest storage, modern breeding programs are looking to maximize quality and extend the market life of fresh blueberries. However, the extent to which postharvest quality traits in blueberries are genetically controlled is uncertain. This study aimed to investigate the prediction ability and genetic basis of the main fruit quality traits affected during blueberry postharvest storage, in order to create breeding strategies for developing cultivars with an extended shelf life. To achieve this goal, we carried out target genotyping in a breeding population of 588 individuals and evaluated several fruit quality traits after one day, one week, three weeks, and seven weeks of postharvest storage at 1 °C. Using longitudinal genome-based methods, we estimated genetic parameters and predicted unobserved phenotypes. Our results showed large diversity, moderate heritability, and consistent predictive accuracies throughout postharvest storage for most of the traits. Regarding fruit quality, firmness showed the largest variation during postharvest storage, with a surprising number of genotypes maintaining or increasing their firmness even after seven weeks of cold storage. Our results suggest that we can effectively improve blueberry postharvest quality through breeding and use genomic prediction to maximize genetic gains in the long term. We also emphasize the potential of using longitudinal genomic prediction models to predict fruit quality at extended postharvest periods by integrating known phenotypic data from harvest.

  • Article type: Journal Article
    Switchgrass is a potential crop for bioenergy or carbon capture schemes, but further yield improvements through selective breeding are needed to encourage commercialization. To identify promising switchgrass germplasm for future breeding efforts, we conducted multi-site and multi-trait genomic prediction with a diversity panel of 630 genotypes from 4 switchgrass subpopulations (Gulf, Midwest, Coastal, and Texas), which were measured for spaced-plant biomass yield across 10 sites. Our study focused on the use of genomic prediction to share information among traits and environments. Specifically, we evaluated the predictive ability of cross-validation (CV) schemes using only genetic data and the training set (cross-validation 1: CV1), a subset of the sites (cross-validation 2: CV2), and/or two yield surrogates (flowering time and fall plant height). We found that genotype-by-environment interactions were largely due to the north-south distribution of sites. The genetic correlations between yield surrogates and biomass yield were generally positive (mean height r=0.85; mean flowering time r=0.45) and did not vary by subpopulation or growing region (North, Middle, South). Genomic prediction models had cross-validation predictive abilities of -0.02 for individuals using only genetic data (CV1) but 0.55, 0.69, 0.76, 0.81, and 0.84 for individuals with biomass performance data from one, two, three, four, and five sites included in the training data (CV2), respectively. To simulate a resource-limited breeding program, we determined the predictive ability of models provided with: one site observation of flowering time (0.39), one site observation of flowering time and fall height (0.51), one site observation of fall height (0.52), one site observation of biomass (0.55), and five site observations of biomass yield (0.84). The ability to share information at a regional scale is very encouraging, but further research is required to accurately translate spaced-plant biomass to commercial-scale sward biomass performance.
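The difference between the two cross-validation schemes can be sketched as masks over a genotype-by-site table of biomass records. This is an illustrative reading of the abstract, not the authors' code: in CV1 a test genotype has no phenotypic records anywhere, while in CV2 it keeps records from a chosen number of sites. The panel dimensions (630 genotypes, 10 sites) come from the abstract; the fold size of 126, the random choice of known sites, and the function names are assumptions.

```python
import numpy as np

def cv1_mask(n_geno, n_site, test_geno):
    """CV1: test genotypes are unobserved at every site (prediction from genetic data only)."""
    observed = np.ones((n_geno, n_site), dtype=bool)
    observed[test_geno, :] = False
    return observed

def cv2_mask(n_geno, n_site, test_geno, n_known_sites, rng):
    """CV2: each test genotype keeps records from n_known_sites randomly chosen sites."""
    observed = np.ones((n_geno, n_site), dtype=bool)
    for g in test_geno:
        known = rng.choice(n_site, size=n_known_sites, replace=False)
        observed[g, :] = False       # hide all records for this genotype
        observed[g, known] = True    # then reveal the sites treated as already phenotyped
    return observed

rng = np.random.default_rng(0)
n_geno, n_site = 630, 10
test_geno = rng.choice(n_geno, size=126, replace=False)   # hypothetical 20% test fold

cv1 = cv1_mask(n_geno, n_site, test_geno)
cv2 = cv2_mask(n_geno, n_site, test_geno, n_known_sites=2, rng=rng)

# Records per test genotype available for training under each scheme.
print(cv1[test_geno].sum(axis=1).max(), cv2[test_geno].sum(axis=1).min())
```

Raising `n_known_sites` from one to five corresponds to the series of CV2 settings reported above, where predictive ability rose from 0.55 to 0.84.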

  • Article type: Journal Article
    Genomic prediction has emerged as a pivotal technology for the genetic evaluation of livestock and crops and for predicting human disease risks. However, classical genomic prediction methods face challenges in incorporating biological prior information such as the genetic regulation mechanisms of traits. This study introduces an approach that integrates mRNA transcript information to predict complex trait phenotypes. To evaluate the accuracy of the new method, we used a Drosophila population that is widely employed in quantitative genetics research globally. Results indicate that integrating mRNA transcript data can enhance genomic prediction accuracy for certain traits, though it does not improve phenotype prediction accuracy for all traits. Compared with GBLUP, the prediction accuracy for olfactory response to dCarvone in male Drosophila increased from 0.256 to 0.274 (a 7% improvement). Similarly, the accuracy for cafe in male Drosophila rose from 0.355 to 0.401 (13%), and the accuracy for survival_paraquat in male Drosophila improved from 0.101 to 0.138 (36%). In female Drosophila, the accuracy for olfactory response to 1hexanol increased from 0.147 to 0.210 (43%). In conclusion, integrating mRNA transcripts can improve genomic prediction accuracy for certain traits by 7% to 43%. Furthermore, for some traits, considering interaction effects along with mRNA transcript integration can lead to even higher prediction accuracy.
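The improvement percentages in this abstract are relative gains over the GBLUP baseline, computed as (new − baseline) / baseline. The short sketch below recomputes them from the accuracies quoted in the abstract. Because those accuracies are rounded to three decimals, a recomputed value can differ from the published percentage by a fraction of a point. The dictionary keys are labels chosen here for illustration.

```python
# (GBLUP accuracy, accuracy with mRNA transcripts integrated), as quoted in the abstract
accuracies = {
    "dCarvone (male)": (0.256, 0.274),
    "cafe (male)": (0.355, 0.401),
    "survival_paraquat (male)": (0.101, 0.138),
    "1hexanol (female)": (0.147, 0.210),
}

# Relative improvement over the GBLUP baseline, in percent.
gains = {
    trait: 100.0 * (new - base) / base
    for trait, (base, new) in accuracies.items()
}

for trait, gain in gains.items():
    print(f"{trait}: {gain:.1f}%")
```

Note that these are relative gains: the largest one, for 1hexanol, corresponds to an absolute increase in accuracy of only 0.063.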

  • Article type: Journal Article
    Intermediate wheatgrass (IWG) is a perennial grass that produces nutritious grain while offering substantial ecosystem services. Commercial varieties of this crop are mostly synthetic panmictic populations that are developed by intermating a few selected individuals. Because the development and generation advancement of these synthetic populations is a multi-year process, earlier synthetic generations are tested by the breeders and subsequent generations are released to the growers. A comparison of generations within IWG synthetic cultivars is currently lacking. In this study, we used simulation models and genomic prediction to analyze population differences and trends of genetic variance in four synthetic generations of MN-Clearwater, a commercial cultivar released by the University of Minnesota. Little to no difference was observed among the four generations in population genetics, genetic kinship, or genome-wide marker relationships measured via linkage disequilibrium. A reduction in genetic variance was observed when 7 parents were used to generate synthetic populations, while using 20 led to the best possible outcome in determining population variance. Genomic prediction of plant height, free threshing ability, seed mass, and grain yield among the four synthetic generations showed a few significant differences among the generations, yet the differences in values were negligible. Based on these observations, we draw two major conclusions: 1) the earlier and later synthetic generations of IWG are mostly similar to each other, with minimal differences; and 2) using 20 genotypes to create synthetic populations is recommended to sustain ample genetic variance and trait expression among all synthetic generations.

  • Article type: Journal Article
    In plant breeding, we often aim to improve multiple traits at once. However, without knowing the economic value of each trait, it is hard to decide which traits to focus on. This is where "desired gain selection indices" come in handy: when economic weights are not available, they can yield optimal gains in each trait based on the breeder's prioritisation of desired improvements. However, they lack the ability to maximise the selection response and to determine the correlation between the index and net genetic merit.
    Here, we report the development of an iterative desired gain selection index method that optimises the sampling of the desired gain values to achieve a targeted or user-specified selection response for multiple traits. This targeted selection response can be constrained or unconstrained for either a subset of or all the studied traits.
    We tested the method using genomic estimated breeding values (GEBVs) for seven traits in a bread wheat (Triticum aestivum) reference breeding population comprising 3,331 lines and achieved prediction accuracies ranging between 0.29 and 0.47 across the seven traits. The indices were validated using 3,005 doubled haploid lines that were derived from crosses between parents selected from the reference population. We tested three user-specified response scenarios: a constrained equal weight (INDEX1), a constrained yield-dominant weight (INDEX2), and an unconstrained weight (INDEX3). Our method achieved a response equivalent to the user-specified selection response when constraining a set of traits, and this response was much better than that of the traditional desired gain selection index method without iteration. Interestingly, when using the unconstrained weight, our iterative method maximised the selection response and shifted the average GEBVs of the selection candidates in the desired direction.
    Our results show that the method is an optimal choice not only when economic weights are unavailable, but also when constraining the selection response is an unfavourable option.
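For orientation, the classical non-iterative desired gains index that this abstract builds on can be sketched as follows. It is not the iterative method the authors report. In the classical formulation, index weights b are chosen so that the expected genetic response, which is proportional to G b, is proportional to the vector of desired gains d, giving b = G⁻¹ d. The covariance matrices, the desired gains, and the selection intensity below are made-up values for three hypothetical traits.

```python
import numpy as np

# Hypothetical genetic (G) and phenotypic (P) covariance matrices for three traits.
G = np.array([[1.0, 0.3, -0.2],
              [0.3, 0.8, 0.1],
              [-0.2, 0.1, 0.5]])
P = G + np.diag([1.0, 0.6, 0.9])       # phenotypic = genetic + residual variance (assumed)
d = np.array([1.0, 1.0, 0.5])          # breeder's desired relative gains per trait (assumed)

# Classical desired gains weights: solve G b = d so the response is proportional to d.
b = np.linalg.solve(G, d)

# Expected response per trait when selecting on the index I = b'x.
intensity = 1.755                      # assumed selection intensity
response = intensity * (G @ b) / np.sqrt(b @ P @ b)

ratio = response / d                   # constant across traits if gains follow d
print(b, response, ratio)
```

The weights fix only the ratios among the trait responses. The overall magnitude is set by the selection intensity and the index variance, which is consistent with the abstract's remark that such indices do not by themselves maximise the selection response.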

  • Article type: Journal Article
    In modern breeding practices, genomic prediction (GP) uses high-density single nucleotide polymorphism (SNP) markers to predict genomic estimated breeding values (GEBVs) for crucial phenotypes, thereby speeding up the selection breeding process and shortening generation intervals. However, because genotype data typically contain far fewer samples than SNP markers, overfitting commonly arises during model training. To address this, the present study builds upon the Least Squares Twin Support Vector Regression (LSTSVR) model by incorporating a Lasso regularization term, yielding a model named ILSTSVR. Because parameter tuning across different datasets is complex, a subtraction-average-based optimizer (SABO) is further introduced to optimize ILSTSVR, giving the GP model named SABO-ILSTSVR. Experiments conducted on four different crop datasets demonstrate that SABO-ILSTSVR outperforms or matches the efficiency of widely used genomic prediction methods. Source code and data are available at: https://github.com/MLBreeding/SABO-ILSTSVR.