Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, predicting it from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different, but related, source domain, and it is considered a highly promising approach for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNNs) to train models using non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its prediction accuracy with that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data.
We attribute the enhanced prediction accuracy of TrG2P to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
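
The three stages described above (pre-training on non-yield traits, transferring convolutional parameters and retraining the fully connected layers, then fusing the fine-tuned models and training a final layer) can be illustrated with a minimal PyTorch sketch. This is not the TrG2P implementation: the class names `G2PNet` and `Fusion`, the helper `fit`, the layer sizes, the trait names, and the random data are all hypothetical, and the sketch assumes that transferred convolutional layers are kept frozen during fine-tuning and that "fusion" means concatenating the first-fully-connected-layer outputs of each fine-tuned model.

```python
import torch
from torch import nn


class G2PNet(nn.Module):
    """Hypothetical small 1D CNN: conv block -> first FC layer -> output layer."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 4, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten())
        self.fc1 = nn.Sequential(nn.Linear(4 * 8, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 1)

    def features(self, x):
        # Output of the conv block followed by the first FC layer.
        return self.fc1(self.conv(x))

    def forward(self, x):
        return self.out(self.features(x))


class Fusion(nn.Module):
    """Assumed fusion: concatenate frozen branch features, train one new layer."""

    def __init__(self, tuned):
        super().__init__()
        self.branches = nn.ModuleList(tuned)
        for p in self.branches.parameters():
            p.requires_grad = False  # conv + first FC layers stay fixed
        hidden = tuned[0].out.in_features
        self.out = nn.Linear(hidden * len(tuned), 1)

    def forward(self, x):
        feats = torch.cat([b.features(x) for b in self.branches], dim=1)
        return self.out(feats)


def fit(model, x, y, epochs=50, lr=1e-2):
    """Train only the parameters that still require gradients (MSE loss)."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x).squeeze(1), y)
        loss.backward()
        opt.step()
    return model


torch.manual_seed(0)
n_lines, n_markers = 64, 200
# Synthetic genotypes coded 0/1/2 and random phenotypes (placeholders only).
x = torch.randint(0, 3, (n_lines, 1, n_markers)).float()
traits = {"plant_height": torch.randn(n_lines),
          "flowering_time": torch.randn(n_lines)}
yield_y = torch.randn(n_lines)

# Stage 1: pre-train one CNN per non-yield trait.
pretrained = [fit(G2PNet(), x, y) for y in traits.values()]

# Stage 2: copy conv parameters into a fresh model, freeze them,
# and retrain the fully connected layers on yield.
tuned = []
for src in pretrained:
    net = G2PNet()
    net.conv.load_state_dict(src.conv.state_dict())
    for p in net.conv.parameters():
        p.requires_grad = False
    tuned.append(fit(net, x, yield_y))

# Stage 3: fuse the fine-tuned models and train only the last layer.
fused = fit(Fusion(tuned), x, yield_y)
pred = fused(x).squeeze(1)
```

Freezing the transferred layers means each stage only updates the parameters introduced at that stage, which is one plausible reading of "retrained" in the description above; a real pipeline would also hold out test lines rather than predicting on the training set.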