Keywords: Combining ability; Heterosis; Hierarchical clustering; K-means; Unsupervised clustering

MeSH: Hybrid Vigor; Hybridization, Genetic; Helianthus / genetics; Genotype; Plant Breeding; Asteraceae; Machine Learning

Source: DOI:10.1038/s41598-024-58049-z

Abstract:
The application of machine learning to plant breeding is a recent concept that must be optimized for precise use in breeding programs for high-yielding crop plants. Identifying heterotic grouping patterns and exploiting them efficiently with the aid of machine learning is of utmost importance in hybrid cultivar breeding, as it can save the time and resources required to breed a new plant hybrid or variety. In the present study, 109 sunflower genotypes were investigated at the morphological, biochemical (SDS-PAGE), and molecular (microsatellite (SSR) marker) levels for heterotic grouping. The three datasets were combined, scaled, and subjected to unsupervised machine learning algorithms, namely hierarchical clustering, K-means clustering, and a hybrid clustering algorithm (hierarchical + K-means), to assess the efficiency and resolving power of these algorithms for identifying heterotic groups in practical plant breeding. The unsupervised clustering identified two major groups in the studied sunflower germplasm, and further classification by the hierarchical and hybrid approaches revealed six smaller classes within each major group. Because hierarchical clustering gave high resolution, its classification was used to select potential parents. One genotype was selected from each smaller group on the basis of maximum seed yield potential, and the selected genotypes were hybridized in a line × tester mating design, producing 36 F1 cross combinations. These F1s and their parents were evaluated under open-field conditions to validate the efficacy of the identified heterotic groups in the sunflower genetic material under study. Data were recorded for 11 agronomic and qualitative traits. The 36 F1 combinations were tested for combining ability (general and specific), heterosis, genotypic and phenotypic correlations, and path analysis.
The results suggested that the F1 hybrids performed better than their respective parents for all traits under investigation. The findings validate the use of machine learning approaches in practical plant breeding; however, more accurate and robust clustering algorithms need to be developed to handle the noisy data of open-field experiments.
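The clustering workflow described in the abstract (combine and scale the datasets, cluster hierarchically, then refine with K-means) can be illustrated with a minimal Python sketch. The abstract does not state the distance metric, linkage method, or software used, so Euclidean distance, average linkage, and seeding K-means with the hierarchical cluster centroids are assumptions made here for illustration, as are all function names and the toy data. This is one plausible reading of a "hierarchical + K-means" hybrid and not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale each column to mean 0 and unit variance (constant columns map to 0)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def centroid(points):
    """Component-wise mean of a list of points."""
    return [mean(dim) for dim in zip(*points)]


def hierarchical(points, k):
    """Agglomerative clustering, stopped once k clusters remain.

    Assumption: average linkage on Euclidean distance.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return mean(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers; returns one label per point."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels


def hybrid_clustering(rows, k):
    """Scale the data, cluster hierarchically, then refine the partition with K-means."""
    scaled = zscore_columns(rows)
    groups = hierarchical(scaled, k)
    seeds = [centroid([scaled[i] for i in g]) for g in groups]
    return kmeans(scaled, seeds)


# Hypothetical toy data: six genotypes, two traits on very different scales.
toy = [
    [150.0, 0.10], [152.0, 0.12], [149.0, 0.11],
    [190.0, 0.90], [188.0, 0.88], [191.0, 0.91],
]
labels = hybrid_clustering(toy, k=2)
print(labels)
```

Scaling comes first because morphological, biochemical, and marker variables are measured on different scales, and an unscaled distance would be dominated by the variables with the largest numeric range. In practice the combined dataset for the 109 genotypes would replace the toy rows, and the number of clusters would be chosen from the data rather than fixed in advance.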