Unsupervised clustering

  • Article type: Journal Article
    This paper presents an innovative technique, Advanced Predictor of Electrical Parameters, based on machine learning methods to predict the degradation of electronic components under the effects of radiation. The term degradation refers to the way in which electrical parameters of the electronic components vary with the irradiation dose. This method consists of two sequential steps defined as 'recognition of degradation patterns in the database' and 'degradation prediction of new samples without any kind of irradiation'. The technique can be used under two different approaches called 'pure data driven' and 'model based'. In this paper, the use of Advanced Predictor of Electrical Parameters is shown for bipolar transistors, but the methodology is sufficiently general to be applied to any other component.

  • Article type: Journal Article
    For Rheumatoid Arthritis (RA), a long-term chronic illness, it is essential to identify and describe patient subtypes with comparable goal status and molecular biomarkers. This study aims to develop and validate a new subtyping scheme that integrates genome-scale transcriptomic profiles of RA peripheral blood genes, providing a fresh perspective for stratified treatments.
    We utilized independent microarray datasets of RA peripheral blood mononuclear cells (PBMCs). Up-regulated differentially expressed genes (DEGs) were subjected to functional enrichment analysis. Unsupervised cluster analysis was then employed to identify RA peripheral blood gene expression-driven subtypes. We defined three distinct clustering subtypes based on the identified 404 up-regulated DEGs.
    Subtype A, named NE-driving, was enriched in pathways related to neutrophil activation and responses to bacteria. Subtype B, termed interferon-driving (IFN-driving), exhibited abundant B cells and showed increased expression of transcripts involved in IFN signaling and defense responses to viruses. In Subtype C, an enrichment of CD8+ T-cells was found, ultimately defining it as CD8+ T-cells-driving. The RA subtyping scheme was validated using the XGBoost machine learning algorithm. We also evaluated the therapeutic outcomes of biological disease-modifying anti-rheumatic drugs.
    The findings provide valuable insights for deep stratification, enabling the design of molecular diagnosis and serving as a reference for stratified therapy in RA patients in the future.

  • Article type: Journal Article
    This study aimed to delineate the clear cell renal cell carcinoma (ccRCC) intrinsic subtypes through unsupervised clustering of radiomics and transcriptomics data and to evaluate their associations with clinicopathological features, prognosis, and molecular characteristics.
    Using a retrospective dual-center approach, we gathered transcriptomic and clinical data from ccRCC patients registered in The Cancer Genome Atlas and contrast-enhanced computed tomography images from The Cancer Imaging Archive and local databases. Following the segmentation of images, radiomics feature extraction, and feature preprocessing, we performed unsupervised clustering based on the "CancerSubtypes" package to identify distinct radiotranscriptomic subtypes, which were then correlated with clinical-pathological, prognostic, immune, and molecular characteristics.
    Clustering identified three subtypes, C1, C2, and C3, each of which displayed unique clinicopathological, prognostic, immune, and molecular distinctions. Notably, subtypes C1 and C3 were associated with poorer survival outcomes than subtype C2. Pathway analysis highlighted immune pathway activation in C1 and metabolic pathway prominence in C2. Gene mutation analysis identified VHL and PBRM1 as the most commonly mutated genes, with more mutated genes observed in the C3 subtype. Despite similar tumor mutation burdens, microsatellite instability, and RNA interference across subtypes, C1 and C3 demonstrated greater tumor immune dysfunction and rejection. In the validation cohort, the various subtypes showed comparable results in terms of clinicopathological features and prognosis to those observed in the training cohort, thus confirming the efficacy of our algorithm.
    Unsupervised clustering based on radiotranscriptomics can identify the intrinsic subtypes of ccRCC, and radiotranscriptomic subtypes can characterize the prognosis and molecular features of tumors, enabling noninvasive tumor risk stratification.

  • Article type: Journal Article
    Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we used machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
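The abstract above names fuzzy c-means as one way to assess how discrete the call types are. As a rough illustration only (not the authors' pipeline), the standard fuzzy c-means membership formula can be sketched in plain Python. The function name `fcm_memberships`, the fuzzifier `m = 2.0`, and the toy one-dimensional "acoustic feature" values are all assumptions made for this example.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return one membership row per point; each row sums to 1.

    Uses the standard fuzzy c-means rule
    u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a centre: split full membership
            # among the coinciding centres to avoid dividing by zero.
            zeros = sum(1 for d in dists if d == 0.0)
            row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
        else:
            row = [
                1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                for d_i in dists
            ]
        memberships.append(row)
    return memberships

# Hypothetical data: two cluster centres and three pulses on one feature axis.
centers = [(0.0, 0.0), (1.0, 0.0)]
pulses = [(0.0, 0.0), (0.5, 0.0), (0.25, 0.0)]
for pulse, row in zip(pulses, fcm_memberships(pulses, centers)):
    print(pulse, [round(u, 3) for u in row])
```

A pulse lying midway between two centres receives roughly equal membership in both, which is the kind of signal that would suggest graded rather than discrete call types.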

  • Article type: Journal Article
    OBJECTIVE: The prognosis of colorectal cancer (CRC) is related to natural killer (NK) cells, but the molecular subtype features of CRC based on NK cells are still unknown. This study aimed to identify NK cell-related molecular subtypes of CRC and analyze the survival status and immune landscape of patients with different subtypes.
    METHODS: mRNA expression data, single nucleotide variant (SNV) data, and clinical information of CRC patients were obtained from The Cancer Genome Atlas. Differentially expressed genes (DEGs) were obtained through differential analysis, and the intersection was taken with NK cell-associated genes to obtain 103 NK cell-associated CRC DEGs (NCDEGs). Based on NCDEGs, CRC samples were divided into three clusters through unsupervised clustering analysis. Survival analysis, immune analysis, Gene Set Enrichment Analysis (GSEA), and tumor mutation burden (TMB) analysis were performed. Finally, NCDEG-related small-molecule drugs were screened using the CMap database.
    RESULTS: Survival analysis revealed that cluster2 had a lower survival rate than cluster1 and cluster3 (p < 0.05). Immune infiltration analysis found that the immune infiltration levels and immune checkpoint expression levels of cluster1_3 were substantially higher than those of cluster2, and the tumor purity was the opposite (p < 0.05). GSEA showed that cluster1_3 was significantly enriched in the chemokine signaling pathway, ECM receptor interaction, and antigen processing and presentation pathways (p < 0.05). The TMB of cluster1_3 was significantly higher than that of cluster2 (p < 0.05). Genes with the highest mutation rate in CRC were APC, TP53, TTN, and KRAS. Drug prediction results showed that small-molecule drugs that reverse the upregulation of NCDEGs, such as deoxycholic acid, dipivefrine, and phenformin, may improve the prognosis of CRC.
    CONCLUSIONS: NK cell-associated CRC subtypes can be used to evaluate the tumor characteristics of CRC patients and provide an important reference for CRC patients.

  • Article type: Journal Article
    In this paper, we propose a multi-task learning (MTL) network based on the label-level fusion of metadata and hand-crafted features by unsupervised clustering to generate new clustering labels as an optimization goal. We propose an MTL module (MTLM) that incorporates an attention mechanism to enable the model to learn more integrated, variable information. We propose a dynamic strategy to adjust the loss weights of different tasks and to trade off the contributions of multiple branches. Instead of feature-level fusion, we propose label-level fusion and combine the results of our proposed MTLM with the results of the image classification network to achieve better lesion prediction on multiple dermatological datasets. We verify the effectiveness of the proposed model by quantitative and qualitative measures. The MTL network using multi-modal clues and label-level fusion can yield a significant performance improvement for skin lesion classification.

  • Article type: Journal Article
    Macrophages, as essential components of the tumor immune microenvironment (TIME), could promote growth and invasion in many cancers. However, the role of macrophages in the tumor microenvironment (TME) and immunotherapy in PCa is largely unexplored at present. Here, we investigated the roles of macrophage-related genes in molecular stratification, prognosis, TME, and immunotherapeutic response in PCa. Public databases provided single-cell RNA sequencing (scRNA-seq) and bulk RNA-seq data. Using the Seurat R package, scRNA-seq data were processed and macrophage clusters were identified automatically and manually. Using the CellChat R package, intercellular communication analysis revealed that tumor-associated macrophages (TAMs) interact with other cells in the PCa TME primarily through MIF - (CD74+CXCR4) and MIF - (CD74+CD44) ligand-receptor pairs. We constructed coexpression networks of macrophages using WGCNA to identify macrophage-related genes. Using the R package ConsensusClusterPlus, unsupervised hierarchical clustering analysis identified two distinct macrophage-associated subtypes, which have significantly different pathway activation status, TIME, and immunotherapeutic efficacy. Next, an 8-gene macrophage-related risk signature (MRS) was established through LASSO Cox regression analysis with 10-fold cross-validation, and the performance of the MRS was validated in eight external PCa cohorts. The high-risk group had more active immune-related functions, more infiltrating immune cells, higher HLA and immune checkpoint gene expression, higher immune scores, and lower TIDE scores. Finally, the NCF4 gene was identified as the hub gene in the MRS using the "mgeneSim" function.

  • Article type: Journal Article
    Density-based clustering is considered a robust approach among unsupervised clustering techniques due to its ability to identify outliers, form clusters of irregular shapes, and automatically determine the number of clusters. These properties helped its pioneering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), become applicable to datasets where varying numbers of clusters of different shapes and sizes can be detected without much intervention from the user. However, the original algorithm has limitations, especially its sensitivity to the user input parameters minPts and ε. Additionally, the algorithm assigns inconsistent cluster labels to data objects found in overlapping density regions of separate clusters, which lowers its accuracy. To alleviate these specific problems and increase clustering accuracy, we propose two methods that use statistics from a given dataset's k-nearest neighbor density distribution to determine the optimal ε values. Our approach removes this burden from users and automatically detects the clusters of a given dataset. Furthermore, a method to identify the accurate border objects of separate clusters is proposed and implemented to resolve the unpredictability of the original algorithm. Finally, in our experiments, we show that our efficient re-implementation of the original algorithm, which automatically clusters datasets and improves the clustering quality of adjoining cluster members, provides higher clustering accuracy and faster running times than earlier approaches.
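The abstract above describes choosing the DBSCAN radius from statistics of the k-nearest-neighbor distance distribution but does not say which statistic is used. The sketch below shows the general idea in plain Python under a stated assumption: the rule "mean plus one standard deviation of the k-th neighbor distances" is a placeholder chosen for illustration, not the paper's method, and the names `kth_neighbor_distances` and `suggest_eps` are hypothetical.

```python
import math
import statistics

def kth_neighbor_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def suggest_eps(points, k):
    """Assumed rule of thumb: mean + one population std of the k-distances."""
    k_dists = kth_neighbor_distances(points, k)
    return statistics.mean(k_dists) + statistics.pstdev(k_dists)

# Hypothetical data: four evenly spaced points plus one far-away outlier.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(kth_neighbor_distances(data, k=1))
print(suggest_eps(data, k=1))
```

In practice k is usually tied to minPts, and the sorted k-distances are often inspected for a sharp increase; any fixed statistic such as the one above should be treated as a starting point to check against the data.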

  • Article type: Journal Article
    This study investigates the roles of mucosal-associated invariant T (MAIT) cells and Vα7.2+/CD161- T cells in skin diseases, focusing on atopic dermatitis. MAIT cells, crucial for bridging innate and adaptive immunity, were analyzed alongside Vα7.2+/CD161- T cells in peripheral blood samples from 14 atopic dermatitis patients and 10 healthy controls. Flow cytometry and machine learning algorithms were employed for a comprehensive analysis. The results indicate a significant decrease in MAIT cells and CD69 subsets in atopic dermatitis, coupled with elevated CD38 and polyfunctional MAIT cells producing TNFα and Granzyme B (TNFα+/GzB+). Vα7.2+/CD161- T cells in atopic dermatitis exhibited a decrease in CD8 and IFNγ-producing subsets but an increase in CD38 activated and IL-22-producing subsets. These results highlight the distinctive features of MAIT cells and Vα7.2+/CD161- T cells and their different roles in the pathogenesis of atopic dermatitis and provide insights into their potential roles in immune-mediated skin diseases.

  • Article type: Journal Article
    Application of machine learning in plant breeding is a recent concept that has to be optimized for precise utilization in breeding programs for high-yielding crop plants. Identification and efficient utilization of heterotic grouping patterns, aided by machine learning approaches, is of utmost importance in hybrid cultivar breeding, as it can save the time and resources required to breed a new plant hybrid or variety. In the present study, 109 genotypes of sunflower were investigated at the morphological, biochemical (SDS-PAGE), and molecular (microsatellite (SSR) marker) levels for heterotic grouping. All three datasets were combined, scaled, and subjected to unsupervised machine learning algorithms, i.e., hierarchical clustering, K-means clustering, and a hybrid clustering algorithm (hierarchical + K-means), to assess the efficiency and resolution power of these algorithms for heterotic group identification in practical plant breeding. Following the application of the unsupervised clustering approach, two major groups were identified in the studied sunflower germplasm, and further classification revealed six smaller classes in each major group through the hierarchical and hybrid clustering approaches. Because of the high resolution obtained with hierarchical clustering, the classification achieved through this algorithm was used for the selection of potential parents. One genotype from each smaller group was selected based on maximum seed yield potential and hybridized in a line × tester mating design, producing 36 F1 cross combinations. These F1s, along with their parents, were studied under open field conditions to validate the efficacy of the identified heterotic groups in the sunflower genetic material under study. Data for 11 agronomic and qualitative traits were recorded. The 36 F1 combinations were tested for combining ability (general/specific), heterosis, genotypic and phenotypic correlation, and path analysis. Results suggested that F1 hybrids performed better than their respective parents for all the traits under investigation. The findings validated the use of machine learning approaches in practical plant breeding; however, more accurate and robust clustering algorithms need to be developed to handle the noisiness of data from open field experiments.
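The abstract above mentions scaling the combined data and applying a hybrid "hierarchical + K-means" algorithm without giving details. One common reading, assumed here, is to run agglomerative clustering first and use the resulting cluster centroids to seed K-means. The plain-Python sketch below follows that reading; the centroid-distance merge rule, all function names, and the toy trait values are illustrative assumptions rather than the study's actual procedure.

```python
import math
import statistics

def scale(rows):
    """Z-score each column (population std); constant columns map to 0."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((v - mu) / sd for v, mu, sd in zip(r, means, stds)) for r in rows]

def centroid(group):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(group) for col in zip(*group))

def hierarchical(points, k):
    """Agglomerative step: merge the two closest centroids until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm from the given starting centres; returns a label per point."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = centroid(members)
    return labels

def hybrid_cluster(rows, k):
    """Scale, seed with hierarchical centroids, then refine with K-means."""
    pts = scale(rows)
    seeds = [centroid(group) for group in hierarchical(pts, k)]
    return kmeans(pts, seeds)

# Hypothetical genotypes described by two traits on very different scales.
traits = [(1.0, 10.0), (1.2, 11.0), (0.9, 10.5), (8.0, 50.0), (8.3, 52.0), (7.9, 49.0)]
print(hybrid_cluster(traits, k=2))
```

Seeding K-means from a hierarchical solution removes the dependence on random starting centres, which is one plausible motivation for a hybrid approach; scaling first keeps a trait with large units from dominating the distances.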