Semi-supervised learning

  • Article type: Journal Article
    Semi-supervised medical image segmentation (SSMIS) has witnessed substantial advancements by leveraging limited labeled data and abundant unlabeled data. Nevertheless, existing state-of-the-art (SOTA) methods encounter challenges in accurately predicting labels for the unlabeled data, giving rise to disruptive noise during training and susceptibility to overfitting on erroneous information. Moreover, applying perturbations to inaccurate predictions further impedes consistency learning. To address these concerns, we propose a novel cross-head mutual mean-teaching network (CMMT-Net) incorporating weak-strong data augmentations, thereby benefiting both co-training and consistency learning. More concretely, our CMMT-Net extends the cross-head co-training paradigm by introducing two auxiliary mean teacher models, which yield more accurate predictions and provide supplementary supervision. The predictions derived from weakly augmented samples generated by one mean teacher are leveraged to guide the training of another student with strongly augmented samples. Furthermore, two distinct yet synergistic data perturbations at the pixel and region levels are introduced. We propose mutual virtual adversarial training (MVAT) to smooth the decision boundary and enhance feature representations, and a cross-set CutMix strategy to generate more diverse training samples for capturing inherent structural data information. Notably, CMMT-Net simultaneously implements data, feature, and network perturbations, amplifying model diversity and generalization performance. Experimental results on three publicly available datasets indicate that our approach yields remarkable improvements over previous SOTA methods across various semi-supervised scenarios. The code is available at https://github.com/Leesoon1984/CMMT-Net.
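    The abstract does not spell out the training mechanics, but the two generic ingredients it builds on, an exponential-moving-average (EMA) mean-teacher update and a weak-to-strong consistency loss, can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions: the toy model, the decay of 0.99, the confidence threshold, and the stand-in augmentations are not taken from the paper.

```python
# Minimal sketch of mean-teacher EMA updates and weak-to-strong consistency,
# the two generic building blocks the CMMT-Net abstract refers to.
# All names and hyperparameters here are illustrative assumptions.
import copy
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Teacher parameters track an exponential moving average of the student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)


def weak_strong_consistency(student, teacher, x_weak, x_strong, conf_thresh=0.95):
    """Teacher predicts on the weakly augmented input; the student is trained to
    match those pseudo-labels on the strongly augmented view (confident pixels only)."""
    with torch.no_grad():
        probs = F.softmax(teacher(x_weak), dim=1)        # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)                  # confidence + hard labels
    logits = student(x_strong)
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    mask = (conf >= conf_thresh).float()                 # ignore uncertain pixels
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)


# Toy usage with a tiny segmentation head (purely for illustration).
student = torch.nn.Conv2d(1, 2, kernel_size=3, padding=1)
teacher = copy.deepcopy(student)
x_weak = torch.randn(2, 1, 32, 32)
x_strong = x_weak + 0.1 * torch.randn_like(x_weak)       # stand-in "strong" augmentation
loss = weak_strong_consistency(student, teacher, x_weak, x_strong)
loss.backward()
ema_update(teacher, student)
```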

  • Article type: Journal Article
    Non-intrusive load monitoring (NILM) can obtain fine-grained power consumption information for individual appliances within the user's premises without installing additional hardware sensors. With the rapid development of deep learning models, many methods have been utilized to address NILM problems and have achieved enhanced appliance identification performance. However, supervised learning models require a substantial volume of annotated data to function effectively, which is time-consuming, laborious, and difficult to implement in real scenarios. In this paper, we propose a novel semi-supervised learning method that combines consistency regularization and pseudo-labels to help identify appliances with limited labeled data and an abundance of unlabeled data. In addition, given the different learning difficulties of various appliance categories (for example, feature learning is more difficult for multi-state appliances than for two-state appliances), the thresholds employed for different appliances are adjusted in a flexible way at each time step so that informative unlabeled data and their pseudo-labels can be delivered. Experiments have been conducted on publicly available datasets, and the results indicate that the proposed method attains superior appliance identification performance compared to cutting-edge methods.
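    The flexible per-appliance thresholding described above resembles curriculum-style dynamic thresholding for pseudo-labels. A minimal sketch of that general idea follows; the scaling rule, the base threshold of 0.95, and the toy data are assumptions for illustration, not the paper's formula.

```python
# Minimal sketch of per-class dynamic confidence thresholds for pseudo-labelling,
# in the spirit of the flexible per-appliance thresholds the abstract describes.
# The scaling rule and the base threshold are assumptions, not the paper's formula.
import numpy as np


def dynamic_thresholds(probs, base_tau=0.95):
    """probs: (N, C) softmax outputs on unlabeled windows.
    Classes that currently receive few confident predictions ("harder" appliances)
    get a proportionally lower threshold so their pseudo-labels are not starved."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    n_classes = probs.shape[1]
    # learning status: how many confident predictions each class has so far
    counts = np.array([np.sum((preds == c) & (conf >= base_tau)) for c in range(n_classes)])
    status = counts / max(counts.max(), 1)               # in [0, 1]
    return base_tau * (status / (2.0 - status))          # convex scaling, lower for hard classes


def select_pseudo_labels(probs):
    """Return indices and labels of unlabeled samples that pass their class threshold."""
    tau_c = dynamic_thresholds(probs)
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = conf >= tau_c[preds]
    return np.where(keep)[0], preds[keep]


# Toy usage with random "model outputs" for 3 appliance classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
idx, labels = select_pseudo_labels(probs)
print(f"kept {len(idx)} of {len(probs)} unlabeled windows")
```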

  • Article type: Journal Article
    Electronic Health Records (EHRs) play a crucial role in shaping predictive models, yet they encounter challenges such as significant data gaps and class imbalances. Traditional Graph Neural Network (GNN) approaches have limitations in fully leveraging neighbourhood data or demand intensive computational requirements for regularisation. To address this challenge, we introduce CliqueFluxNet, a novel framework that innovatively constructs a patient similarity graph to maximise cliques, thereby highlighting strong inter-patient connections. At the heart of CliqueFluxNet lies its stochastic edge fluxing strategy - a dynamic process involving random edge addition and removal during training. This strategy aims to enhance the model's generalisability and mitigate overfitting. Our empirical analysis, conducted on the MIMIC-III and eICU datasets, focuses on the tasks of mortality and readmission prediction. It demonstrates significant progress in representation learning, particularly in scenarios with limited data availability. Qualitative assessments further underscore CliqueFluxNet's effectiveness in extracting meaningful EHR representations, solidifying its potential for advancing GNN applications in healthcare analytics.
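    The stochastic edge fluxing strategy amounts to randomly dropping a fraction of the patient-similarity edges and adding a few random ones at every training step. A minimal sketch under assumed details (edge-set representation, drop and add rates) is:

```python
# Minimal sketch of stochastic edge fluxing: randomly remove a fraction of existing
# edges and add a few random ones at every training step. The representation (set of
# undirected node pairs) and the flux rates are illustrative assumptions.
import random


def edge_flux(edges, num_nodes, drop_rate=0.1, add_rate=0.05, rng=random.Random(0)):
    """edges: set of (u, v) tuples with u < v. Returns a perturbed edge set."""
    kept = {e for e in edges if rng.random() > drop_rate}          # random removal
    n_add = int(add_rate * len(edges))
    while n_add > 0:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v:
            e = (min(u, v), max(u, v))
            if e not in kept:
                kept.add(e)                                        # random addition
                n_add -= 1
    return kept


# Toy usage on a small patient-similarity graph.
graph = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)}
for step in range(3):
    fluxed = edge_flux(graph, num_nodes=5)
    print(f"step {step}: {len(fluxed)} edges")
```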

  • Article type: Journal Article
    BACKGROUND: Three-dimensional cephalometric analysis is crucial in craniomaxillofacial assessment, with landmark detection in craniomaxillofacial (CMF) CT scans being a key component. However, creating robust deep learning models for this task typically requires extensive CMF CT datasets annotated by experienced medical professionals, a process that is time-consuming and labor-intensive. Conversely, acquiring a large volume of unlabeled CMF CT data is relatively straightforward. Thus, semi-supervised learning (SSL), leveraging limited labeled data supplemented by a sufficient unlabeled dataset, could be a viable solution to this challenge.
    METHODS: We developed an SSL model, named CephaloMatch, based on a strong-weak perturbation consistency framework. The proposed SSL model incorporates a head-position rectification technique based on coarse detection to enhance consistency between the labeled and unlabeled datasets, and a multilayer perturbation method to expand the perturbation space. The proposed SSL model was assessed using 362 CMF CT scans, divided into a training set (60 scans), a validation set (14 scans), and an unlabeled set (288 scans).
    RESULTS: The proposed SSL model attained a detection error of 1.60 ± 0.87 mm, significantly surpassing the performance of the conventional fully supervised learning model (1.94 ± 1.12 mm). Notably, the proposed SSL model achieved equivalent detection accuracy (1.91 ± 1.00 mm) with only half the labeled dataset, compared to the fully supervised learning model.
    CONCLUSIONS: The proposed SSL model demonstrated exceptional performance in landmark detection using a limited labeled CMF CT dataset, significantly reducing the workload of medical professionals and enhancing the accuracy of 3D cephalometric analysis.
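    The abstract does not detail the multilayer perturbation method; a generic way to realize the idea is to inject noise and dropout after several intermediate feature levels of the unlabeled branch, widening the perturbation space over which the strong-weak consistency is enforced. The sketch below is an assumption-laden illustration (toy 3D encoder, noise magnitude, dropout rate), not the CephaloMatch implementation.

```python
# Minimal sketch of multilayer feature perturbation: inject noise/dropout after
# several intermediate layers of the unlabeled-branch forward pass, so the
# consistency target is enforced against a wider perturbation space.
# The toy encoder, noise magnitude, and dropout rate are illustrative assumptions.
import torch
import torch.nn as nn


class PerturbedEncoder(nn.Module):
    def __init__(self, channels=(1, 8, 16), noise_std=0.1, drop_p=0.2):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU())
             for cin, cout in zip(channels[:-1], channels[1:])]
        )
        self.noise_std = noise_std
        self.dropout = nn.Dropout3d(drop_p)

    def forward(self, x, perturb=False):
        for block in self.blocks:
            x = block(x)
            if perturb:                       # perturb every intermediate feature map
                x = x + self.noise_std * torch.randn_like(x)
                x = self.dropout(x)
        return x


# Toy usage: clean (weak) and perturbed (strong) forward passes on the same volume.
enc = PerturbedEncoder()
vol = torch.randn(1, 1, 16, 32, 32)          # a small CT-like volume
feat_weak = enc(vol, perturb=False)
feat_strong = enc(vol, perturb=True)
consistency = torch.mean((feat_weak.detach() - feat_strong) ** 2)
print(consistency.item())
```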

  • Article type: Journal Article
    Human action recognition (HAR) technology based on radar signals has garnered significant attention from both industry and academia due to its exceptional privacy-preserving capabilities, noncontact sensing characteristics, and insensitivity to lighting conditions. However, the scarcity of accurately labeled human radar data poses a significant challenge in meeting the demand for large-scale training datasets required by deep model-based HAR technology, thus substantially impeding technological advancements in this field. To address this issue, a semi-supervised learning algorithm, MF-Match, is proposed in this paper. This algorithm computes pseudo-labels for large-scale unlabeled radar data, enabling the model to extract embedded human behavioral information and enhance the accuracy of HAR algorithms. Furthermore, the method incorporates contrastive learning principles to improve the quality of model-generated pseudo-labels and mitigate the impact of mislabeled pseudo-labels on recognition performance. Experimental results demonstrate that this method achieves action recognition accuracies of 86.69% and 91.48% on two widely used radar spectrum datasets, respectively, utilizing only 10% labeled data, thereby validating the effectiveness of the proposed approach.
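    Coupling pseudo-labels with contrastive learning, as MF-Match does, can be illustrated by a loss that pulls together embeddings of unlabeled samples sharing a confident pseudo-label. The sketch below is a generic supervised-contrastive-style formulation with assumed threshold and temperature values, not the paper's exact objective.

```python
# Minimal sketch of coupling pseudo-labels with a contrastive term: embeddings of
# unlabeled samples that share a (confident) pseudo-label are pulled together,
# which tends to sharpen the pseudo-labels themselves on later rounds.
# The temperature, threshold, and toy embeddings are illustrative assumptions.
import torch
import torch.nn.functional as F


def pseudo_label_contrastive(embeddings, probs, conf_thresh=0.9, temperature=0.2):
    """embeddings: (N, D) features of unlabeled samples; probs: (N, C) softmax outputs."""
    conf, pseudo = probs.max(dim=1)
    keep = conf >= conf_thresh
    z = F.normalize(embeddings[keep], dim=1)
    y = pseudo[keep]
    if z.shape[0] < 2:
        return embeddings.new_zeros(())
    sim = z @ z.t() / temperature                          # cosine similarities
    same = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    eye = torch.eye(z.shape[0], device=z.device)
    same = same - eye                                      # exclude self-pairs
    log_prob = F.log_softmax(sim - 1e9 * eye, dim=1)       # mask self in the softmax
    pos_count = same.sum(dim=1).clamp(min=1.0)
    return -(same * log_prob).sum(dim=1).div(pos_count).mean()


# Toy usage.
emb = torch.randn(16, 32)
probs = F.softmax(torch.randn(16, 4) * 3, dim=1)
print(pseudo_label_contrastive(emb, probs).item())
```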

  • Article type: Journal Article
    BACKGROUND: Synthetic datasets hold the potential to offer cost-effective alternatives to clinical data, ensuring privacy protection and potentially addressing biases in clinical data. We present a method leveraging such datasets to train a machine learning algorithm applied as part of a computer-aided detection (CADe) system.
    METHODS: Our proposed approach utilizes clinically acquired computed tomography (CT) scans of a physical anthropomorphic phantom, into which manufactured lesions were inserted, to train a machine learning algorithm. We treated the training database obtained from the anthropomorphic phantom as a simplified representation of clinical data and increased the variability in this dataset using a set of randomized and parameterized augmentations. Furthermore, to mitigate the inherent differences between phantom and clinical datasets, we investigated adding unlabeled clinical data into the training pipeline.
    RESULTS: We apply our proposed method to the false positive reduction stage of a lung nodule CADe system in CT scans, in which regions of interest containing potential lesions are classified as nodule or non-nodule regions. Experimental results demonstrate the effectiveness of the proposed method; the system trained on labeled data from physical phantom scans and unlabeled clinical data achieves a sensitivity of 90% at eight false positives per scan. Furthermore, the experimental results demonstrate the benefit of the physical phantom: the performance in terms of the competition performance metric increased by 6% when a training set consisting of 50 clinical CT scans was enlarged with the scans obtained from the physical phantom.
    CONCLUSIONS: The scalability of synthetic datasets can lead to improved CADe performance, particularly in scenarios in which the size of the labeled clinical data is limited or subject to inherent bias. Our proposed approach demonstrates an effective utilization of synthetic datasets for training machine learning algorithms.
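    A randomized, parameterized augmentation chain of the kind mentioned in the methods can be sketched as follows; the specific transforms and parameter ranges are illustrative assumptions rather than the study's recipe.

```python
# Minimal sketch of a randomized, parameterized augmentation chain used to widen
# the variability of phantom-derived training patches. The specific transforms
# and parameter ranges here are illustrative assumptions, not the study's recipe.
import numpy as np


def augment_patch(patch, rng):
    """patch: 3D numpy array (z, y, x) in Hounsfield-like units."""
    # random flips along each axis
    for axis in range(3):
        if rng.random() < 0.5:
            patch = np.flip(patch, axis=axis)
    # random 90-degree rotation in the axial plane
    patch = np.rot90(patch, k=rng.integers(0, 4), axes=(1, 2))
    # random global intensity scaling and shift
    patch = patch * rng.uniform(0.9, 1.1) + rng.uniform(-20, 20)
    # additive Gaussian noise
    patch = patch + rng.normal(0.0, 10.0, size=patch.shape)
    return patch.astype(np.float32)


# Toy usage on a synthetic 32^3 patch.
rng = np.random.default_rng(42)
patch = rng.normal(-700.0, 50.0, size=(32, 32, 32))      # lung-like background
augmented = [augment_patch(patch, rng) for _ in range(4)]
print(augmented[0].shape, augmented[0].dtype)
```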

  • Article type: Journal Article
    Deep neural networks are commonly used for automated medical image segmentation, but models frequently struggle to generalize well across different imaging modalities. This issue is particularly problematic due to the limited availability of annotated data, in the target as well as the source modality, making it difficult to deploy these models on a larger scale. To overcome these challenges, we propose a new semi-supervised training strategy called MoDATTS. Our approach is designed for accurate cross-modality 3D tumor segmentation on unpaired bi-modal datasets. An image-to-image translation strategy between modalities is used to produce synthetic but annotated images and labels in the desired modality and to improve generalization to the unannotated target modality. We also use powerful vision transformer architectures for both the image translation (TransUNet) and segmentation (Medformer) tasks and introduce an iterative self-training procedure in the latter task to further close the domain gap between modalities, thus also training on unlabeled images in the target modality. MoDATTS additionally allows exploiting image-level labels with a semi-supervised objective that encourages the model to disentangle tumors from the background. This semi-supervised methodology helps in particular to maintain downstream segmentation performance when pixel-level label scarcity is also present in the source modality dataset, or when the source dataset contains healthy controls. The proposed model achieves superior performance compared to other methods from participating teams in the CrossMoDA 2022 vestibular schwannoma (VS) segmentation challenge, as evidenced by its reported top Dice score of 0.87 ± 0.04 for VS segmentation. MoDATTS also yields consistent improvements in Dice scores over baselines on a cross-modality adult brain glioma segmentation task composed of four different contrasts from the BraTS 2020 challenge dataset, where 95% of the performance of a target-supervised model is reached when no target-modality annotations are available. We report that 99% and 100% of this maximum performance can be attained if 20% and 50% of the target data are additionally annotated, which further demonstrates that MoDATTS can be leveraged to reduce the annotation burden.
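    The iterative self-training procedure mentioned above follows a generic pattern: the current model pseudo-labels unlabeled target-modality volumes, confident pseudo-labels are folded into the training set, and the model is retrained. A minimal, framework-agnostic sketch with assumed callable signatures:

```python
# Minimal sketch of an iterative self-training loop: a model trained on annotated
# (e.g., translated) data pseudo-labels unlabeled target-modality volumes, and the
# confident pseudo-labels are folded back into training for the next round.
# The function signatures and threshold are illustrative assumptions.
from typing import Callable, List, Tuple


def self_training_rounds(
    train: Callable[[List[Tuple[object, object]]], Callable[[object], Tuple[object, float]]],
    labeled: List[Tuple[object, object]],
    unlabeled: List[object],
    rounds: int = 3,
    conf_thresh: float = 0.9,
):
    """train(pairs) returns predict(volume) -> (mask, confidence)."""
    current = list(labeled)
    predict = train(current)
    for _ in range(rounds):
        pseudo = []
        for vol in unlabeled:
            mask, conf = predict(vol)
            if conf >= conf_thresh:            # keep only confident pseudo-labels
                pseudo.append((vol, mask))
        predict = train(current + pseudo)      # retrain with pseudo-labeled target data
    return predict


# Toy usage with stand-in "training" and "prediction" callables.
def toy_train(pairs):
    return lambda vol: (f"mask_for_{vol}", 0.95)

predictor = self_training_rounds(toy_train, [("vol_a", "gt_a")], ["vol_b", "vol_c"])
print(predictor("vol_d"))
```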

  • Article type: Journal Article
    This paper presents an on-device semi-supervised human activity detection system that can learn and predict human activity patterns in real time. The clinical objective is to monitor and detect the unhealthy sedentary lifestyle of a user. The proposed semi-supervised learning (SSL) framework uses sparsely labelled user activity events acquired from Inertial Measurement Unit sensors installed as wearable devices. The proposed cluster-based learning model in this approach is trained with data from the same target user, thus preserving data privacy while providing personalized activity detection services. Two different cluster labelling strategies, namely, population-based and distance-based strategies, are employed to achieve the desired classification performance. The proposed system is shown to be highly accurate and computationally efficient for different algorithmic parameters, which is relevant in the context of limited computing resources on typical wearable devices. Extensive experiments and simulation studies have been conducted on multi-user human activity data from the public domain in order to analyze the trade-off between classification accuracy and computational complexity of the proposed learning paradigm with different algorithmic hyper-parameters. With 4.17 h of training time for 8000 activity episodes, the proposed SSL approach consumes at most 20 KB of CPU memory space, while providing a maximum accuracy of 90% and 100% classification rates.
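    The two cluster-labelling strategies named in the abstract, population-based and distance-based, can be sketched generically as follows; the k-means clustering step, feature shapes, and the fallback rule are assumptions for illustration, not the paper's algorithm.

```python
# Minimal sketch of the two cluster-labelling strategies named in the abstract:
# population-based (majority vote of the sparse labels inside each cluster) and
# distance-based (label of the labelled point nearest to the cluster centre).
# The k-means step and feature shapes are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans


def label_clusters(features, sparse_labels, n_clusters=4, strategy="population"):
    """features: (N, D); sparse_labels: (N,) with -1 for unlabelled samples."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    cluster_label = {}
    labelled = sparse_labels >= 0
    for c in range(n_clusters):
        members = km.labels_ == c
        labels_in_c = sparse_labels[members & labelled]
        if strategy == "population" and labels_in_c.size:
            vals, counts = np.unique(labels_in_c, return_counts=True)
            cluster_label[c] = int(vals[counts.argmax()])          # majority vote
        else:
            # distance-based (also used as fallback): nearest labelled sample to the centroid
            d = np.linalg.norm(features[labelled] - km.cluster_centers_[c], axis=1)
            cluster_label[c] = int(sparse_labels[labelled][d.argmin()])
    return np.array([cluster_label[c] for c in km.labels_])


# Toy usage: 2-D accelerometer-like features, two activities, very few labels.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels = np.full(100, -1)
labels[[0, 1, 60, 61]] = [0, 0, 1, 1]                      # sparse labels
pred = label_clusters(feats, labels, n_clusters=2)
print((pred[:50] == 0).mean(), (pred[50:] == 1).mean())
```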

  • Article type: Journal Article
    Cardiotocography (CTG) measurements are critical for assessing fetal wellbeing during monitoring, and accurate assessment requires well-traceable CTG signals. The current fetal heart rate (FHR) calculation algorithm, based on autocorrelation of Doppler ultrasound (DUS) signals, often results in periods of signal loss owing to its inability to differentiate signal types. We hypothesized that classifying DUS signals by type could be a solution and proposed that an artificial intelligence (AI)-based approach could be used for the classification. However, few studies have applied AI to DUS signals because of the limited data availability. Therefore, this study focused on evaluating the effectiveness of semi-supervised learning in enhancing classification accuracy for DUS signals, even with limited datasets. Data comprising fetal heartbeats, artifacts, and two other categories were created from non-stress test and labor DUS signals. With labeled and unlabeled data totaling 9,600 and 48,000 data points, respectively, the semi-supervised learning model consistently outperformed the supervised learning model, achieving an average classification accuracy of 80.9%. These preliminary findings indicate that applying semi-supervised learning to the development of AI models using DUS signals can achieve high generalization accuracy and reduce the labeling effort. This approach may enhance the quality of fetal monitoring.

  • Article type: Journal Article
    BACKGROUND: Cognitive assessment plays a pivotal role in the early detection of cognitive impairment, particularly in the prevention and management of cognitive diseases such as Alzheimer's disease and Lewy body dementia. Large-scale screening relies heavily on cognitive assessment scales as primary tools, some of which have low sensitivity while others are expensive. Despite significant progress in machine learning for cognitive function assessment, its application in this particular screening domain remains underexplored, often requiring labor-intensive expert annotations.
    OBJECTIVE: This paper introduces a semi-supervised learning algorithm based on pseudo-label with putback (SS-PP), aiming to enhance model efficiency in predicting the high risk of cognitive impairment (HR-CI) by utilizing the distribution of unlabeled samples.
    METHODS: The study involved 189 labeled samples and 215,078 unlabeled samples from the real world. A semi-supervised classification algorithm was designed and evaluated against supervised approaches comprising 14 traditional machine-learning methods, as well as other advanced semi-supervised algorithms.
    RESULTS: The optimal SS-PP model, based on GBDT, achieved an AUC of 0.947. Comparative analyses with supervised learning models and semi-supervised methods demonstrated an average AUC improvement of 8% and state-of-the-art performance, respectively.
    CONCLUSIONS: This study pioneers the exploration of utilizing limited labeled data for HR-CI prediction and evaluates the benefits of incorporating physical examination data, holding significant implications for the development of cost-effective strategies in relevant healthcare domains.
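    The SS-PP algorithm itself is not described in the abstract; a generic GBDT self-training loop in which confidently pseudo-labelled samples are put back into the labelled pool gives a rough idea of the setting. Everything in this sketch (threshold, number of rounds, putback rule) is an assumption, not the authors' method.

```python
# Minimal sketch of GBDT-based self-training in the spirit of "pseudo-label with
# putback": confidently pseudo-labelled unlabeled samples are put back into the
# labelled pool and the classifier is refit. Threshold, rounds, and the putback
# rule are assumptions, not the SS-PP algorithm.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def pseudo_label_putback(X_lab, y_lab, X_unlab, rounds=3, conf_thresh=0.9):
    X_pool, y_pool = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    clf = GradientBoostingClassifier(random_state=0).fit(X_pool, y_pool)
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        probs = clf.predict_proba(remaining)
        conf = probs.max(axis=1)
        pseudo = clf.classes_[probs.argmax(axis=1)]
        take = conf >= conf_thresh
        if not take.any():
            break
        X_pool = np.vstack([X_pool, remaining[take]])      # put confident samples back
        y_pool = np.concatenate([y_pool, pseudo[take]])
        remaining = remaining[~take]
        clf = GradientBoostingClassifier(random_state=0).fit(X_pool, y_pool)
    return clf


# Toy usage on synthetic two-class data with very few labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(2, 1, (200, 5))])
y = np.repeat([0, 1], 200)
lab_idx = rng.choice(400, size=20, replace=False)
unlab_idx = np.setdiff1d(np.arange(400), lab_idx)
model = pseudo_label_putback(X[lab_idx], y[lab_idx], X[unlab_idx])
print(model.score(X, y))
```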