semisupervised learning

  • Article type: Journal Article
    Superhydrophobicity-enabled jumping-droplet condensation and frosting have great potential in various engineering applications, ranging from heat transfer processes to antifog and antifrost techniques. However, monitoring such droplets is challenging because of the high frequency of droplet behaviors, the cross-scale distribution of droplet sizes, and the diversity of surface morphologies. Leveraging deep learning, we develop a semisupervised framework that monitors the optically observable processes of condensation and frosting. The system identifies transient droplet distributions and dynamic activities, such as droplet coalescence, jumping, and frosting, on a variety of superhydrophobic surfaces. From this transient and dynamic information, physical properties such as heat flux, jumping characteristics, and frosting rate can be further quantified, conveying the heat transfer and antifrost performance of each surface intuitively and comprehensively. Furthermore, the framework relies on only a small amount of annotated data and can efficiently adapt to new condensation conditions with varying surface morphologies and illumination techniques. This adaptability is beneficial for optimizing surface designs to enhance condensation heat transfer and antifrosting performance.

  • Article type: Journal Article
    Molecular prediction tasks normally require a series of specialized experiments to label the target molecules, so labeled data are limited. Self-training, one of the semisupervised learning paradigms, uses both labeled and unlabeled data. Specifically, a teacher model is trained on labeled data and produces pseudo labels for unlabeled data. The labeled and pseudo-labeled data are then used jointly to train a student model. However, the pseudo labels generated by the teacher model are generally not sufficiently accurate. We therefore propose a robust self-training strategy that explores robust loss functions to handle such noisy labels in two paradigms, generic and adaptive. We conducted experiments on three molecular biology prediction tasks with four backbone models to progressively evaluate the performance of the proposed strategy. The results demonstrate that the proposed method improves prediction performance across all tasks, notably in molecular regression tasks, where the average improvement is 41.5%. Furthermore, visualization analysis confirms the superiority of our method. The proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It addresses the shortage of labeled data in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can easily be incorporated into any prediction task, making it a general approach for the bioinformatics community.
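
The teacher/student loop described in this abstract can be sketched on a toy regression problem. This is only an illustration under stated assumptions, not the authors' method: the abstract does not name its loss functions, so the Huber loss stands in as one common robust loss, and the data, the helper names (`fit_line`, `predict`), and the deliberately corrupted pseudo label are all hypothetical.

```python
def fit_line(xs, ys, loss="squared", delta=0.5, lr=0.05, steps=5000):
    """Fit y = w*x + b by gradient descent on a squared or Huber loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            r = (w * x + b) - y  # residual of the current fit
            if loss == "huber" and abs(r) > delta:
                g = delta if r > 0 else -delta  # Huber: bounded gradient for large residuals
            else:
                g = r  # squared loss (and the Huber loss near zero)
            grad_w += g * x
            grad_b += g
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(params, x):
    w, b = params
    return w * x + b


# Hypothetical toy data: the underlying relation is y = 2x + 1.
labeled_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
labeled_y = [-1.1, 0.1, 0.9, 2.1, 2.9]  # slightly noisy labels
unlabeled_x = [-0.8, -0.3, 0.2, 0.7]

# Step 1: train the teacher on labeled data only.
teacher = fit_line(labeled_x, labeled_y)

# Step 2: the teacher produces pseudo labels for the unlabeled inputs.
pseudo_y = [predict(teacher, x) for x in unlabeled_x]
pseudo_y[-1] += 5.0  # assumption: simulate one badly wrong pseudo label

# Step 3: train students on labeled plus pseudo-labeled data.
train_x = labeled_x + unlabeled_x
train_y = labeled_y + pseudo_y
student_squared = fit_line(train_x, train_y, loss="squared")
student_huber = fit_line(train_x, train_y, loss="huber")

print("teacher          w=%.3f b=%.3f" % teacher)
print("student, squared w=%.3f b=%.3f" % student_squared)
print("student, Huber   w=%.3f b=%.3f" % student_huber)
```

In this toy setup the Huber student stays much closer to the underlying slope than the squared-loss student, because the corrupted pseudo label contributes only a bounded gradient.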

  • Article type: Journal Article
    In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information pose challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to an unlabeled target population using existing labeled data from a source population. However, there is a paucity of literature on transferring the performance metrics of a trained model, especially receiver operating characteristic (ROC) parameters. In this paper, we aim to evaluate the performance of a trained binary classifier on an unlabeled target population based on ROC analysis. We propose Semisupervised Transfer lEarning of Accuracy Measures (STEAM), an efficient three-step estimation procedure that employs (1) double-index modeling to construct calibrated density ratio weights and (2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimator under correct specification of either the density ratio model or the outcome model. We also use cross-validation to correct for potential overfitting bias of the estimators in finite samples. We compare our proposed estimators with existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method by evaluating the prediction performance of a phenotyping model for rheumatoid arthritis (RA) on a temporally evolving electronic health record (EHR) cohort.
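
To make the role of density ratio weights concrete, the sketch below computes an importance-weighted AUC from labeled source samples. It is not the STEAM procedure: the weights are assumed to be given (the abstract estimates them by double-index modeling and adds an imputation step), and the function name `weighted_auc` and the numbers are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """Importance-weighted AUC over labeled source samples.

    Each weight is assumed to approximate p_target(x) / p_source(x), so that
    pairs of samples count in proportion to how typical they are of the target.
    """
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    concordant = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            if s_pos > s_neg:
                concordant += w_pos * w_neg        # correctly ranked pair
            elif s_pos == s_neg:
                concordant += 0.5 * w_pos * w_neg  # ties count as half
    total = sum(w for _, w in pos) * sum(w for _, w in neg)
    return concordant / total


# Hypothetical classifier scores and labels from the source population.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

auc_source = weighted_auc(scores, labels, [1.0, 1.0, 1.0, 1.0])
# Assumed density ratios: the low-scoring positive is three times as common in the target.
auc_target = weighted_auc(scores, labels, [1.0, 1.0, 3.0, 1.0])
print(auc_source, auc_target)
```

Up-weighting the poorly ranked positive lowers the estimated AUC, which is the kind of shift in performance that evaluating on the source population alone would miss.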

  • Article type: Journal Article
    Accurate segmentation of pancreatic cancer tumors using positron emission tomography/computed tomography (PET/CT) multimodal images is crucial for clinical diagnosis and prognosis evaluation. However, deep learning methods for automated medical image segmentation require a substantial amount of manually labeled data, making it time-consuming and labor-intensive. Moreover, addition or simple stitching of multimodal images leads to redundant information, failing to fully exploit the complementary information of multimodal images. Therefore, we developed a semisupervised multimodal network that leverages limited labeled samples and introduces a cross-fusion and mutual information minimization (MIM) strategy for PET/CT 3D segmentation of pancreatic tumors.
    Our approach combined a cross multimodal fusion (CMF) module with a cross-attention mechanism. The complementary multimodal features were fused to form a multifeature set to enhance the effectiveness of feature extraction while preserving specific features of each modal image. In addition, we designed an MIM module to mitigate redundant high-level modal information and compute the latent loss of PET and CT. Finally, our method employed the uncertainty-aware mean teacher semisupervised framework to segment regions of interest from PET/CT images using a small amount of labeled data and a large amount of unlabeled data.
    We evaluated our combined MIM and CMF semisupervised segmentation network (MIM-CMFNet) on a private dataset of pancreatic cancer, yielding an average Dice coefficient of 73.14%, an average Jaccard index score of 60.56%, and an average 95% Hausdorff distance (95HD) of 6.30 mm. In addition, to verify the broad applicability of our method, we used a public dataset of head and neck cancer, yielding an average Dice coefficient of 68.71%, an average Jaccard index score of 57.72%, and an average 95HD of 7.88 mm.
    The experimental results demonstrate the superiority of our MIM-CMFNet over existing semisupervised techniques. Our approach can achieve a performance similar to that of fully supervised segmentation methods while significantly reducing the data annotation cost by 80%, suggesting it is highly practicable for clinical application.
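
For readers unfamiliar with the overlap metrics reported above, here is a minimal sketch of the Dice coefficient and Jaccard index on binary masks. The flattened toy masks and the function name `dice_and_jaccard` are hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_and_jaccard(pred, truth):
    """Dice coefficient and Jaccard index for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - inter
    if union == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    dice = 2.0 * inter / (n_pred + n_truth)
    jaccard = inter / union
    return dice, jaccard


# Hypothetical 6-voxel example: 3 predicted voxels, 3 true voxels, 2 shared.
pred_mask = [1, 1, 1, 0, 0, 0]
true_mask = [0, 1, 1, 1, 0, 0]
dice, jaccard = dice_and_jaccard(pred_mask, true_mask)
print(dice, jaccard)
```

For a single case the two metrics are linked by J = D / (2 - D). That identity does not carry over to averages across cases, which is why an average Dice of 73.14% can sit alongside an average Jaccard of 60.56% rather than 57.65%.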

  • Article type: Journal Article
    We present a novel semisupervised approach to video instance segmentation. Our Cluster2Former model leverages scribble-based annotations for training, significantly reducing the need for comprehensive pixel-level masks. We augment a video instance segmenter, for example, the Mask2Former architecture, with a similarity-based constraint loss to handle partial annotations efficiently. We demonstrate that despite using lightweight annotations (using only 0.5% of the annotated pixels), Cluster2Former achieves competitive performance on standard benchmarks. The approach offers a cost-effective and computationally efficient solution for video instance segmentation, especially in scenarios with limited annotation resources.

  • Article type: Journal Article
    To use a diffusion-based deep learning model to recover bone microstructure from low-resolution images of the proximal femur, a common site of traumatic osteoporotic fractures.
    Training and testing data in this retrospective study consisted of high-resolution cadaveric micro-CT scans (n = 26), which served as ground truth. The images were downsampled prior to use for model training. The model was used to increase spatial resolution in these low-resolution images threefold, from 0.72 mm to 0.24 mm, sufficient to visualize bone microstructure. Model performance was validated using microstructural metrics and finite element simulation-derived stiffness of trabecular regions. Performance was also evaluated across a handful of image quality assessment metrics. Correlations between model performance and ground truth were assessed using intraclass correlation coefficients (ICCs) and Pearson correlation coefficients.
    Compared with popular deep learning baselines, the proposed model exhibited greater accuracy (mean ICC of proposed model, 0.92 vs ICC of next best method, 0.83) and lower bias (mean difference in means, 3.80% vs 10.00%, respectively) across the physiologic metrics. Two gradient-based image quality metrics strongly correlated with accuracy across structural and mechanical criteria (r > 0.89).
    The proposed method may enable accurate measurements of bone structure and strength with a radiation dose on par with current clinical imaging protocols, improving the viability of clinical CT for assessing bone health. Keywords: CT, Image Postprocessing, Skeletal-Appendicular, Long Bones, Radiation Effects, Quantification, Prognosis, Semisupervised Learning. Online supplemental material is available for this article. © RSNA, 2023.

  • Article type: Journal Article
    BACKGROUND: Automatic skin lesion recognition has been shown to be effective in increasing access to reliable dermatology evaluation; however, most existing algorithms rely solely on images. Many diagnostic rules, including the 3-point checklist, embody human knowledge and reflect the diagnostic process of human experts, yet they are not considered by artificial intelligence algorithms.
    OBJECTIVE: In this paper, we aimed to develop a semisupervised model that can not only integrate the dermoscopic features and scoring rule from the 3-point checklist but also automate the feature-annotation process.
    METHODS: We first trained the semisupervised model on a small, annotated data set with disease and dermoscopic feature labels and tried to improve the classification accuracy by integrating the 3-point checklist using a ranking loss function. We then used a large data set that carried only disease labels, with no feature labels, to learn from the trained algorithm to automatically classify skin lesions and features.
    RESULTS: After adding the 3-point checklist to our model, its performance for melanoma classification improved from a mean of 0.8867 (SD 0.0191) to 0.8943 (SD 0.0115) under 5-fold cross-validation. The trained semisupervised model can automatically detect 3 dermoscopic features from the 3-point checklist, with best performances of 0.80 (area under the curve [AUC] 0.8380), 0.89 (AUC 0.9036), and 0.76 (AUC 0.8444), in some cases outperforming human annotators.
    CONCLUSIONS: Our proposed semisupervised learning framework can help with the automatic diagnosis of skin disease based on its ability to detect dermoscopic features and automate the label-annotation process. The framework can also help combine semantic knowledge with a computer algorithm to arrive at a more accurate and more interpretable diagnostic result, which can be applied to broader use cases.

  • Article type: Journal Article
    A novel semisupervised hyperspectral imaging technique was developed to detect foreign materials (FMs) on raw poultry meat. Combining hyperspectral imaging and deep learning has shown promise in identifying food safety and quality attributes. However, the challenge lies in acquiring a large amount of accurately annotated/labeled data for model training. This paper proposes a novel semisupervised hyperspectral deep learning model based on a generative adversarial network, utilizing an improved 1D U-Net as its discriminator, to detect FMs on raw chicken breast fillets. The model was trained by using approximately 879,000 spectral responses from hyperspectral images of clean chicken breast fillets in the near-infrared wavelength range of 1000-1700 nm. Testing involved 30 different types of FMs commonly found in processing plants, prepared in two nominal sizes: 2 × 2 mm² and 5 × 5 mm². The FM-detection technique achieved impressive results at both the spectral pixel level and the foreign material object level. At the spectral pixel level, the model achieved a precision of 100%, a recall of over 93%, an F1 score of 96.8%, and a balanced accuracy of 96.9%. When combining the rich 1D spectral data with 2D spatial information, the FM-detection accuracy at the object level reached 96.5%. In summary, the impressive results obtained through this study demonstrate its effectiveness at accurately identifying and localizing FMs. Furthermore, the technique's potential for generalization and application to other agriculture and food-related domains highlights its broader significance.
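
The pixel-level figures quoted above are mutually consistent, which a short calculation can show. The confusion-matrix counts below are hypothetical, chosen only so that precision is 100% and recall is 93.8%; the abstract does not report the actual counts, and the function name `pixel_metrics` is an assumption.

```python
def pixel_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and balanced accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    balanced_accuracy = (recall + specificity) / 2
    return precision, recall, f1, balanced_accuracy


# Hypothetical counts: no false positives, 62 of 1000 foreign-material pixels missed.
precision, recall, f1, balanced_accuracy = pixel_metrics(tp=938, fp=0, fn=62, tn=1000)
print(precision, recall, round(f1, 3), round(balanced_accuracy, 3))
```

With no false positives, precision and specificity are both 100%, so the F1 score and the balanced accuracy are determined by recall alone. A recall of about 93.8% then yields an F1 near 96.8% and a balanced accuracy near 96.9%, matching the reported values.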

  • Article type: Journal Article
    Learning classification models in practice usually requires large amounts of labeled data for training. However, instance-based annotation can be inefficient for humans to perform. In this article, we propose and study a new type of human supervision that is fast to perform and useful for model learning. Instead of labeling individual instances, humans provide supervision to data regions, which are subspaces of the input data space, representing subpopulations of data. Since labeling is now performed at the region level, 0/1 labeling becomes imprecise. Thus, we design the region label to be a qualitative assessment of the class proportion, which coarsely preserves the labeling precision but is also easy for humans to provide. To identify informative regions for labeling and learning, we further devise a hierarchical active learning process that recursively constructs a region hierarchy. This process is semisupervised in the sense that it is driven by both active learning strategies and human expertise, where humans can provide discriminative features. To evaluate our framework, we conducted extensive experiments on nine datasets as well as a real user study on a survival analysis of colorectal cancer patients. The results clearly demonstrate the superiority of our region-based active learning framework over many instance-based active learning methods.

  • Article type: Journal Article
    BACKGROUND: Observational biomedical studies facilitate a new strategy for large-scale electronic health record (EHR) utilization to support precision medicine. However, the inaccessibility of data labels is an increasingly important issue in clinical prediction, despite the use of synthetic data and semisupervised learning. Little research has aimed to uncover the underlying graphical structure of EHRs.
    OBJECTIVE: A network-based generative adversarial semisupervised method is proposed. The objective is to train clinical prediction models on label-deficient EHRs to achieve comparable learning performance to supervised methods.
    METHODS: Three public data sets and one colorectal cancer data set gathered from the Second Affiliated Hospital of Zhejiang University were selected as benchmarks. The proposed models were trained on 5% to 25% labeled data and evaluated on classification metrics against conventional semisupervised and supervised methods. The data quality, model security, and memory scalability were also evaluated.
    RESULTS: The proposed method for semisupervised classification outperforms related semisupervised methods under the same setup, with the average area under the receiver operating characteristic curve (AUC) reaching 0.945, 0.673, 0.611, and 0.588 for the four data sets, respectively, followed by graph-based semisupervised learning (0.450, 0.454, 0.425, and 0.5676, respectively) and label propagation (0.475, 0.344, 0.440, and 0.477, respectively). The average classification AUCs with 10% labeled data were 0.929, 0.719, 0.652, and 0.650, respectively, comparable to those of the supervised learning methods logistic regression (0.601, 0.670, 0.731, and 0.710, respectively), support vector machines (0.733, 0.720, 0.720, and 0.721, respectively), and random forests (0.982, 0.750, 0.758, and 0.740, respectively). The concerns regarding the secondary use of data and data security are alleviated by realistic data synthesis and robust privacy preservation.
    CONCLUSIONS: Training clinical prediction models on label-deficient EHRs is indispensable in data-driven research. The proposed method has great potential to exploit the intrinsic structure of EHRs and achieve comparable learning performance to supervised methods.
