unsupervised machine learning

  • Article type: Journal Article
    The detection of evolutionary transitions in influenza A (H3N2) viruses' antigenicity is a major obstacle to effective vaccine design and development. In this study, we describe Novel Influenza Virus A Detector (NIAViD), an unsupervised machine learning tool adept at identifying these transitions using the HA1 sequence and associated physico-chemical properties. NIAViD achieved sensitivities of 88.9% (95% CI, 56.5-98.0%) in training and 72.7% (95% CI, 43.4-90.3%) in validation, outperforming the uncalibrated null model (33.3%; 95% CI, 12.1-64.6%), and it does not require potentially biased, time-consuming and costly laboratory assays. The pivotal role of Boman's index, indicative of the virus's cell surface binding potential, is underscored: it enhances the precision of detecting antigenic transitions. NIAViD's efficacy lies not only in identifying influenza isolates that belong to novel antigenic clusters, but also in pinpointing potential sites driving significant antigenic changes, without relying on explicit modelling of haemagglutinin inhibition titres. We believe this approach holds promise to augment existing surveillance networks, offering timely insights for the development of updated, effective influenza vaccines. Consequently, NIAViD, in conjunction with other resources, could be used to support surveillance efforts and inform the development of updated influenza vaccines.
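    As a worked illustration of how a sensitivity and its 95% confidence interval can be computed, the sketch below uses the Wilson score interval. The abstract states neither the interval method nor the underlying counts, so both are assumptions here: the counts 8 of 9 (training) and 8 of 11 (validation) are hypothetical values chosen because they are consistent with the reported percentages.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical counts (not given in the abstract): detected transitions / all transitions.
train_lo, train_hi = wilson_interval(8, 9)
valid_lo, valid_hi = wilson_interval(8, 11)

print(f"training:   {8 / 9:.1%} ({train_lo:.1%} to {train_hi:.1%})")
print(f"validation: {8 / 11:.1%} ({valid_lo:.1%} to {valid_hi:.1%})")
```

    With so few events, the intervals are wide, which is why the training and validation estimates overlap substantially despite the different point values.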

  • Article type: Journal Article
    Automatic seizure detection from electroencephalography (EEG) is of great importance in aiding the diagnosis and treatment of epilepsy because it is convenient and economical. Existing seizure detection methods are usually patient-specific: training and testing are carried out on the same patient, which limits their scalability to other patients. To address this issue, we propose a cross-subject seizure detection method via unsupervised domain adaptation. The proposed method aims to obtain seizure-specific information through shallow and deep feature alignments. For shallow feature alignment, we use a convolutional neural network (CNN) to extract seizure-related features. The distribution gap of the shallow features between different patients is minimized by multi-kernel maximum mean discrepancy (MK-MMD). For deep feature alignment, adversarial learning is utilized. The feature extractor learns feature representations that confuse the domain classifier, making the extracted deep features more generalizable to new patients. The performance of our method is evaluated on the CHB-MIT and Siena databases in epoch-based experiments. Additionally, event-based experiments are conducted on the CHB-MIT dataset. The results validate the feasibility of our method in diminishing the domain disparities among different patients.
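    To make the shallow-alignment idea concrete, the following is a minimal sketch of a multi-kernel MMD estimate between two sets of feature vectors. It is not the authors' implementation: the equally weighted Gaussian kernels, the bandwidth values, and the toy feature vectors are all assumptions, and in a real model this quantity would be computed on CNN features and minimized as a loss term.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))

def multi_kernel(x, y, bandwidths):
    """Equally weighted mixture of Gaussian kernels (weights are an assumption)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)

def mean_kernel(xs, ys, bandwidths):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(multi_kernel(x, y, bandwidths) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of the squared MMD under the kernel mixture."""
    return (mean_kernel(source, source, bandwidths)
            + mean_kernel(target, target, bandwidths)
            - 2 * mean_kernel(source, target, bandwidths))

# Toy "shallow features" for two hypothetical patients.
patient_a = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0)]
patient_b = [(2.0, 2.1), (2.2, 1.9), (2.1, 2.0)]

same = mk_mmd2(patient_a, patient_a)
shifted = mk_mmd2(patient_a, patient_b)
print(same, shifted)
```

    The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is the gap the alignment step tries to shrink.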

  • Article type: Journal Article
    Objective. Deep learning has markedly enhanced the performance of sparse-view computed tomography reconstruction. However, the dependence of these methods on supervised training using high-quality paired datasets, and the necessity for retraining under varied physical acquisition conditions, constrain their generalizability across new imaging contexts and settings. Approach. To overcome these limitations, we propose an unsupervised approach grounded in the deep image prior framework. Our approach advances beyond the conventional single noise level input by incorporating multi-level linear diffusion noise, significantly mitigating the risk of overfitting. Furthermore, we embed non-local self-similarity as a deep implicit prior within a self-attention network structure, improving the model's capability to identify and utilize repetitive patterns throughout the image. Additionally, leveraging imaging physics, gradient backpropagation is performed between the image domain and projection data space to optimize network weights. Main Results. Evaluations with both simulated and clinical cases demonstrate our method's effective zero-shot adaptability across various projection views, highlighting its robustness and flexibility. Additionally, our approach effectively eliminates noise and streak artifacts while significantly restoring intricate image details. Significance. Our method aims to overcome the limitations in current supervised deep learning-based sparse-view CT reconstruction, offering improved generalizability and adaptability without the need for extensive paired training data.

  • Article type: Journal Article
    BACKGROUND: Unsupervised machine learning describes a collection of powerful techniques that seek to identify hidden patterns in unlabeled data. These techniques can be broadly categorized into dimension reduction, which transforms and combines the original set of measurements to simplify data, and cluster analysis, which seeks to group subjects based on some measure of similarity. Unsupervised machine learning can be used to explore alternative subtyping of disorders of gut-brain interaction (DGBI) compared to the existing gastrointestinal symptom-based definitions of Rome IV.
    OBJECTIVE: This review aims to familiarize the reader with fundamental concepts of unsupervised machine learning using accessible definitions and to provide a critical summary of their application to the evaluation of DGBI subtyping. By considering the overlap between Rome IV clinical definitions and identified clusters, along with clinical and physiological insights, this paper speculates on the possible implications for DGBI. Also considered are algorithmic developments in the unsupervised machine learning community that may help leverage increasingly available omics data to explore biologically informed definitions. Unsupervised machine learning challenges the modern subtyping of DGBI and, with the necessary clinical validation, has the potential to enhance future iterations of the Rome criteria to identify more homogeneous, diagnosable, and treatable patient populations.

  • Article type: Journal Article
    BACKGROUND: The objective of this study was to define clinically meaningful phenotypes of intracerebral hemorrhage (ICH) using machine learning.
    METHODS: We used patient data from two US medical centers and the Antihypertensive Treatment of Acute Cerebral Hemorrhage-II clinical trial. We used k-prototypes to partition patient admission data. We then used silhouette method calculations and elbow method heuristics to optimize the clusters. Associations between phenotypes, complications (e.g., seizures), and functional outcomes were assessed using the Kruskal-Wallis H-test or χ2 test.
    RESULTS: There were 916 patients; the mean age was 63.8 ± 14.1 years, and 426 patients were female (46.5%). Three distinct clinical phenotypes emerged: patients with small hematomas, elevated blood pressure, and Glasgow Coma Scale scores > 12 (n = 141, 26.6%); patients with hematoma expansion and elevated international normalized ratio (n = 204, 38.4%); and patients with median hematoma volumes of 24 (interquartile range 8.2-59.5) mL, who were more frequently Black or African American, and who were likely to have intraventricular hemorrhage (n = 186, 35.0%). There were associations between clinical phenotype and seizure (P = 0.024), length of stay (P = 0.001), discharge disposition (P < 0.001), and death or disability (modified Rankin Scale scores 4-6) at 3-month follow-up (P < 0.001). We reproduced these three clinical phenotypes of ICH in an independent cohort (n = 385) for external validation.
    CONCLUSIONS: Machine learning identified three phenotypes of ICH that are clinically significant, associated with patient complications, and associated with functional outcomes. Cerebellar hematomas are an additional phenotype underrepresented in our data sources.
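    The k-prototypes step named in the methods can be illustrated by its mixed-type dissimilarity: squared Euclidean distance on numeric fields plus a weighted count of mismatches on categorical fields. The sketch below shows only that distance and a single assignment pass. The field layout, the toy records, the prototypes, and the weight `gamma` are hypothetical and are not taken from the study.

```python
def k_prototypes_distance(record, prototype, numeric_idx, categorical_idx, gamma=1.0):
    """Mixed dissimilarity: squared Euclidean on numeric fields
    plus gamma times the number of categorical mismatches."""
    numeric_part = sum((record[i] - prototype[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(record[i] != prototype[i] for i in categorical_idx)
    return numeric_part + gamma * categorical_part

def assign_to_prototypes(records, prototypes, numeric_idx, categorical_idx, gamma=1.0):
    """Return, for each record, the index of its nearest prototype."""
    labels = []
    for record in records:
        distances = [
            k_prototypes_distance(record, p, numeric_idx, categorical_idx, gamma)
            for p in prototypes
        ]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical admission records:
# (standardized hematoma volume, standardized blood pressure, intraventricular hemorrhage)
records = [(-1.0, 0.8, "no"), (1.2, -0.3, "yes"), (-0.8, 1.1, "no")]
prototypes = [(-0.9, 1.0, "no"), (1.0, 0.0, "yes")]

labels = assign_to_prototypes(records, prototypes, numeric_idx=(0, 1), categorical_idx=(2,))
print(labels)
```

    A full k-prototypes run would alternate this assignment with prototype updates (means for numeric fields, modes for categorical ones), and the number of clusters would then be chosen with silhouette and elbow diagnostics as the abstract describes.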

  • Article type: Journal Article
    The field of toxicology has witnessed substantial advancements in recent years, particularly with the adoption of new approach methodologies (NAMs) to understand and predict chemical toxicity. Class-based methods such as clustering and classification are key to NAMs development and application, aiding the understanding of hazard and risk concerns associated with groups of chemicals without additional laboratory work. Advances in computational chemistry, data generation and availability, and machine learning algorithms represent important opportunities for continued improvement of these techniques to optimize their utility for specific regulatory and research purposes. However, due to their intricacy, deep understanding and careful selection are imperative to align the adequate methods with their intended applications.
    This commentary aims to deepen the understanding of class-based approaches by elucidating the pivotal role of chemical similarity (structural and biological) in clustering and classification approaches (CCAs). It addresses the dichotomy between general end point-agnostic similarity, often entailing unsupervised analysis, and end point-specific similarity necessitating supervised learning. The goal is to highlight the nuances of these approaches, their applications, and common misuses.
    Understanding similarity is pivotal in toxicological research involving CCAs. The effectiveness of these approaches depends on the right definition and measure of similarity, which varies based on context and objectives of the study. This choice is influenced by how chemical structures are represented and the respective labels indicating biological activity, if applicable. The distinction between unsupervised clustering and supervised classification methods is vital, requiring the use of end point-agnostic vs. end point-specific similarity definition. Separate use or combination of these methods requires careful consideration to prevent bias and ensure relevance for the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without the need for predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied to end point-specific contexts, like analog selection in read-across, leading to erroneous conclusions. This commentary provides insights into the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
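    As one concrete example of an end point-agnostic structural similarity, the sketch below computes the Tanimoto (Jaccard) coefficient between two sets of structural feature identifiers. The commentary does not prescribe this measure; it is used here as a common choice, and the fingerprint contents are hypothetical feature IDs rather than output from any real cheminformatics toolkit.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)

# Hypothetical fingerprints: each integer stands for a structural feature that is present.
chemical_x = {1, 2, 3}
chemical_y = {2, 3, 4}
chemical_z = {7, 8}

sim_xy = tanimoto(chemical_x, chemical_y)
sim_xz = tanimoto(chemical_x, chemical_z)
print(sim_xy, sim_xz)
```

    Because this score ignores any toxicity label, a high value says only that two structures share features. Whether they share an end point is a separate, supervised question, which is the distinction the commentary draws.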

  • Article type: Journal Article
    Accurate identification of epidermal cells on reflectance confocal microscopy (RCM) images is important in the study of epidermal architecture and topology of both healthy and diseased skin. However, analysis of these images is currently done manually and is therefore time-consuming and subject to human error and inter-expert variability in interpretation. It is also hindered by low image quality due to noise and heterogeneity.
    We aimed to design an automated pipeline for the analysis of the epidermal structure from RCM images.
    Two attempts have been made at automatically localizing epidermal cells, called keratinocytes, on RCM images: the first is based on a rotationally symmetric error function mask, and the second on cell morphological features. Here, we propose a dual-task network to automatically identify keratinocytes on RCM images. Each task consists of a cycle generative adversarial network. The first task aims to translate real RCM images into binary images, thus learning the noise and texture model of RCM images, whereas the second task maps Gabor-filtered RCM images into binary images, learning the epidermal structure visible on RCM images. The combination of the two tasks allows one task to constrict the solution space of the other, thus improving overall results. We refine our cell identification by applying the pre-trained StarDist algorithm to detect star-convex shapes, thus closing any incomplete membranes and separating neighboring cells.
    The results are evaluated both on simulated data and on manually annotated real RCM data. Accuracy is measured using recall and precision, summarized as the F1-score.
    We demonstrate that the proposed fully unsupervised method successfully identifies keratinocytes on RCM images of the epidermis with an accuracy on par with experts' cell identification, is not constrained by limited available annotated data, and can be extended to images acquired using various imaging techniques without retraining.
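    The evaluation metrics mentioned above follow directly from counts of matched and unmatched detections. The sketch below assumes detections have already been matched to annotated cells. The counts are hypothetical and do not come from the paper.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Precision, recall and their harmonic mean (F1) from detection counts."""
    detected = true_positives + false_positives
    annotated = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / annotated if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 80 correctly detected cells, 20 spurious detections, 10 missed cells.
precision, recall, f1 = precision_recall_f1(80, 20, 10)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

    The F1-score rewards a method only when precision and recall are both high, so it penalizes over-segmentation and missed cells alike.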

  • Article type: Journal Article
    The symptoms of diseases can vary among individuals and may remain undetected in the early stages. Detecting these symptoms in the initial stage is crucial to effectively manage and treat cases of varying severity. Machine learning has made major advances in recent years, proving its effectiveness in various healthcare applications. This study aims to identify patterns of symptoms and general rules regarding symptoms among patients using supervised and unsupervised machine learning. A rule-based machine learning technique is integrated with classification methods to extend a prediction model. This study analyzes patient data that were available online through the Kaggle repository. After preprocessing the data and exploring descriptive statistics, the Apriori algorithm was applied to identify frequent symptoms and patterns in the discovered rules. Additionally, the study applied several machine learning models for predicting diseases, including stepwise regression, support vector machine, bootstrap forest, boosted trees, and neural-boosted methods. The stepwise fitting method outperformed all competitors in this study, as determined through cross-validation conducted for each model based on established criteria. Moreover, numerous significant decision rules were extracted in the study, which can streamline clinical applications without the need for additional expertise. These rules enable the prediction of relationships between symptoms and diseases, as well as between different diseases. Therefore, the results obtained in this study have the potential to improve the performance of prediction models. Disease symptoms and general rules can be discovered from the dataset using supervised and unsupervised machine learning. Overall, the proposed algorithm can support not only healthcare professionals but also patients who face cost and time constraints in diagnosing and treating these diseases.
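    The Apriori step can be sketched as a level-wise search: count the support of candidate symptom sets, keep those above a threshold, and build larger candidates only from smaller frequent ones. The patient records, symptom names, and threshold below are hypothetical, and this compact version is an illustration rather than the study's implementation.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support} for frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        # Support = fraction of transactions that contain the whole candidate.
        level = {}
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return frequent[antecedent | consequent] / frequent[antecedent]

# Hypothetical symptom records, one set per patient.
records = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"fever", "fatigue"},
    {"cough"},
    {"fever", "cough"},
]
itemsets = frequent_itemsets(records, min_support=0.4)
rule_conf = confidence(itemsets, frozenset({"fever"}), frozenset({"cough"}))
print(len(itemsets), round(rule_conf, 2))
```

    Rules with high support and confidence are the kind of "general rules regarding symptoms" the abstract refers to. In practice they would be filtered further, for example by lift, before clinical interpretation.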

  • Article type: Journal Article
    Acoustic signals are vital in animal communication, and quantifying them is fundamental for understanding animal behaviour and ecology. Vocalizations can be classified into acoustically and functionally or contextually distinct categories, but establishing these categories can be challenging. Newly developed methods, such as machine learning, can provide solutions for classification tasks. The plains zebra is known for its loud and specific vocalizations, yet limited knowledge exists on the structure and information content of its vocalizations. In this study, we employed both feature-based and spectrogram-based algorithms, incorporating supervised and unsupervised machine learning methods to enhance robustness in categorizing zebra vocalization types. Additionally, we implemented a permuted discriminant function analysis to examine the individual identity information contained in the identified vocalization types. The findings revealed at least four distinct vocalization types (the 'snort', the 'soft snort', the 'squeal' and the 'quagga quagga'), with individual differences observed mostly in snorts and, to a lesser extent, in squeals. Analyses based on acoustic features outperformed those based on spectrograms, but each excelled in characterizing different vocalization types. We thus recommend the combined use of these two approaches. This study offers valuable insights into plains zebra vocalization, with implications for future comprehensive explorations in animal communication.

  • Article type: Journal Article
    BACKGROUND: Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) application programming interfaces, the adoption of data-driven decision-making within routine hospital workflows has so far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders.
    METHODS: Observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes after preprocessing, were used. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed.
    RESULTS: Four age clusters of diseases were identified, broadly aligning with ages 0 to 1, 1 to 5, 5 to 13, and 13 to 18. Diagnoses within the clusters aligned with existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at a population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated.
    CONCLUSIONS: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses, which can augment existing decision making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated.
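    The clustering step can be pictured with a one-dimensional k-means (Lloyd's algorithm) over a single age summary per diagnosis. This is a simplification under stated assumptions: the study clustered age distributions, whereas the sketch uses one hypothetical median age per diagnosis, and the starting centres are chosen by hand.

```python
def kmeans_1d(values, initial_centers, max_iterations=50):
    """Lloyd's algorithm on scalar values; returns (centers, clusters)."""
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iterations):
        # Assignment step: each value goes to its nearest centre.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda k: abs(value - centers[k]))
            clusters[nearest].append(value)
        # Update step: each centre moves to the mean of its cluster.
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical median age at presentation (years) for twelve diagnoses.
median_ages = [0.2, 0.5, 0.8, 2.0, 3.0, 4.0, 7.0, 9.0, 11.0, 14.0, 15.5, 17.0]
centers, clusters = kmeans_1d(median_ages, initial_centers=[0, 3, 9, 15])
print(centers)
print(clusters)
```

    K-means is sensitive to initialization and to preprocessing choices such as how ages are binned, which relates to the abstract's point that preprocessing uncertainty can matter at the level of individual diagnoses.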