BACKGROUND: Clinical medicine offers a promising arena for applying Machine Learning (ML) models. However, despite numerous studies employing ML in medical data analysis, only a fraction have impacted clinical care. This article underscores the importance of ML in medical data analysis while recognising that ML alone may not adequately capture the full complexity of clinical data, and it therefore advocates integrating medical domain knowledge into ML.
METHODS: The study conducts a comprehensive review of prior efforts in integrating medical knowledge into ML and maps these integration strategies onto the phases of the ML pipeline, encompassing data pre-processing, feature engineering, model training, and output evaluation. The study further explores the significance and impact of such integration through a case study on diabetes prediction. Here, clinical knowledge, encompassing rules, causal networks, intervals, and formulas, is integrated at each stage of the ML pipeline, resulting in a spectrum of integrated models.
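To make the idea of stage-wise integration concrete, the following is a minimal sketch of how a formula, an interval, and a rule might enter a diabetes-prediction pipeline. It is an illustration under stated assumptions, not the study's implementation: the abstract does not specify which knowledge items were used, so the body mass index formula, the commonly cited fasting plasma glucose cut-offs (100 and 126 mg/dL), and all names (`Patient`, `knowledge_features`, `violates_guideline`) are chosen here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are assumptions for this sketch.
    weight_kg: float
    height_m: float
    fasting_glucose_mg_dl: float


def bmi(p: Patient) -> float:
    """Formula knowledge: body mass index = weight (kg) / height (m) squared."""
    return p.weight_kg / p.height_m ** 2


def glucose_band(fpg_mg_dl: float) -> str:
    """Interval knowledge: map fasting plasma glucose to a clinical band.

    Cut-offs of 100 and 126 mg/dL are the commonly cited diagnostic
    thresholds, assumed here rather than taken from the study.
    """
    if fpg_mg_dl < 100:
        return "normal"
    if fpg_mg_dl < 126:
        return "prediabetes"
    return "diabetes"


def knowledge_features(p: Patient) -> dict:
    """Feature-engineering stage: derive knowledge-based features."""
    return {
        "bmi": round(bmi(p), 1),
        "glucose_band": glucose_band(p.fasting_glucose_mg_dl),
    }


def violates_guideline(p: Patient, predicted_diabetic: bool) -> bool:
    """Output-evaluation stage: flag a negative prediction that contradicts
    the glucose rule, so it can be reviewed or overridden."""
    return glucose_band(p.fasting_glucose_mg_dl) == "diabetes" and not predicted_diabetic


if __name__ == "__main__":
    patient = Patient(weight_kg=81.0, height_m=1.80, fasting_glucose_mg_dl=130.0)
    print(knowledge_features(patient))
    print(violates_guideline(patient, predicted_diabetic=False))
```

In a sketch like this, the derived features would be passed to any downstream classifier alongside the raw measurements, while the guideline check acts only on the model's output; keeping the two separate makes it easier to adjust how strongly each piece of knowledge influences the final model.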
RESULTS: The findings highlight the benefits of integration in terms of accuracy, interpretability, data efficiency, and adherence to clinical guidelines. In several cases, integrated models outperformed purely data-driven approaches, underscoring the potential for domain knowledge to enhance ML models through improved generalisation. In other cases, the integration was instrumental in enhancing model interpretability and ensuring conformity with established clinical guidelines. Notably, knowledge integration also proved effective in maintaining performance under limited data scenarios.
CONCLUSIONS: By illustrating various integration strategies through a clinical case study, this work provides guidance to inspire and facilitate future integration efforts. Furthermore, the study identifies the need to refine domain knowledge representation and fine-tune its contribution to the ML model as the two main challenges to integration and aims to stimulate further research in this direction.