Keywords: biomarker; classification; diagnostic; host; infectious disease; machine learning; protein

Source: DOI:10.3390/diagnostics14121290

Abstract:
In recent years, infectious disease diagnosis has increasingly turned to host-centered approaches as a complement to pathogen-directed ones. Host-centered approaches, however, typically require interpreting complex multi-biomarker datasets to arrive at an informative diagnostic outcome. This report describes a machine learning (ML)-based classification workflow intended as a template for researchers seeking to apply ML to the development of host-based infectious disease biomarker classifiers. As an example, we built a classification model that accurately distinguishes between three disease etiology classes (bacterial, viral, and normal) in human sera using host protein biomarkers of known diagnostic utility. After collecting protein data from known disease samples, we trained a series of increasingly complex Auto-ML models until arriving at an optimized classifier that could differentiate viral, bacterial, and non-disease samples. Even with a relatively small training set, the model had robust diagnostic characteristics and performed well on a blinded sample set. We present here a flexible Auto-ML-based workflow for identifying host biomarker classifiers with diagnostic utility for infectious disease, one that can readily be adapted to multiple biomarker classes and disease states.
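The train-then-blind-test pattern described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the two numeric features stand in for hypothetical host protein marker levels, and a simple nearest-centroid rule replaces the Auto-ML models used in the study. It is not the authors' method and does not reproduce their results.

```python
import random
from statistics import mean

CLASSES = ("bacterial", "viral", "normal")


def make_samples(n_per_class, rng):
    """Generate synthetic (features, label) pairs.

    The class centers are invented for illustration; they are not
    measured biomarker values.
    """
    centers = {"bacterial": (8.0, 2.0), "viral": (2.0, 8.0), "normal": (1.0, 1.0)}
    samples = []
    for label in CLASSES:
        cx, cy = centers[label]
        for _ in range(n_per_class):
            samples.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(samples)
    return samples


def fit_centroids(train):
    """Compute the per-class mean feature vector from labeled samples."""
    centroids = {}
    for label in CLASSES:
        rows = [x for x, y in train if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted class matches the true label."""
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in samples)


rng = random.Random(0)
train = make_samples(20, rng)    # small labeled training set
blinded = make_samples(10, rng)  # held out; labels used only for scoring
model = fit_centroids(train)
blinded_accuracy = accuracy(model, blinded)
print(f"blinded accuracy: {blinded_accuracy:.2f}")
```

The point of the sketch is the separation of roles: the classifier is fitted only on the labeled training samples, and the blinded set is touched once, for scoring. In a real workflow the nearest-centroid rule would be replaced by whichever model family the Auto-ML search selects.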