Keywords: ATC; COVID-19; ICD; Italy; SARS-CoV-2; big data; coronavirus; data; data analysis; data mining; epidemiology; evolutionary algorithm; feature bins; feature engineering; health data; long COVID; longitudinal analyses; longitudinal analysis; longitudinal study; machine learning; multimorbidity; polypharmacy; public health; risk assessment; risk assessments; severity; sparse binary data

MeSH: Humans; COVID-19 / epidemiology; Multimorbidity; Machine Learning; Italy / epidemiology; Male; Female; Aged; Hospitalization / statistics & numerical data; Middle Aged; Longitudinal Studies; Aged, 80 and over

Source: DOI:10.2196/52353

Abstract:
BACKGROUND: Multimorbidity is a significant public health concern, characterized by the coexistence and interaction of multiple preexisting medical conditions. This complex condition has been associated with an increased risk of COVID-19. Individuals with multimorbidity who contract COVID-19 often face a significant reduction in life expectancy. The postpandemic period has also highlighted an increase in frailty, emphasizing the importance of integrating existing multimorbidity details into epidemiological risk assessments. Managing clinical data that include medical histories presents significant challenges, particularly due to the sparsity of data arising from the rarity of multimorbidity conditions. Also, the complex enumeration of combinatorial multimorbidity features introduces challenges associated with combinatorial explosions.
OBJECTIVE: This study aims to assess the severity of COVID-19 in individuals with multiple medical conditions, considering their demographic characteristics such as age and sex. We propose an evolutionary machine learning model designed to handle sparsity, analyzing preexisting multimorbidity profiles of patients hospitalized with COVID-19 based on their medical history. Our objective is to identify the optimal set of multimorbidity feature combinations strongly associated with COVID-19 severity. We also apply the Apriori algorithm to these evolutionarily derived predictive feature combinations to identify those with high support.
METHODS: We used data from 3 administrative sources in Piedmont, Italy, involving 12,793 individuals aged 45-74 years who tested positive for COVID-19 between February and May 2020. From their 5-year pre-COVID-19 medical histories, we extracted multimorbidity features, including drug prescriptions, disease diagnoses, sex, and age. Focusing on COVID-19 hospitalization, we segmented the data into 4 cohorts based on age and sex. Addressing data imbalance through random resampling, we compared various machine learning algorithms to identify the optimal classification model for our evolutionary approach. Using 5-fold cross-validation, we evaluated each model's performance. Our evolutionary algorithm, utilizing a deep learning classifier, generated prediction-based fitness scores to pinpoint multimorbidity combinations associated with COVID-19 hospitalization risk. Finally, the Apriori algorithm was applied to identify frequent combinations with high support.
RESULTS: We identified multimorbidity predictors associated with COVID-19 hospitalization, indicating more severe COVID-19 outcomes. Frequently occurring morbidity features in the final evolved combinations were age>53, R03BA (glucocorticoid inhalants), and N03AX (other antiepileptics) in cohort 1; A10BA (biguanide or metformin) and N02BE (anilides) in cohort 2; N02AX (other opioids) and M04AA (preparations inhibiting uric acid production) in cohort 3; and G04CA (alpha-adrenoreceptor antagonists) in cohort 4.
CONCLUSIONS: When combined with other multimorbidity features, even less prevalent medical conditions show associations with the outcome. This study provides insights beyond COVID-19, demonstrating how repurposed administrative data can be adapted and contribute to enhanced risk assessment for vulnerable populations.
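The final step described in METHODS, applying Apriori to the evolved feature combinations to find those with high support, can be illustrated with a minimal sketch. The function name `apriori`, the `min_support` threshold of 0.5, and the four toy combinations below are all assumptions made for illustration; the toy data only borrows feature labels from RESULTS and does not reproduce any study data or the authors' implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support} for every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single features.
    items = {i for t in transactions for i in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical evolved combinations (toy data, NOT from the study).
evolved_combinations = [
    {"age>53", "R03BA", "N03AX"},
    {"age>53", "R03BA"},
    {"age>53", "N03AX"},
    {"R03BA", "N02BE"},
]

frequent = apriori(evolved_combinations, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

In this toy run, a feature such as N02BE that appears in only one of four combinations falls below the threshold and is dropped, while pairs that co-occur in at least half of the combinations are retained. The prune step is what keeps the candidate space manageable when the number of possible feature combinations grows quickly.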