Keywords: early screening; ensemble learning methods; machine learning; thyroid nodule; urine iodine

MeSH: Thyroid Nodule / diagnosis, epidemiology, diagnostic imaging; Humans; Machine Learning; Female; Male; China / epidemiology; Cross-Sectional Studies; Middle Aged; Adult; Early Detection of Cancer / methods; Aged; Mass Screening / methods; Ultrasonography / methods

Source: DOI:10.3389/fendo.2024.1385167

Abstract:
Thyroid nodules, increasingly prevalent globally, pose a risk of malignant transformation. Early screening is crucial for management, yet current models focus mainly on ultrasound features. This study explores machine learning for screening using demographic and biochemical indicators.
Analyzing data from 6,102 individuals and 61 variables, we identified 17 key variables to construct models using six machine learning classifiers: Logistic Regression, SVM, Multilayer Perceptron, Random Forest, XGBoost, and LightGBM. Performance was evaluated by accuracy, precision, recall, F1 score, specificity, kappa statistic, and AUC, with internal and external validations assessing generalizability. Shapley values determined feature importance, and Decision Curve Analysis evaluated clinical benefits.
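As a rough illustration of the threshold-based metrics listed above, the following sketch computes them from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; AUC is omitted because it requires predicted scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)        # positive predictive value
    recall = tp / (tp + fn)           # sensitivity
    specificity = tn / (tn + fp)      # true negative rate
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Made-up counts, purely to show the call.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Kappa is worth reporting alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when the two classes are unevenly represented.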
Random Forest showed the highest internal validation accuracy (78.3%) and AUC (89.1%). LightGBM demonstrated robust external validation performance. Key factors included age, gender, and urinary iodine levels. Significant clinical benefits were observed across various risk thresholds, particularly in ensemble models.
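Clinical benefit "across risk thresholds" is what Decision Curve Analysis quantifies. The sketch below uses the standard net-benefit definition, true positives per person minus false positives per person weighted by the odds of the threshold. The function names and toy data are hypothetical; the abstract does not describe the authors' implementation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at a given risk threshold.

    Assumes 0 < threshold < 1 and that a case is flagged when its
    predicted probability is at or above the threshold.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy labels and probabilities, made up for illustration only.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
model_nb = net_benefit(y_true, y_prob, 0.2)
treat_all_nb = net_benefit_treat_all(y_true, 0.2)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, which is the net benefit of flagging no one.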
Machine learning, particularly ensemble methods, accurately predicts thyroid nodule presence using demographic and biochemical data. This cost-effective strategy offers valuable insights for thyroid health management, aiding in early detection and potentially improving clinical outcomes. These findings enhance our understanding of the key predictors of thyroid nodules and underscore the potential of machine learning in public health applications for early disease screening and prevention.