Evaluation of ChatGPT as a Counselling Tool for Italian-Speaking MASLD Patients: Assessment of Accuracy, Completeness and Comprehensibility.

Abstract:

BACKGROUND: Artificial intelligence (AI)-based chatbots have shown promise in providing counseling to patients with metabolic dysfunction-associated steatotic liver disease (MASLD). While ChatGPT3.5 has demonstrated the ability to comprehensively answer MASLD-related questions in English, its accuracy remains suboptimal. Whether language influences these results is unclear. This study aims to assess ChatGPT's performance as a counseling tool for Italian MASLD patients.
METHODS: Thirteen Italian experts rated the accuracy, completeness and comprehensibility of ChatGPT3.5's answers to 15 MASLD-related questions posed in Italian, using a six-point Likert scale for accuracy and three-point Likert scales for completeness and comprehensibility.
RESULTS: Mean scores for accuracy, completeness and comprehensibility were 4.57 ± 0.42, 2.14 ± 0.31 and 2.91 ± 0.07, respectively. The physical activity domain achieved the highest mean scores for accuracy and completeness, whereas the specialist referral domain achieved the lowest. Overall, Fleiss's coefficients of concordance for accuracy, completeness and comprehensibility across all 15 questions were 0.016, 0.075 and -0.010, respectively. Age and academic role of the evaluators did not influence the scores. The results were not significantly different from those of our previous study focusing on English.
CONCLUSIONS: Language does not appear to affect ChatGPT's ability to provide comprehensible and complete counseling to MASLD patients, but accuracy remains suboptimal in certain domains.
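
For readers unfamiliar with the agreement statistic reported in the results, the following is a minimal sketch of how such a coefficient can be computed, assuming that "Fleiss's coefficient of concordance" refers to Fleiss' kappa. The function name `fleiss_kappa`, the data layout (one list of scores per question, one score per rater) and the toy ratings are illustrative assumptions. They are not the authors' code or data.

```python
import math
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per question.

    ratings    -- list of per-question lists, one score per rater (hypothetical layout)
    categories -- every score a rater may assign, e.g. [1, 2, 3]
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")

    # counts[i][j] = number of raters who gave question i the score categories[j]
    counts = []
    for row in ratings:
        if len(row) != n_raters:
            raise ValueError("every question needs the same number of ratings")
        tally = Counter(row)
        row_counts = [tally[c] for c in categories]
        if sum(row_counts) != n_raters:
            raise ValueError("a score falls outside the listed categories")
        counts.append(row_counts)

    # Observed agreement for each question, then averaged over questions
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Agreement expected by chance, from the overall share of each score
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    if math.isclose(p_e, 1.0):
        raise ValueError("kappa is undefined when only one score is ever used")

    return (p_bar - p_e) / (1 - p_e)


# Toy data only: 4 questions, 3 raters, three-point completeness scale
toy_completeness = [[2, 2, 3], [2, 3, 2], [1, 2, 2], [3, 3, 3]]
print(fleiss_kappa(toy_completeness, categories=[1, 2, 3]))
```

Under this definition, a value of 1 means perfect agreement among raters and values close to 0 mean agreement no better than chance, so the coefficients reported above (0.016, 0.075 and -0.010) would correspond to near-chance agreement between the evaluators.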
