Keywords: artificial intelligence; clinical decision-making; ethics; judgement; morals

Source: DOI: 10.1093/jamiaopen/ooae065

Abstract:
Objective: Artificial intelligence tools such as Chat Generative Pre-trained Transformer (ChatGPT) have been used for many health care-related applications; however, there is a lack of research on their ability to evaluate morally and/or ethically complex medical decisions. The objective of this study was to assess the moral competence of ChatGPT.
Materials and Methods: This cross-sectional study was performed between May 2023 and July 2023 using scenarios from the Moral Competence Test (MCT). Numerical responses were collected from ChatGPT 3.5 and 4.0 to assess individual and overall stage scores, including the C-index and overall moral stage preference. Descriptive analysis and two-sided Student's t-tests were used for all continuous data.
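To make the analysis pipeline concrete, the following is a minimal sketch in Python of how per-iteration C-index scores from the two models could be summarized and compared with a two-sided Student's t-test. The arrays are placeholder values and the scipy-based workflow is an assumption based on the methods described above, not the authors' actual code.

```python
# Illustrative sketch only: placeholder C-index values, not the study's data.
import numpy as np
from scipy import stats

# Hypothetical per-iteration C-index scores (0-100 scale) for each model
c_index_gpt35 = np.array([18.2, 25.4, 9.7, 31.0, 14.6, 22.3, 7.9, 28.1])
c_index_gpt40 = np.array([27.5, 41.2, 19.8, 33.6, 25.1, 38.4, 22.9, 30.7])

# Descriptive statistics (mean ± SD) for each model
for name, scores in [("ChatGPT 3.5", c_index_gpt35), ("ChatGPT 4.0", c_index_gpt40)]:
    print(f"{name}: {scores.mean():.2f} ± {scores.std(ddof=1):.2f}")

# Two-sided Student's (pooled-variance) t-test comparing the two models
t_stat, p_value = stats.ttest_ind(c_index_gpt40, c_index_gpt35, equal_var=True)
print(f"t = {t_stat:.3f}, two-sided P = {p_value:.6f}")
```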
Results: A total of 100 iterations of the MCT were performed, and moral preference was found to be higher for the later Kohlberg-derived arguments. ChatGPT 4.0 was found to have a higher overall moral stage preference than ChatGPT 3.5 (2.325 versus 1.755). ChatGPT 4.0 also had a statistically significantly higher C-index score than ChatGPT 3.5 (29.03 ± 11.10 versus 19.32 ± 10.95, P = .0000275).
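As a rough plausibility check on the reported comparison, the summary statistics alone can be fed into a pooled-variance t-test. The split of 50 iterations per model below is an assumption (the abstract reports only the 100-iteration total), so this is a sketch rather than a reproduction of the authors' analysis.

```python
# Sanity check from summary statistics; 50 iterations per model is an
# assumption, as the abstract does not state the per-model sample size.
from scipy import stats

t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=29.03, std1=11.10, nobs1=50,  # ChatGPT 4.0 C-index (reported mean ± SD)
    mean2=19.32, std2=10.95, nobs2=50,  # ChatGPT 3.5 C-index (reported mean ± SD)
    equal_var=True,                     # Student's (pooled-variance) t-test
)
print(f"t = {t_stat:.2f}, two-sided P = {p_value:.7f}")
# Under this assumed split, t is roughly 4.4 and P is on the order of 1e-5,
# consistent with the reported P = .0000275.
```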
Discussion: For both dilemmas, ChatGPT 3.5 and 4.0 trended towards higher moral preference for the later stages of Kohlberg's theory, with C-indices suggesting medium moral competence. However, both models showed moderate variation in C-index scores, indicating inconsistency; further training is recommended.
Conclusion: ChatGPT demonstrates medium moral competence and can evaluate arguments based on Kohlberg's theory of moral development. These findings suggest that future versions of ChatGPT and other large language models could assist physicians in the decision-making process when encountering complex ethical scenarios.