A content-aware chatbot based on GPT 4 provides trustworthy recommendations for Cone-Beam CT guidelines in dental imaging.

Abstract:

OBJECTIVE: To develop a content-aware chatbot based on GPT-3.5-Turbo and GPT-4 with specialized knowledge of the German S2 Cone-Beam CT (CBCT) dental imaging guideline and to compare its performance against that of humans.
METHODS: The LlamaIndex software library was used to integrate the guideline context into the chatbots. Based on the CBCT S2 guideline, 40 questions were posed to the content-aware chatbots; early career and senior practitioners with different levels of experience served as reference. The chatbots' performance was compared in terms of recommendation accuracy and explanation quality. Accuracy was evaluated with the chi-square test and explanation quality with the one-tailed Wilcoxon signed-rank test.
RESULTS: The GPT-4-based chatbot provided 100% correct recommendations and superior explanation quality compared to the one based on GPT-3.5-Turbo (87.5% vs. 57.5% for GPT-3.5-Turbo; P = .003). Moreover, it outperformed early career practitioners in correct answers (P = .002 and P = .032) and earned higher trust than the chatbot using GPT-3.5-Turbo (P = .006).
CONCLUSIONS: A content-aware chatbot using GPT-4 reliably provided recommendations according to current consensus guidelines. The responses were deemed trustworthy and transparent, and therefore facilitate the integration of artificial intelligence into clinical decision-making.
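
The chi-square comparison of two proportions named in METHODS can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `chi_square_2x2` is hypothetical, the counts 35/40 and 23/40 are an assumption obtained by applying the reported 87.5% and 57.5% to 40 items (the abstract does not state the denominators), and the abstract does not say whether a continuity correction was applied, so none is used here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, two-sided p-value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts: 35 of 40 vs. 23 of 40 items rated favourably.
chi2, p_value = chi_square_2x2(35, 5, 23, 17)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Rows are the two chatbots and columns are favourable versus unfavourable ratings. With small expected cell counts, an exact test would be the more cautious choice.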