Evaluating the strengths and limitations of multimodal ChatGPT-4 in detecting glaucoma using fundus images.

Abstract:

This study evaluates the diagnostic accuracy of a multimodal large language model (LLM), ChatGPT-4, in recognizing glaucoma from color fundus photographs (CFPs) on a benchmark dataset, without prior training or fine-tuning.
The publicly accessible Retinal Fundus Glaucoma Challenge "REFUGE" dataset was used for the analyses. The input data consisted of the entire 400-image test set. The task was to classify each fundus image as either "Likely Glaucomatous" or "Likely Non-Glaucomatous". We constructed a confusion matrix to visualize ChatGPT-4's predictions, focusing on the accuracy of the binary classification (glaucoma vs. non-glaucoma).
ChatGPT-4 achieved an accuracy of 90% (95% confidence interval [CI]: 87.06%-92.94%). Sensitivity was 50% (95% CI: 34.51%-65.49%) and specificity was 94.44% (95% CI: 92.08%-96.81%). Precision was 50% (95% CI: 34.51%-65.49%), and the F1 score was 0.50.
ChatGPT-4 achieved relatively high diagnostic accuracy without prior fine-tuning on CFPs. Given the scarcity of data in specialized medical fields, including ophthalmology, advanced AI techniques such as LLMs might require less training data than other forms of AI, with potential savings in time and financial resources. This may also pave the way for innovative tools to support specialized medical care, particularly those dependent on multimodal data for diagnosis and follow-up, irrespective of resource constraints.
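The reported metrics can be cross-checked against a single confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' analysis code: the cell counts (20 true positives, 20 false negatives, 20 false positives, 340 true negatives) are not given in the abstract and are inferred here from the 400-image total and the reported percentages, and the normal-approximation (Wald) interval is an assumption about how the confidence intervals were computed. The function names `wald_ci` and `summarize` are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def wald_ci(p, n, z=Z_95):
    """Normal-approximation (Wald) interval for a proportion p observed on n cases."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def summarize(tp, fn, fp, tn):
    """Return each metric as (estimate, ci); the F1 score is reported without a CI."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall on glaucomatous eyes
    specificity = tn / (tn + fp)   # recall on non-glaucomatous eyes
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (accuracy, wald_ci(accuracy, total)),
        "sensitivity": (sensitivity, wald_ci(sensitivity, tp + fn)),
        "specificity": (specificity, wald_ci(specificity, tn + fp)),
        "precision": (precision, wald_ci(precision, tp + fp)),
        "f1": (f1, None),
    }


# ASSUMPTION: counts inferred from the abstract's percentages, not reported in it.
results = summarize(tp=20, fn=20, fp=20, tn=340)

for name, (estimate, ci) in results.items():
    if ci is None:
        print(f"{name}: {estimate:.2f}")
    else:
        print(f"{name}: {estimate:.2%} (95% CI: {ci[0]:.2%}-{ci[1]:.2%})")
```

Under these assumed counts, half of the glaucomatous images are missed even though overall accuracy is 90%, because non-glaucomatous images dominate the test set; this is why sensitivity and precision are worth reading alongside accuracy.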
