Putting ChatGPT vision (GPT-4V) to the test: risk perception in traffic images

Abstract:

Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users, but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of 'risk' in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and using insights from the self-consistency prompting method, we formulated three hypotheses: (i) repeating the prompt under effectively identical conditions increases validity, (ii) varying the prompt text and extracting a total score increases validity compared to using a single prompt, and (iii) in a multiple regression analysis, the incorporation of object detection features, alongside the GPT-4V-based risk rating, significantly contributes to improving the model's validity. Validity was quantified by the correlation coefficient with human risk scores, across the 210 images. The results confirmed the three hypotheses. The eventual validity coefficient was r = 0.83, indicating that population-level human risk can be predicted using AI with a high degree of accuracy. The findings suggest that GPT-4V must be prompted in a way equivalent to how humans fill out a multi-item questionnaire.
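
The analysis outlined in the abstract (averaging repeated and varied prompts into a total score, then adding object detection features in a multiple regression and measuring validity as a correlation with human scores) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the array layout, the choice of ordinary least squares fitted in-sample, and the synthetic data generator are all assumptions made for illustration, and the numbers it produces are unrelated to the reported r = 0.83.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def aggregate_ratings(ratings):
    """Average model ratings over prompts and repetitions.

    ratings: array of shape (n_images, n_prompts, n_repetitions).
    Returns one total score per image (assumed layout, not the paper's).
    """
    return np.asarray(ratings, dtype=float).mean(axis=(1, 2))


def regression_validity(model_score, features, human_score):
    """Fit human ~ intercept + model_score + features by least squares
    and return the in-sample correlation of fitted values with human scores."""
    X = np.column_stack([np.ones(len(model_score)), model_score, features])
    coef, *_ = np.linalg.lstsq(X, human_score, rcond=None)
    return pearson_r(X @ coef, human_score)


def simulate(n_images=210, n_prompts=5, n_reps=10, seed=0):
    """Synthetic stand-in data (hypothetical; not the study's images or ratings)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n_images)            # unobserved "true" risk
    features = rng.normal(size=(n_images, 2))     # e.g. object counts (assumed)
    human = latent + 0.5 * features[:, 0] + rng.normal(scale=0.3, size=n_images)
    prompt_bias = rng.normal(size=(n_images, n_prompts, 1))   # per-prompt error
    noise = rng.normal(size=(n_images, n_prompts, n_reps))    # per-call error
    ratings = latent[:, None, None] + prompt_bias + noise
    return ratings, features, human


if __name__ == "__main__":
    ratings, features, human = simulate()
    single = ratings[:, 0, 0]                    # one prompt, one call
    repeated = ratings[:, 0, :].mean(axis=1)     # hypothesis (i): repeat one prompt
    total = aggregate_ratings(ratings)           # hypothesis (ii): vary prompts too
    print("single call:      ", pearson_r(single, human))
    print("repeated prompt:  ", pearson_r(repeated, human))
    print("total score:      ", pearson_r(total, human))
    # hypothesis (iii): add object detection features in a regression
    print("total + features: ", regression_validity(total, features, human))
```

Averaging treats each model call like one item of a multi-item questionnaire: independent errors partly cancel in the total score. The regression step reports an in-sample fit, so it can only match or exceed the simple correlation; a held-out evaluation would be needed to judge whether the added features generalize.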
