Keywords: AI comparison, AI efficacy, future medicine, healthcare AI, medical decision-making

Source: DOI:10.7759/cureus.52485   PDF (PubMed)

Abstract:
This study rigorously evaluates the performance of four artificial intelligence (AI) language models - ChatGPT, Claude AI, Google Bard, and Perplexity AI - across four key metrics: accuracy, relevance, clarity, and completeness. We used a mixed-methods approach, gathering evaluations across 14 scenarios to ensure the findings were accurate and dependable. Claude AI outperformed the other models, delivering the most complete responses, with mean scores of 3.64 for relevance and 3.43 for completeness. ChatGPT performed consistently well, while Google Bard's responses were often unclear and varied widely, showing little consistency. These results offer valuable insight into the strengths and weaknesses of AI language models for medical recommendations, guiding their effective use and informing future AI-driven developments. The study also indicates how well current AI capabilities align with complex medical scenarios.