
Chat - GPT
  • Article type: Journal Article
    BACKGROUND: Artificial intelligence (AI) can be a tool in the diagnosis and acquisition of knowledge, particularly in dentistry, sparking debates on its application in clinical decision-making.
    OBJECTIVE: This study aims to evaluate the accuracy, completeness, and reliability of the responses generated by Chatbot Generative Pre-Trained Transformer (ChatGPT) 3.5 in dentistry using expert-formulated questions.
    METHODS: Experts were invited to create three questions, answers, and respective references according to specialized fields of activity. The Likert scale was used to evaluate agreement levels between experts and ChatGPT responses. Statistical analysis compared descriptive and binary question groups in terms of accuracy and completeness. Questions with low accuracy underwent re-evaluation, and subsequent responses were compared for improvement. The Wilcoxon test was utilized (α = 0.05).
    RESULTS: Ten experts across six dental specialties generated 30 binary and descriptive dental questions and references. The accuracy score had a median of 5.50 and a mean of 4.17. For completeness, the median was 2.00 and the mean was 2.07. No difference was observed between descriptive and binary responses for accuracy and completeness. However, re-evaluated responses improved significantly in both accuracy (median 5.50 vs. 6.00; mean 4.17 vs. 4.80; p=0.042) and completeness (median 2.0 vs. 2.0; mean 2.07 vs. 2.30; p=0.011). References were more often incorrect than correct, with no differences between descriptive and binary questions.
    CONCLUSIONS: ChatGPT initially demonstrated good accuracy and completeness, which was further improved with machine learning (ML) over time. However, some inaccurate answers and references persisted. Human critical discernment continues to be essential to facing complex clinical cases and advancing theoretical knowledge and evidence-based practice.
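    A minimal sketch of how a paired comparison such as the initial versus re-evaluated scores might be computed, assuming the Wilcoxon test in METHODS was the signed-rank variant (the abstract does not say which). The scores below are hypothetical, not the study's data, and the function name is illustrative. Only the rank sums are computed; a p-value would normally come from a statistics library.

```python
def signed_rank_statistic(before, after):
    """Return (W+, W-) for paired samples; zero differences are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j+1.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical Likert scores for five questions (NOT taken from the study).
initial_scores = [2, 3, 4, 5, 1]
reevaluated_scores = [3, 3, 6, 4, 4]
w_plus, w_minus = signed_rank_statistic(initial_scores, reevaluated_scores)
print(w_plus, w_minus)
```

    A large imbalance between the two rank sums suggests that scores shifted consistently in one direction between the two evaluations.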






  • Article type: Letter






  • Article type: Journal Article
    OBJECTIVE: To investigate the capacity of chat-generative pre-trained transformer (Chat-GPT) to understand the S2k guideline of the German Society for Gynecology and Obstetrics on intrauterine growth restriction.
    METHODS: The German-language free Chat-GPT version was used to test the ability of Chat-GPT to understand the definition of small for gestational age and intrauterine growth restriction, to indicate the correct time and place of delivery and to evaluate its ability to recommend a spontaneous delivery versus a primary caesarean section in accordance with the guideline recommendations. In order to objectively evaluate the suggestions, a simple three-color 'traffic light' evaluation system was employed.
    RESULTS: Almost all of Chat-GPT's suggestions in the context of definition of small for gestational age/intrauterine growth restriction as well as correct time of delivery were adequate, whereas more than half of the suggestions made in terms of correct delivery mode needed reformulation or even correction.
    CONCLUSIONS: Chat-GPT appears to be a valuable form of artificial intelligence that could be integrated into everyday clinical practice.






  • Article type: Journal Article
    BACKGROUND: Patients find technology tools to be more approachable for seeking sensitive health-related information, such as reproductive health information. The inventive conversational ability of artificial intelligence (AI) chatbots, such as ChatGPT (OpenAI Inc), offers a potential means for patients to effectively locate answers to their health-related questions digitally.
    OBJECTIVE: A pilot study was conducted to compare the novel ChatGPT with the existing Google Search technology for their ability to offer accurate, effective, and current information regarding what action to take after missing a dose of an oral contraceptive pill.
    METHODS: A sequence of 11 questions, mimicking a patient inquiring about the action to take after missing a dose of an oral contraceptive pill, was input into ChatGPT as a cascade, given the conversational ability of ChatGPT. The questions were input into 4 different ChatGPT accounts, with the account holders being of various demographics, to evaluate potential differences and biases in the responses given to different account holders. The leading question, "what should I do if I missed a day of my oral contraception birth control?" alone was then input into Google Search, given its nonconversational nature. The results from the ChatGPT questions and the Google Search results for the leading question were evaluated on their readability, accuracy, and effective delivery of information.
    RESULTS: The ChatGPT results were determined to be at an overall higher-grade reading level, with a longer reading duration, less accurate, less current, and with a less effective delivery of information. In contrast, the Google Search resulting answer box and snippets were at a lower-grade reading level, shorter reading duration, more current, able to reference the origin of the information (transparent), and provided the information in various formats in addition to text.
    CONCLUSIONS: ChatGPT has room for improvement in accuracy, transparency, recency, and reliability before it can equitably be implemented into health care information delivery and provide the potential benefits it poses. However, AI may be used as a tool for providers to educate their patients in preferred, creative, and efficient ways, such as using AI to generate accessible short educational videos from health care provider-vetted information. Larger studies representing a diverse group of users are needed.






  • Article type: Journal Article
    BACKGROUND: In reconstructive plastic surgery, the need for comprehensive research and systematic reviews is apparent due to the field's intricacies, influencing the evidence supporting specific procedures. Although Chat-GPT's knowledge is limited to September 2021, its integration into research proves valuable for efficiently identifying knowledge gaps. Therefore, this tool becomes a potent asset, directing researchers to focus on conducting systematic reviews where they are most necessary.
    METHODS: Chat-GPT 3.5 was prompted to generate 10 unpublished, innovative research topics on breast reconstruction surgery, followed by 10 additional subtopics. Results were filtered for systematic reviews in PubMed, and novel ideas were identified. To evaluate Chat-GPT's power in generating improved responses, two additional searches were conducted using search terms generated by Chat-GPT.
    RESULTS: Chat-GPT produced 83 novel ideas, leading to an accuracy rate of 83%. There was a wide range of novel ideas produced among topics such as transgender women, generating 10 ideas, whereas acellular dermal matrix (ADM) generated five ideas. Chat-GPT increased the total number of manuscripts generated by a factor of 2.3, 3.9, and 4.0 in the first, second, and third trials, respectively. While the search results were accurate to our manual searches (95.2% accuracy), the greater number of manuscripts potentially diluted the quality of articles, resulting in fewer novel systematic review ideas.
    CONCLUSIONS: Chat-GPT proves valuable in identifying gaps in the literature and offering insights into areas lacking research in breast reconstruction surgery. While it displays high sensitivity, refining its specificity is imperative. Prudent practice involves evaluating accomplished work and conducting a comprehensive review of all components involved.






  • Article type: Journal Article
    BACKGROUND: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement.
    OBJECTIVE: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types.
    METHODS: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications.
    RESULTS: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions.
    CONCLUSIONS: ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.






  • Article type: Journal Article
    Generative Artificial Intelligence foundation models (for example Generative Pre-trained Transformer - GPT - models) can generate the next token given a sequence of tokens. How can this 'generative AI' be compared with the 'real' intelligence of the human brain, when for example a human generates a whole memory in response to an incomplete retrieval cue, and then generates further prospective thoughts? Here these two types of generative intelligence, artificial in machines and real in the human brain, are compared, and it is shown how when whole memories are generated by hippocampal recall in response to an incomplete retrieval cue, what the human brain computes, and how it computes it, are very different from generative AI. Key differences are the use of local associative learning rules in the hippocampal memory system, and of non-local backpropagation of error learning in AI. Indeed, it is argued that the whole operation of the human brain is performed computationally very differently to what is implemented in generative AI. Moreover, it is emphasized that the primate including human hippocampal system includes computations about spatial view and where objects and people are in scenes, whereas in rodents the emphasis is on place cells and path integration by movements between places. This comparison with generative memory and processing in the human brain has interesting implications for the further development of generative AI and for neuroscience research.






  • Article type: Journal Article
    BACKGROUND: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics.
    OBJECTIVE: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other.
    METHODS: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests.
    RESULTS: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Successful discriminant analysis differentiated the 4 LLMs' distinct value profiles. Further examination found the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring choosing between opposing values. This provided further validation for the models embedding distinct motivational value-like constructs that shape their decision-making.
    CONCLUSIONS: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.






  • Article type: Journal Article
    OBJECTIVE: Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. The potential to be a valuable tool in clinical practice is compelling, particularly in improving clinical decision support by helping physicians to make clinical decisions based on the best medical knowledge available. We aim to investigate ChatGPT's ability to identify, diagnose and manage patients with otorhinolaryngology-related symptoms.
    METHODS: A prospective, cross-sectional study was designed based on an idea suggested by ChatGPT to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) in 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability.
    RESULTS: The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between the scores of both ChatGPT runs and those of the ENTs. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions.
    CONCLUSIONS: Artificial intelligence will be an important instrument in clinical decision-making in the near future and ChatGPT is the most promising chatbot so far. Despite needing further development to be used with safety, there is room for improvement and potential to aid otorhinolaryngology residents and specialists in making the most correct decision for the patient.






  • Article type: Journal Article
    BACKGROUND: The purpose of this study was to evaluate the efficacy of an Artificial Intelligence Large Language Model (AI-LLM) at improving the readability of foot and ankle orthopedic radiology reports.
    METHODS: The radiology reports from 100 foot or ankle X-Rays, 100 computed tomography (CT) scans and 100 magnetic resonance imaging (MRI) scans were randomly sampled from the institution's database. The following prompt command was inserted into the AI-LLM: "Explain this radiology report to a patient in layman's terms in the second person: [Report Text]". The mean report length, Flesch reading ease score (FRES) and Flesch-Kincaid reading level (FKRL) were evaluated for both the original radiology report and the AI-LLM generated report. The accuracy of the information contained within the AI-LLM report was assessed via a 5-point Likert scale. Additionally, any "hallucinations" generated by the AI-LLM report were recorded.
    RESULTS: There was a statistically significant improvement in mean FRES scores in the AI-LLM generated X-Ray report (33.8 ± 6.8 to 72.7 ± 5.4), CT report (27.8 ± 4.6 to 67.5 ± 4.9) and MRI report (20.3 ± 7.2 to 66.9 ± 3.9), all p < 0.001. There was also a statistically significant improvement in mean FKRL scores in the AI-LLM generated X-Ray report (12.2 ± 1.1 to 8.5 ± 0.4), CT report (15.4 ± 2.0 to 8.4 ± 0.6) and MRI report (14.1 ± 1.6 to 8.5 ± 0.5), all p < 0.001. Superior FRES scores were observed in the AI-LLM generated X-Ray report compared to the AI-LLM generated CT report and MRI report, p < 0.001. The mean Likert score for the AI-LLM generated X-Ray report, CT report and MRI report was 4.0 ± 0.3, 3.9 ± 0.4, and 3.9 ± 0.4, respectively. The rate of hallucinations in the AI-LLM generated X-Ray report, CT report and MRI report was 4%, 7% and 6%, respectively.
    CONCLUSIONS: AI-LLM was an efficacious tool for improving the readability of foot and ankle radiological reports across multiple imaging modalities. Superior FRES scores together with superior Likert scores were observed in the X-Ray AI-LLM reports compared to the CT and MRI AI-LLM reports. This study demonstrates the potential use of AI-LLMs as a new patient-centric approach for enhancing patient understanding of their foot and ankle radiology reports. Jel Classifications: IV.
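    For readers unfamiliar with the two readability metrics named in METHODS, the sketch below applies the standard published Flesch reading ease and Flesch-Kincaid grade level formulas to pre-counted words, sentences, and syllables. The function names and the example counts are illustrative assumptions; the abstract does not state which tool the authors used or how syllables were counted.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch reading ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (not taken from the study).
words, sentences, syllables = 100, 5, 150
print(round(flesch_reading_ease(words, sentences, syllables), 2))
print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
```

    Both formulas depend only on average sentence length and average syllables per word, which is why shorter sentences and simpler vocabulary raise the reading ease score and lower the grade level.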





