Keywords: RoBERTa; healthcare quality; machine learning; natural language processing; patient care; patient experience; patient judgments; patient-authored reviews; psychology; text classification; web-based physician reviews

MeSH: Natural Language Processing; Humans; Algorithms; Internet; Female; Male; Physicians; Physician-Patient Relations; Judgment; Adult; Middle Aged

Source: DOI: 10.2196/50236 | PDF (PubMed)

Abstract:
BACKGROUND: Patients increasingly rely on web-based physician reviews to choose a physician and share their experiences. However, the unstructured text of these written reviews presents a challenge for researchers seeking to make inferences about patients' judgments. Methods previously used to identify patient judgments within reviews, such as hand-coding and dictionary-based approaches, have posed limitations to sample size and classification accuracy. Advanced natural language processing methods can help overcome these limitations and promote further analysis of physician reviews on these popular platforms.
OBJECTIVE: This study aims to train, test, and validate an advanced natural language processing algorithm for classifying the presence and valence of 2 dimensions of patient judgments in web-based physician reviews: interpersonal manner and technical competence.
METHODS: We sampled 345,053 reviews for 167,150 physicians across the United States from Healthgrades.com, a commercial web-based physician rating and review website. We hand-coded 2000 written reviews and used those reviews to train and test a transformer classification algorithm called the Robustly Optimized BERT (Bidirectional Encoder Representations from Transformers) Pretraining Approach (RoBERTa). The 2 fine-tuned models coded the reviews for the presence and positive or negative valence of patients' interpersonal manner or technical competence judgments of their physicians. We evaluated the performance of the 2 models against 200 hand-coded reviews and validated the models using the full sample of 345,053 RoBERTa-coded reviews.
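A minimal sketch, assuming the Hugging Face Transformers and Datasets libraries, of how a RoBERTa sequence classifier like the ones described in METHODS could be fine-tuned on hand-coded reviews. This is not the authors' released code; the file names, column names, and 3-way label scheme (0 = judgment absent, 1 = negative, 2 = positive) are illustrative assumptions.

```python
# Hypothetical sketch of fine-tuning RoBERTa for judgment classification.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical CSVs with columns "text" (review) and "label" (0/1/2).
data = load_dataset("csv", data_files={"train": "train_reviews.csv",
                                       "test": "test_reviews.csv"})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    # Truncate long reviews to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True, max_length=256,
                     padding="max_length")

data = data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=3)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="interpersonal_manner_model",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()                                # fine-tune on hand-coded reviews
predictions = trainer.predict(data["test"])    # predictions for held-out reviews
```

A second classifier for technical competence judgments would follow the same recipe with its own label column and output directory.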
RESULTS: The interpersonal manner model was 90% accurate with precision of 0.89, recall of 0.90, and weighted F1-score of 0.89. The technical competence model was 90% accurate with precision of 0.91, recall of 0.90, and weighted F1-score of 0.90. Positive-valence judgments were associated with higher review star ratings whereas negative-valence judgments were associated with lower star ratings. Analysis of the data by review rating and physician gender corresponded with findings in prior literature.
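A minimal sketch of how the metrics reported above (accuracy, precision, recall, weighted F1) can be computed against the hand-coded evaluation set using scikit-learn; the label arrays are toy placeholders, not study data.

```python
# Hypothetical evaluation against hand-coded labels.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [2, 0, 1, 2, 0]   # hand-coded labels for held-out reviews (toy data)
y_pred = [2, 0, 1, 1, 0]   # model predictions for the same reviews (toy data)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, "
      f"recall={recall:.2f}, weighted F1={f1:.2f}")
```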
CONCLUSIONS: Our 2 classification models coded interpersonal manner and technical competence judgments with high precision, recall, and accuracy. These models were validated using review star ratings and results from previous research. RoBERTa can accurately classify unstructured, web-based review text at scale. Future work could explore the use of this algorithm with other textual data, such as social media posts and electronic health records.