As patients become more complex and their data are spread across fragmented health information systems, effective clinical decision making requires automated, time-efficient ways of gathering important information from the patient's medical history. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user feedback that allows clinicians to ask natural-language questions to retrieve data from patient notes.
We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, and used K-Means clustering to reduce computation time for real-time interaction. The Rocchio algorithm was then employed to incorporate user feedback and improve retrieval performance.
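The Rocchio feedback step can be illustrated with a minimal sketch. It assumes sentence embeddings are already available as plain lists of floats and uses toy 2-D vectors in place of clinicalBERT output. The function names and the weights alpha, beta and gamma are illustrative assumptions (commonly used textbook defaults), not values reported in this work, and the K-Means clustering step is omitted.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant and away from non-relevant vectors.

    alpha, beta and gamma are assumed textbook defaults, not tuned values.
    """
    dim = len(query)
    pos = centroid(relevant, dim)
    neg = centroid(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]


def rank(query, sentences):
    """Indices of sentences sorted by decreasing cosine similarity."""
    return sorted(range(len(sentences)),
                  key=lambda i: cosine(query, sentences[i]),
                  reverse=True)


# Toy 2-D "embeddings" standing in for real sentence vectors.
sentences = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = [1.0, 0.2]

before = rank(query, sentences)
# Hypothetical feedback: the user marks sentence 2 relevant, sentence 0 not.
updated = rocchio_update(query, relevant=[sentences[2]],
                         non_relevant=[sentences[0]])
after = rank(updated, sentences)
```

In this toy case the sentence marked non-relevant drops from the top of the ranking to the bottom after a single update, which is the behaviour an iterative feedback loop relies on.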
In an iterative feedback loop experiment, the mean average precision (MAP) at the final iteration was 0.93/0.94, compared to an initial MAP of 0.66/0.52, for generic queries and 1.0/1.0, compared to 0.79/0.83, for COVID-19 specific queries. This confirms that the contextual model handles the ambiguity in natural language queries and that feedback helps to improve retrieval performance. The user-in-loop experiment also outperformed the automated pseudo relevance feedback method. Moreover, the null hypothesis, which assumes identical precision between initial retrieval and relevance feedback, was rejected with high statistical significance (p ≪ 0.05). Compared to Word2Vec, TF-IDF and bioBERT models, clinicalBERT offered the best balance between response precision and user feedback.
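For readers unfamiliar with the metric, MAP can be computed as in the sketch below. It assumes each query yields a ranked list of binary relevance judgments and normalizes by the number of relevant items that appear in that list; the exact evaluation protocol used in the experiments is not specified here, so the function names and the normalization are assumptions.

```python
def average_precision(ranked_relevance):
    """Average precision for one ranked list of True/False relevance labels.

    Assumption: normalizes by the number of relevant items in the list,
    not by the total number of relevant items in the corpus.
    """
    hits = 0
    total = 0.0
    for position, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / position  # precision at this cut-off
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of average precision over several queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


# Two hypothetical queries with made-up relevance judgments.
ap_first = average_precision([True, False, True])
ap_second = average_precision([False, True])
map_score = mean_average_precision([[True, False, True], [False, True]])
```

A MAP of 1.0 therefore means that, for every query, all relevant sentences were ranked ahead of all non-relevant ones.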
Our model works well for generic as well as COVID-19 specific queries. However, some generic queries are not answered as well as others, because clustering reduces query performance and sentences with only a vague relation to the query are considered non-relevant. We also tested our model on queries with the same meaning but different wording and demonstrated that these query variations yielded similar performance after incorporation of user feedback.
In conclusion, we developed an NLP-based query-bot that handles synonyms and natural language ambiguity in order to retrieve relevant information from the patient chart. User feedback is critical to improving model performance.