Keywords: Controlled vocabulary; Electronic health records; Natural language processing; Psychological pain; Suicide; Veterans health services

MeSH terms: Humans; Suicidal Ideation; Algorithms; Electronic Health Records; Vocabulary, Controlled; Veterans; Natural Language Processing

Source: DOI:10.1016/j.jbi.2023.104582

Abstract:
Objective: Suicide risk prediction algorithms at the Veterans Health Administration (VHA) do not include predictors based on the 3-Step Theory of suicide (3ST), which builds on four factors: hopelessness, psychological pain, connectedness, and capacity for suicide. These factors are not available from structured fields in VHA electronic health records, but they are found in unstructured clinical text. No ontology or controlled vocabulary currently maps psychosocial and behavioral terms to these factors. The objectives of this study were 1) to develop an ontology with a controlled vocabulary of terms that map onto classes representing the 3ST factors as identified in electronic clinical progress notes, and 2) to determine the accuracy of automated extractions based on terms in the controlled vocabulary.
Methods: A team of four annotators linguistically annotated 30,000 clinical progress notes from the VHA electronic health records of 231 Veterans who attempted or died by suicide, marking terms related to the 3ST factors. Annotation involved manually labeling words or phrases that indicated the presence or absence of a factor (its polarity). These words and phrases were entered into a controlled vocabulary, which our computational system then used to tag 14 million clinical progress notes from Veterans who attempted or died by suicide after 2013. Tagged text was extracted and machine-labeled for the presence or absence of the 3ST factors. The accuracy of these machine labels was evaluated on 1000 randomly selected extractions per factor against a ground truth created by our annotators.
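The tagging step can be illustrated with a minimal dictionary-lookup sketch. It assumes a controlled vocabulary in which each term maps to one 3ST factor and one polarity; the terms, the factor identifiers, and the helpers `build_pattern` and `tag_note` are hypothetical and are not taken from the THREE-ST vocabulary or from the authors' system, which the abstract does not describe at the code level.

```python
import re
from dataclasses import dataclass

# Hypothetical miniature controlled vocabulary (illustrative only, not the
# THREE-ST vocabulary): lowercase term -> (3ST factor, polarity), where
# polarity True means the factor is present and False means it is absent.
VOCABULARY = {
    "feels hopeless": ("hopelessness", True),
    "hopeful about the future": ("hopelessness", False),
    "unbearable pain": ("psychological_pain", True),
    "lives alone": ("connectedness", False),
    "close to his family": ("connectedness", True),
    "access to firearms": ("capacity", True),
}


@dataclass
class Extraction:
    note_id: str
    term: str
    factor: str
    present: bool
    start: int  # character offset where the matched term starts
    end: int    # character offset just past the matched term


def build_pattern(vocabulary):
    """Compile one case-insensitive regex that matches any vocabulary term."""
    # Try longer terms first so a multi-word entry wins over a shorter overlap.
    terms = sorted(vocabulary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def tag_note(note_id, text, vocabulary, pattern):
    """Return one Extraction per vocabulary term found in a note.

    Polarity comes only from the vocabulary entry; negation cues such as
    "denies" are not handled in this sketch.
    """
    extractions = []
    for match in pattern.finditer(text):
        term = match.group(0).lower()
        factor, present = vocabulary[term]
        extractions.append(
            Extraction(note_id, term, factor, present, match.start(), match.end())
        )
    return extractions


if __name__ == "__main__":
    pattern = build_pattern(VOCABULARY)
    note = "Veteran lives alone and feels hopeless; denies access to firearms."
    for e in tag_note("note-1", note, VOCABULARY, pattern):
        print(e.factor, e.present, repr(e.term))
```

In the usage example, "access to firearms" is tagged as present even though the note says "denies", which shows why extracted text needs a separate presence/absence labeling step and an accuracy check against annotator ground truth.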
Results: Linguistic annotation identified 8486 terms related to 33 subclasses across the four factors and their polarities. For most factor-polarity combinations, precision of the machine-labeled extractions ranged from 0.73 to 1.00, whereas recall was somewhat lower, ranging from 0.65 to 0.91.
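Precision and recall for a single factor-polarity class follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below assumes each sampled extraction carries one machine label and one annotator (gold) label, here the hypothetical strings "present" and "absent"; the abstract does not specify the exact evaluation procedure, so this is an illustration of the metrics rather than the study's scoring code.

```python
def precision_recall(machine_labels, gold_labels, label):
    """Precision and recall for one class (e.g. one factor-polarity label).

    machine_labels and gold_labels are parallel sequences with one entry per
    sampled extraction: the system's label and the annotators' ground truth.
    """
    if len(machine_labels) != len(gold_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(machine_labels, gold_labels))
    tp = sum(1 for m, g in pairs if m == label and g == label)  # correctly given label
    fp = sum(1 for m, g in pairs if m == label and g != label)  # label given wrongly
    fn = sum(1 for m, g in pairs if m != label and g == label)  # label missed
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


if __name__ == "__main__":
    machine = ["present", "present", "absent", "absent", "absent"]
    gold = ["present", "absent", "present", "present", "absent"]
    # For "present": TP = 1, FP = 1, FN = 2, so precision 0.5 and recall 1/3.
    print(precision_recall(machine, gold, "present"))
```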
Conclusions: The resulting ontology consists of classes representing each of the four 3ST factors, their subclasses and relationships, and the terms that map onto those classes, which are stored in a controlled vocabulary (https://bioportal.bioontology.org/ontologies/THREE-ST). The use case we present shows how scores based on clinical notes tagged with terms from the controlled vocabulary capture meaningful change in the 3ST factors during the weeks preceding a suicidal event.
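The abstract does not define how the note-based scores in the use case are computed. Purely as a hypothetical illustration, the sketch below assigns +1 for each tag indicating that a factor is present and −1 for each tag indicating that it is absent, then sums these per factor for each week before a suicidal event; the function name `weekly_factor_scores`, the scoring rule, and the 7-day binning are all assumptions.

```python
from collections import defaultdict
from datetime import date


def weekly_factor_scores(tagged, event_date, weeks=8):
    """Hypothetical net score per 3ST factor for each week before an event.

    tagged: iterable of (note_date, factor, present) tuples.
    Returns {factor: [score_oldest_week, ..., score_week_before_event]},
    where the last entry covers the 7 days immediately before event_date.
    Score rule (an assumption): +1 per "present" tag, -1 per "absent" tag.
    """
    scores = defaultdict(lambda: [0] * weeks)
    for note_date, factor, present in tagged:
        days_before = (event_date - note_date).days
        if not 1 <= days_before <= 7 * weeks:
            continue  # ignore notes on/after the event or outside the window
        # days 1-7 -> last index, days 8-14 -> second to last, and so on
        week_index = weeks - 1 - (days_before - 1) // 7
        scores[factor][week_index] += 1 if present else -1
    return dict(scores)


if __name__ == "__main__":
    tags = [
        (date(2020, 2, 29), "hopelessness", True),
        (date(2020, 2, 27), "hopelessness", True),
        (date(2020, 2, 20), "hopelessness", False),
    ]
    print(weekly_factor_scores(tags, date(2020, 3, 1), weeks=2))
```

A net present-minus-absent count is only one simple choice; normalizing by the number of notes per week would reduce the effect of uneven documentation volume.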