Diagnose

  • Article type: Journal Article
    BACKGROUND: Increasing and substantial reliance on electronic health records (EHRs) and data types (ie, diagnosis, medication, and laboratory data) demands assessment of their data quality as a fundamental approach, especially since there is a need to identify appropriate denominator populations with chronic conditions, such as type 2 diabetes (T2D), using commonly available computable phenotype definitions (ie, phenotypes).
    OBJECTIVE: To bridge this gap, our study aims to assess how issues of EHR data quality and variations and robustness (or lack thereof) in phenotypes may have potential impacts in identifying denominator populations.
    METHODS: Approximately 208,000 patients with T2D were included in our study, which used retrospective EHR data from the Johns Hopkins Medical Institution (JHMI) during 2017-2019. Our assessment included 4 published phenotypes and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (ie, age, sex, race, and ethnicity), use of health care (inpatient and emergency room visits), and the average Charlson Comorbidity Index score of each phenotype. We then used different methods to induce or simulate data quality issues of completeness, accuracy, and timeliness separately across each phenotype. For induced data incompleteness, our model randomly dropped diagnosis, medication, and laboratory codes independently at increments of 10%; for induced data inaccuracy, our model randomly replaced a diagnosis or medication code with another code of the same data type and induced 2% incremental change from -100% to +10% in laboratory result values; and lastly, for timeliness, data were modeled for induced incremental shift of date records by 30 days to 365 days.
    RESULTS: Less than a quarter (n=47,326, 23%) of the population overlapped across all phenotypes using EHRs. The population identified by each phenotype varied across all combinations of data types. Induced incompleteness identified fewer patients with each increment; for example, at 100% diagnostic incompleteness, the Chronic Conditions Data Warehouse phenotype identified zero patients, as its phenotypic characteristics included only diagnosis codes. Induced inaccuracy and timeliness similarly demonstrated variations in performance of each phenotype, therefore resulting in fewer patients being identified with each incremental change.
    CONCLUSIONS: We used EHR data with diagnosis, medication, and laboratory data types from a large tertiary hospital system to understand T2D phenotypic differences and performance. We used induced data quality methods to learn how data quality issues may impact identification of the denominator populations upon which clinical (eg, clinical research and trials, population health evaluations) and financial or operational decisions are made. The novel results from our study may inform future approaches to shaping a common T2D computable phenotype definition that can be applied to clinical informatics, managing chronic conditions, and additional industry-wide efforts in health care.
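    The induced-incompleteness step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' model: the toy cohort, the code list in `T2D_CODES`, and the function names are all assumptions made for the example. It drops a growing share of diagnosis rows and counts how many patients a diagnosis-only phenotype still identifies, which reaches zero at 100% incompleteness, as reported for the Chronic Conditions Data Warehouse phenotype.

```python
import random

# Illustrative ICD-10-CM codes for type 2 diabetes (an assumption for this sketch;
# the phenotypes' real code lists are not given in the abstract).
T2D_CODES = frozenset({"E11.9", "E11.65"})


def induce_incompleteness(rows, fraction, seed=0):
    """Drop `fraction` of the rows at random (nested across fractions for a fixed seed)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    dropped = set(order[: round(len(rows) * fraction)])
    return [row for i, row in enumerate(rows) if i not in dropped]


def diagnosis_only_phenotype(diagnosis_rows):
    """Toy phenotype: a patient qualifies with at least one T2D diagnosis code."""
    return {patient_id for patient_id, code in diagnosis_rows if code in T2D_CODES}


def sweep(diagnosis_rows, step_percent=10):
    """Return {percent of rows dropped: number of patients still identified}."""
    return {
        pct: len(diagnosis_only_phenotype(induce_incompleteness(diagnosis_rows, pct / 100)))
        for pct in range(0, 101, step_percent)
    }


# Hypothetical toy cohort: 100 patients with a T2D code, 50 with hypertension only.
toy_rows = [(pid, "E11.9") for pid in range(100)] + [(pid, "I10") for pid in range(100, 150)]
counts = sweep(toy_rows)
print(counts)
```

    Shuffling once per seed and dropping a growing prefix keeps the dropped sets nested, so the identified population can only shrink as incompleteness increases. A phenotype that also accepts medication or laboratory evidence would degrade more slowly under the same sweep.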

  • Article type: Journal Article
    BACKGROUND: Neck reflex points or Adler-Langer points are commonly used in neural therapy to detect so-called interference fields. Chronic irritations or inflammations in the sinuses, teeth, tonsils, or ears are supposed to induce tension and tenderness of the soft tissues and short muscles in the upper cervical spine. The individual treatment strategy is based on the results of diagnostic Adler-Langer point palpation. This study investigated the inter- and intra-rater reliability and explored treatment effects.
    METHODS: We performed a randomized controlled trial with 104 inpatients (80.8% female, 51.8 ± 12.74 years) of a German department for internal and integrative medicine. Patients were randomized to individual neural therapy according to the pathological findings (n = 48) or no treatment (n = 56). In each patient, three experienced raters (20-45 years of experience in neural therapy) and two novice raters (medical students) rated Adler-Langer points rigidity on a standardized rating scale ("strong," "weak," "none"). The patients independently evaluated the tenderness on palpation of the eight points using the same scale. Pressure pain thresholds were assessed at the eight Adler-Langer points. All patients were retested after 30 min. The five raters were blinded to treatment allocation and assessments of the other raters. Video recordings were obtained to assess the consistency of the areas tested by the different raters.
    RESULTS: Agreement between patients and raters (Cohen's kappa = 0.161-0.400) and inter-rater reliability were low (Fleiss kappa = 0.132-0.150). Moreover, the individual agreement (pre-post comparisons in untreated patients) was similarly low even in experienced raters (Cohen's kappa = 0.099-0.173). Video documentation suggests that raters do not place their fingers in the correct segments (percentage of correct position: 42.0-60.6%). Pressure pain thresholds at five of the eight Adler-Langer points showed significant changes after treatment compared to none in the control group.
    CONCLUSIONS: Under this artificial experimental setting, this method of Adler-Langer point palpation has not proven to be a reliable diagnostic tool. But it could be shown that, as claimed by the method, the tenderness in five of eight Adler-Langer points decreased after neural therapy.
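    For readers unfamiliar with the agreement statistic reported above, Cohen's kappa for two raters can be computed as in the following sketch. The ratings shown are hypothetical and use the study's three-level scale; the function name is an assumption, and this is the unweighted form of the statistic, which may or may not match the variant the authors used.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both raters pick the same category independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of eight points by two raters.
rater_1 = ["strong", "weak", "none", "weak", "none", "strong", "none", "weak"]
rater_2 = ["strong", "none", "none", "weak", "weak", "strong", "none", "none"]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

    Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is why the reported values of 0.099-0.173 for repeated ratings indicate poor reliability.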

  • Article type: Journal Article
    OBJECTIVE: Blood stasis is the slowing or stagnation of blood and can cause metabolic, musculoskeletal, and gynecological diseases. This study developed the Blood Stasis Questionnaire for gynecological disease (BSQ-GD) by extracting clinical indicators related to gynecological diseases using the Blood Stasis Questionnaires I and II (BSQ-I and II, respectively) and analyzed the clinical data of a cross-sectional study.
    METHODS: In total, 103 women aged between 25 and 65 years who met gynecological disease criteria were enrolled in this study. Blood stasis scores (BSS) were evaluated using the BSQ-II and categorized into BSS and non-BSS groups. To assess the reliability of BSQ-GD, the internal consistency coefficient was employed using Cronbach's α. Furthermore, correlation analyses were conducted for the clinical symptoms related to gynecological diseases, and the discriminant validity was confirmed by comparing the two groups. The prediction accuracy was determined using logistic regression and the cut-off value of the BSQ-GD was established via the sensitivity and specificity calculations.
    RESULTS: The BSQ-GD showed satisfactory internal consistency (Cronbach's α coefficient = 0.71) and validity, with significant differences in mean scores between blood stasis (22.30 ± 3.34) and non-blood stasis (14.93 ± 3.49) groups. The cut-off value of the BSQ-GD score was 19 points when the Youden index (73.45) and the concordance probability (0.75) were at their maximum. The area under the receiver operating characteristic curve was approximately 96%, and the sensitivity and specificity of the diagnostic accuracy according to the cut-off value were 80.95% and 92.50%, respectively.
    CONCLUSIONS: The BSQ-GD can be an appropriate instrument to estimate blood stasis in patients with gynecological diseases; its diagnostic sensitivity according to the cut-off value is high.
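    The cut-off selection reported above relies on the Youden index, J = sensitivity + specificity - 1. The sketch below shows that the reported sensitivity (80.95%) and specificity (92.50%) give J = 0.7345, which matches the reported 73.45 when expressed in percentage points, and then scans candidate cut-offs on a small data set. The scores, labels, and function names are hypothetical; the study's own data are not available here.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic for one cut-off."""
    return sensitivity + specificity - 1.0


def best_cutoff(scores, has_condition):
    """Scan every observed score as a cut-off (score >= cut-off counts as positive)."""
    positives = sum(has_condition)
    negatives = len(has_condition) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case with and one without the condition")
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(s >= cutoff and y for s, y in zip(scores, has_condition))
        true_neg = sum(s < cutoff and not y for s, y in zip(scores, has_condition))
        j = youden_index(true_pos / positives, true_neg / negatives)
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Check against the values reported in the abstract.
reported_j = youden_index(0.8095, 0.9250)
print(round(reported_j * 100, 2))

# Hypothetical questionnaire scores and reference classification.
toy_scores = [10, 12, 14, 20, 22, 24]
toy_labels = [False, False, False, True, True, True]
print(best_cutoff(toy_scores, toy_labels))
```

    On ties the scan keeps the lowest cut-off, which is one arbitrary convention among several.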

  • Article type: Journal Article
    BACKGROUND: With the capability to render prediagnoses, consumer wearables have the potential to affect subsequent diagnoses and the level of care in the health care delivery setting. Despite this, postmarket surveillance of consumer wearables has been hindered by the lack of codified terms in electronic health records (EHRs) to capture wearable use.
    OBJECTIVE: We sought to develop a weak supervision-based approach to demonstrate the feasibility and efficacy of EHR-based postmarket surveillance on consumer wearables that render atrial fibrillation (AF) prediagnoses.
    METHODS: We applied data programming, where labeling heuristics are expressed as code-based labeling functions, to detect incidents of AF prediagnoses. A labeler model was then derived from the predictions of the labeling functions using the Snorkel framework. The labeler model was applied to clinical notes to probabilistically label them, and the labeled notes were then used as a training set to fine-tune a classifier called Clinical-Longformer. The resulting classifier identified patients with an AF prediagnosis. A retrospective cohort study was conducted, where the baseline characteristics and subsequent care patterns of patients identified by the classifier were compared against those who did not receive a prediagnosis.
    RESULTS: The labeler model derived from the labeling functions showed high accuracy (0.92; F1-score=0.77) on the training set. The classifier trained on the probabilistically labeled notes accurately identified patients with an AF prediagnosis (0.95; F1-score=0.83). The cohort study conducted using the constructed system carried enough statistical power to verify the key findings of the Apple Heart Study, which enrolled a much larger number of participants, where patients who received a prediagnosis tended to be older, male, and White with higher CHA2DS2-VASc (congestive heart failure, hypertension, age ≥75 years, diabetes, stroke, vascular disease, age 65-74 years, sex category) scores (P<.001). We also made a novel discovery that patients with a prediagnosis were more likely to use anticoagulants (525/1037, 50.63% vs 5936/16,560, 35.85%) and have an eventual AF diagnosis (305/1037, 29.41% vs 262/16,560, 1.58%). At the index diagnosis, the existence of a prediagnosis did not distinguish patients based on clinical characteristics, but did correlate with anticoagulant prescription (P=.004 for apixaban and P=.01 for rivaroxaban).
    CONCLUSIONS: Our work establishes the feasibility and efficacy of an EHR-based surveillance system for consumer wearables that render AF prediagnoses. Further work is necessary to generalize these findings for patient populations at other sites.

  • Article type: Journal Article
    BACKGROUND: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records.
    OBJECTIVE: This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak.
    METHODS: Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras.
    RESULTS: There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients that had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras.
    CONCLUSIONS: This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.
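    The metrics used in this abstract are all derived from a 2x2 comparison against the gold standard. The sketch below defines them and checks one relationship that can be verified from the abstract alone: F1 is the harmonic mean of PPV and sensitivity, so the reported annotator agreement (PPV=0.972, sensitivity=1.0) implies F1≈0.986. The confusion counts in the second half are hypothetical.

```python
def ppv(true_pos, false_pos):
    """Positive predictive value (precision)."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Sensitivity (recall): share of gold-standard positives that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Specificity: share of gold-standard negatives correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


def f1_score(ppv_value, sensitivity_value):
    """F1 is the harmonic mean of PPV and sensitivity."""
    return 2 * ppv_value * sensitivity_value / (ppv_value + sensitivity_value)


# Consistency check using the agreement figures reported in the abstract.
reported_f1 = f1_score(0.972, 1.0)
print(round(reported_f1, 3))

# Hypothetical confusion counts for one symptom detector.
tp, fp, fn, tn = 40, 10, 10, 140
print(ppv(tp, fp), sensitivity(tp, fn), specificity(tn, fp))
```

    Because F1 ignores true negatives, a detector can have high specificity and still score a low F1 when it misses most positives, which is the pattern the abstract reports for ICD-10 codes.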

  • Article type: Journal Article
    BACKGROUND: Medical students in Japan undergo a 2-year postgraduate residency program to acquire clinical knowledge and general medical skills. The General Medicine In-Training Examination (GM-ITE) assesses postgraduate residents' clinical knowledge. A clinical simulation video (CSV) may assess learners' interpersonal abilities.
    OBJECTIVE: This study aimed to evaluate the relationship between GM-ITE scores and resident physicians' diagnostic skills by having them watch a CSV and to explore resident physicians' perceptions of the CSV's realism, educational value, and impact on their motivation to learn.
    METHODS: The participants included 56 postgraduate medical residents who took the GM-ITE between January 21 and January 28, 2021; watched the CSV; and then provided a diagnosis. The CSV and GM-ITE scores were compared, and the validity of the simulations was examined using discrimination indices, wherein ≥0.20 indicated high discriminatory power and >0.40 indicated a very good measure of the subject's qualifications. Additionally, we administered an anonymous questionnaire to ascertain participants' views on the realism and educational value of the CSV and its impact on their motivation to learn.
    RESULTS: Of the 56 participants, 6 (11%) provided the correct diagnosis, and all were from the second postgraduate year. All domains indicated high discriminatory power. The (anonymous) follow-up responses indicated that the CSV format was more suitable than the conventional GM-ITE for assessing clinical competence. The anonymous survey revealed that 12 (52%) participants found the CSV format more suitable than the GM-ITE for assessing clinical competence, 18 (78%) affirmed the realism of the video simulation, and 17 (74%) indicated that the experience increased their motivation to learn.
    CONCLUSIONS: The findings indicated that CSV modules simulating real-world clinical examinations were successful in assessing examinees' clinical competence across multiple domains. The study demonstrated that the CSV not only augmented the assessment of diagnostic skills but also positively impacted learners' motivation, suggesting a multifaceted role for simulation in medical education.
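    A discrimination index of the kind referred to above is commonly computed with the upper-lower group method: rank examinees by total score, then subtract the proportion answering an item correctly in the bottom group from the proportion in the top group. The sketch below assumes that method and a 27% group size, neither of which is stated in the abstract; only the 0.20 and 0.40 thresholds come from the source, and the data are hypothetical.

```python
def discrimination_index(total_scores, item_correct, group_fraction=0.27):
    """Upper-lower discrimination index for one item (assumed method, see lead-in)."""
    if len(total_scores) != len(item_correct) or not total_scores:
        raise ValueError("need one total score and one item result per examinee")
    n = len(total_scores)
    k = max(1, round(n * group_fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


def interpret(index):
    """Apply the thresholds quoted in the abstract."""
    if index > 0.40:
        return "very good measure"
    if index >= 0.20:
        return "high discriminatory power"
    return "below the 0.20 threshold"


# Hypothetical data: ten examinees, one item scored 1 (correct) or 0 (incorrect).
totals = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
item = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
d = discrimination_index(totals, item)
print(d, interpret(d))
```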

  • Article type: Randomized Controlled Trial
    OBJECTIVE: Patients with gastric atrophy and intestinal metaplasia (IM) are at risk for gastric cancer, necessitating an accurate risk assessment. We aimed to establish and validate a diagnostic approach for gastric biopsy specimens using deep learning and OLGA/OLGIM for individual gastric cancer risk classification.
    METHODS: In this study, we prospectively enrolled 545 patients suspected of atrophic gastritis during endoscopy from 13 tertiary hospitals between December 22, 2017, and September 25, 2020, with a total of 2725 whole-slide images (WSIs). Patients were randomly divided into a training set (n = 349), an internal validation set (n = 87), and an external validation set (n = 109). Sixty patients from the external validation set were randomly selected and divided into two groups for an observer study, one with the assistance of algorithm results and the other without. We proposed a semi-supervised deep learning algorithm to diagnose and grade IM and atrophy, and we compared it with the assessments of 10 pathologists. The model's performance was evaluated based on the area under the curve (AUC), sensitivity, specificity, and weighted kappa value.
    RESULTS: The algorithm, named GasMIL, was established and demonstrated encouraging performance in diagnosing IM (AUC 0.884, 95% CI 0.862-0.902) and atrophy (AUC 0.877, 95% CI 0.855-0.897) in the external test set. In the observer study, GasMIL achieved an 80% sensitivity, 85% specificity, a weighted kappa value of 0.61, and an AUC of 0.953, surpassing the performance of all ten pathologists in diagnosing atrophy. Among the 10 pathologists, GasMIL's AUC ranked second in OLGA (0.729, 95% CI 0.625-0.833) and fifth in OLGIM (0.792, 95% CI 0.688-0.896). With the assistance of GasMIL, pathologists demonstrated improved AUC (p = 0.013), sensitivity (p = 0.014), and weighted kappa (p = 0.016) in diagnosing IM, and improved specificity (p = 0.007) in diagnosing atrophy compared to pathologists working alone.
    CONCLUSIONS: GasMIL shows the best overall performance in diagnosing IM and atrophy when compared to pathologists, significantly enhancing their diagnostic capabilities.

  • Article type: Journal Article
    背景:诊断是有效医疗保健的核心组成部分,但是误诊很常见,会使患者处于危险之中。诊断决策支持系统可以在改善医生和其他医护人员的诊断方面发挥作用。症状检查程序(SC)旨在改善诊断和分诊(即,患者寻求的护理水平)。
    目的:本研究的目的是评估新的大型语言模型ChatGPT(版本3.5和4.0)的性能,广泛使用的WebMDSC,和AdaHealth开发的SC,用于诊断和分诊有紧急或紧急临床问题的患者,并与最终急诊科(ED)诊断和医师审查进行比较。
    方法:我们使用以前收集的,被取消身份,来自40名接受ED治疗的患者的自我报告数据,这些患者在看ED医生之前使用AdaSC记录他们的症状.由不了解诊断和分类的研究助理将鉴定的数据输入到ChatGPT3.5和4.0版以及WebMD中。将所有4个系统的诊断与ED中先前抽象的最终诊断以及三名独立的委员会认证的ED医生的诊断和分诊建议进行了比较,他们盲目地审查了Ada的自我报告临床数据。诊断准确性计算为ChatGPT诊断的比例,AdaSC,WebMDSC,和至少一个ED诊断匹配的独立医师(分层为前1名或前3名)。分类准确度计算为ChatGPT的建议数量,WebMD,或与至少2名独立医生达成一致或被评为“不安全”或“过于谨慎”的Ada。\"
    结果:总体而言,30例和37例有足够的数据进行诊断和分诊分析,分别。Ada的前1诊断率匹配,ChatGPT3.5,ChatGPT4.0,WebMD为9(30%),12(40%),10(33%),和12(40%),分别,医生的平均比率为47%。Ada的前3名诊断匹配率,ChatGPT3.5,ChatGPT4.0和WebMD为19(63%),19(63%),15(50%),和17(57%),分别,医生的平均比率为69%。Ada的分诊结果分布为62%(n=23)同意,14%不安全(n=5),24%(n=9)过于谨慎;对于ChatGPT,3.5是59%(n=22)同意,41%(n=15)不安全,0%(n=0)过于谨慎;对于ChatGPT4.0,76%(n=28)同意,22%(n=8)不安全,3%(n=1)过于谨慎;对于WebMD,70%(n=26)同意,19%(n=7)不安全,和11%(n=4)过于谨慎。ChatGPT3.5的不安全分诊率(41%)显着高于Ada(14%)(P=.009)。
    结论:ChatGPT3.5诊断准确率高,但不安全分诊率高。ChatGPT4.0的诊断准确性最差,但较低的不安全分诊率和与医生的最高分诊协议。Ada和WebMDSC的总体表现优于ChatGPT。在不改善分诊准确性和广泛临床评估的情况下,不建议患者在无监督下使用ChatGPT进行诊断和分诊。
    Diagnosis is a core component of effective health care, but misdiagnosis is common and can put patients at risk. Diagnostic decision support systems can play a role in improving diagnosis by physicians and other health care workers. Symptom checkers (SCs) have been designed to improve diagnosis and triage (ie, which level of care to seek) by patients.
    The aim of this study was to evaluate the performance of the new large language model ChatGPT (versions 3.5 and 4.0), the widely used WebMD SC, and an SC developed by Ada Health in the diagnosis and triage of patients with urgent or emergent clinical problems compared with the final emergency department (ED) diagnoses and physician reviews.
    We used previously collected, deidentified, self-report data from 40 patients presenting to an ED for care who used the Ada SC to record their symptoms prior to seeing the ED physician. Deidentified data were entered into ChatGPT versions 3.5 and 4.0 and WebMD by a research assistant blinded to diagnoses and triage. Diagnoses from all 4 systems were compared with the previously abstracted final diagnoses in the ED as well as with diagnoses and triage recommendations from three independent board-certified ED physicians who had blindly reviewed the self-report clinical data from Ada. Diagnostic accuracy was calculated as the proportion of the diagnoses from ChatGPT, Ada SC, WebMD SC, and the independent physicians that matched at least one ED diagnosis (stratified as top 1 or top 3). Triage accuracy was calculated as the number of recommendations from ChatGPT, WebMD, or Ada that agreed with at least 2 of the independent physicians or were rated \"unsafe\" or \"too cautious.\"
    Overall, 30 and 37 cases had sufficient data for diagnostic and triage analysis, respectively. The rate of top-1 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 9 (30%), 12 (40%), 10 (33%), and 12 (40%), respectively, with a mean rate of 47% for the physicians. The rate of top-3 diagnostic matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 19 (63%), 19 (63%), 15 (50%), and 17 (57%), respectively, with a mean rate of 69% for physicians. The distribution of triage results for Ada was 62% (n=23) agree, 14% unsafe (n=5), and 24% (n=9) too cautious; that for ChatGPT 3.5 was 59% (n=22) agree, 41% (n=15) unsafe, and 0% (n=0) too cautious; that for ChatGPT 4.0 was 76% (n=28) agree, 22% (n=8) unsafe, and 3% (n=1) too cautious; and that for WebMD was 70% (n=26) agree, 19% (n=7) unsafe, and 11% (n=4) too cautious. The unsafe triage rate for ChatGPT 3.5 (41%) was significantly higher (P=.009) than that of Ada (14%).
    ChatGPT 3.5 had high diagnostic accuracy but a high unsafe triage rate. ChatGPT 4.0 had the poorest diagnostic accuracy, but a lower unsafe triage rate and the highest triage agreement with the physicians. The Ada and WebMD SCs performed better overall than ChatGPT. Unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements to triage accuracy and extensive clinical evaluation.
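    As a reading aid, the reported percentages can be recomputed from the counts given in the abstract. The sketch below is illustrative only and is not the authors' analysis code: the helper name `rate` and the dictionary layout are assumptions, while the counts and denominators (30 diagnostic cases, 37 triage cases) are taken from the abstract.

```python
# Illustrative sketch: recompute the reported percentages from the counts
# in the abstract. `rate` and the dict layout are hypothetical names.

def rate(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to a whole number."""
    return round(100 * count / total)

N_DIAGNOSTIC = 30  # cases with sufficient data for diagnostic analysis
N_TRIAGE = 37      # cases with sufficient data for triage analysis

# Number of cases where the top-1 / top-3 diagnoses matched an ED diagnosis.
top1 = {"Ada": 9, "ChatGPT 3.5": 12, "ChatGPT 4.0": 10, "WebMD": 12}
top3 = {"Ada": 19, "ChatGPT 3.5": 19, "ChatGPT 4.0": 15, "WebMD": 17}

# Triage counts per system: (agree, unsafe, too cautious).
triage = {
    "Ada": (23, 5, 9),
    "ChatGPT 3.5": (22, 15, 0),
    "ChatGPT 4.0": (28, 8, 1),
    "WebMD": (26, 7, 4),
}

top1_rates = {name: rate(n, N_DIAGNOSTIC) for name, n in top1.items()}
top3_rates = {name: rate(n, N_DIAGNOSTIC) for name, n in top3.items()}
triage_rates = {
    name: tuple(rate(n, N_TRIAGE) for n in counts)
    for name, counts in triage.items()
}

for name in top1:
    agree, unsafe, cautious = triage_rates[name]
    print(
        f"{name}: top-1 {top1_rates[name]}%, top-3 {top3_rates[name]}%, "
        f"triage agree {agree}%, unsafe {unsafe}%, too cautious {cautious}%"
    )
```

    Each system's three triage counts sum to 37, which is a quick consistency check on the reported distribution.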

  • Article type: Journal Article
    BACKGROUND: Challenges remain for general practitioners (GPs) in diagnosing (pre)malignant and benign skin lesions. Teledermoscopy (TDsc) supports GPs in diagnosing these skin lesions, guided by the diagnosis and advice of teledermatologists (TDs), and prevents unnecessary referrals to dermatology care. However, the impact of the availability of TDsc on GPs' self-reported referral decisions to dermatology care before and after the TDsc consultation is unknown.
    OBJECTIVE: The objective of this study is to assess and compare the initial self-reported referral decisions of GPs before TDsc versus their final self-reported referral decisions after TDsc for skin lesions diagnosed by the TD as (pre)malignant or benign.
    METHODS: TDsc consultations requested by GPs in daily practice between July 2015 and June 2020 with a TD assessment and diagnosis were extracted from a nationwide Dutch telemedicine database. Based on GP self-administered questions, the GPs' referral decisions before and their final referral decisions after TDsc consultation were assessed for (pre)malignant and benign TD diagnoses.
    RESULTS: GP self-administered questions and TD diagnoses were evaluated for 6364 TDsc consultations (9.3% malignant, 8.8% premalignant, and 81.9% benign skin lesions). In half of the TDsc consultations, GPs adjusted their initial referral decision after TD advice and TD diagnosis. Initially, GPs did not intend to refer 67 (56.8%) of 118 patients with a malignant TD diagnosis and 26 (16.0%) of 162 patients with a premalignant TD diagnosis but then decided to refer these patients after the TDsc consultation. Furthermore, GPs adjusted their decision from referral to nonreferral for 2534 (74.9%) benign skin lesions (including 676 seborrheic keratoses and 131 vascular lesions).
    CONCLUSIONS: GPs adjusted their referral decision in 52% (n=3306) of the TDsc consultations after the TD assessment. The availability of TDsc is thus of added value and assists GPs in their (non)referral of patients with skin lesions to dermatology care. TDsc resulted in referrals of patients with (pre)malignant skin lesions whom GPs would not have referred directly to the dermatologist. TDsc also led to a reduction of unnecessary referrals of patients with low-complexity benign skin lesions (eg, seborrheic keratosis and vascular lesions).

  • Article type: Journal Article
    OBJECTIVE: Behçet's Disease (BD) is a chronic multisystem vasculitis that manifests with destructive inflammation affecting the eyes, central nervous system, and blood vessels. The pathology of vein involvement in BD is poorly characterized. Magnetic resonance (MR) venography gives more comprehensive information about deep veins and adjacent tissues. In this study, we aimed to characterize vein involvement and evaluate the diagnostic utility of MR venography in BD.
    METHODS: Sixty-five BD patients who fulfilled the International Study Group (ISG) criteria and 20 healthy control subjects were enrolled. Inferior vena cava (IVC), common iliac veins (CIV), external (EIV) and internal iliac veins (IIV), common femoral veins (CFV), femoral veins (FV), and greater saphenous veins (GSV) of BD patients and healthy controls were evaluated with MR venography and ultrasonography for the presence of pathologic features, luminal thrombi, vessel wall changes, and perivascular abnormalities.
    RESULTS: Thirty-three vascular and 32 non-vascular BD patients (mean age 39.3 ± 11.3 years and 48 [73.8%] male) were enrolled. MR venography revealed diffuse concentric thickening of the walls of IVC, CIV, EIV, IIV, CFV, FV, and GSV in BD (healthy controls vs. BD p<0.05 for all vein segments). MR venography provided additional information about veins and perivascular tissues, such as contrast enhancement, enlarged lymph nodes, and seminal vesicle vascularization, which were remarkably more frequent in vascular BD than in non-vascular BD and healthy controls.
    CONCLUSIONS: The results of our study suggest that the involvement of the venous system is diffuse and generalized in BD, and demonstration of venulitis might help diagnose the disease.