annotation

  • Article type: Journal Article
    The development of reliable artificial intelligence (AI) algorithms in pathology often depends on ground truth provided by annotation of whole slide images (WSI), a time-consuming and operator-dependent process. A comparative analysis of different annotation approaches is performed to streamline this process. Two pathologists annotated renal tissue using a semi-automated tool (Segment Anything Model, SAM) and manual devices (touchpad vs mouse). A comparison was conducted in terms of working time, reproducibility (overlap fraction), and precision (0 to 10 accuracy rated by two expert nephropathologists) among different methods and operators. The impact of different displays on mouse performance was evaluated. Annotations focused on three tissue compartments: tubules (57 annotations), glomeruli (53 annotations), and arteries (58 annotations). The semi-automatic approach was the fastest and had the least inter-observer variability, averaging 13.6 ± 0.2 min with a difference (Δ) of 2%, followed by the mouse (29.9 ± 10.2 min, Δ = 24%) and the touchpad (47.5 ± 19.6 min, Δ = 45%). The highest reproducibility in tubules and glomeruli was achieved with SAM (overlap values of 1 and 0.99, compared to 0.97 for the mouse and 0.94 and 0.93 for the touchpad), though SAM had lower reproducibility in arteries (overlap value of 0.89 compared to 0.94 for both the mouse and touchpad). No precision differences were observed between operators (p = 0.59). Using non-medical monitors increased annotation times by 6.1%. The future employment of semi-automated and AI-assisted approaches can significantly speed up the annotation process, improving the ground truth for AI tool development.
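The overlap fraction used above as the reproducibility metric is not defined in detail in the abstract; a common formulation is intersection-over-union between the regions two operators annotate. A minimal sketch under that assumption (the function name and pixel-set representation are illustrative, not the study's implementation):

```python
def overlap_fraction(mask_a, mask_b):
    """Intersection-over-union of two annotations given as sets of pixel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # both empty: treat as perfect agreement
    return len(mask_a & mask_b) / len(mask_a | mask_b)

# Two hypothetical outlines of the same glomerulus by different operators
a = {(x, y) for x in range(10) for y in range(10)}     # 10x10 pixel square
b = {(x, y) for x in range(1, 10) for y in range(10)}  # shifted by one column
print(overlap_fraction(a, b))  # → 0.9
```

On real WSI annotations the pixel sets would come from rasterized polygon fills exported by the annotation tool; the arithmetic is the same.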

  • Article type: Journal Article
    BACKGROUND: Named entity recognition (NER) is a fundamental task in natural language processing. However, it is typically preceded by named entity annotation, which poses several challenges, especially in the clinical domain. For instance, determining entity boundaries is one of the most common sources of disagreements between annotators due to questions such as whether modifiers or peripheral words should be annotated. If unresolved, these can induce inconsistency in the produced corpora, yet, on the other hand, strict guidelines or adjudication sessions can further prolong an already slow and convoluted process.
    OBJECTIVE: The aim of this study is to address these challenges by evaluating 2 novel annotation methodologies, lenient span and point annotation, aiming to mitigate the difficulty of precisely determining entity boundaries.
    METHODS: We evaluate their effects through an annotation case study on a Japanese medical case report data set. We compare annotation time, annotator agreement, and the quality of the produced labeling and assess the impact on the performance of an NER system trained on the annotated corpus.
    RESULTS: We saw significant improvements in the labeling process efficiency, with up to a 25% reduction in overall annotation time and even a 10% improvement in annotator agreement compared to the traditional boundary-strict approach. However, even the best-performing NER model showed some drop in performance compared to the traditional annotation methodology.
    CONCLUSIONS: Our findings demonstrate a balance between annotation speed and model performance. Although disregarding boundary information affects model performance to some extent, this is counterbalanced by significant reductions in the annotator's workload and notable improvements in the speed of the annotation process. These benefits may prove valuable in various applications, offering an attractive compromise for developers and researchers.
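The paper's exact lenient-span scoring is not reproduced here, but a common relaxation of boundary-strict matching counts a predicted span as correct if it overlaps any gold span. A sketch of strict vs lenient micro-F1 under that assumption (the span representation and matching rule are illustrative):

```python
def spans_overlap(g, p):
    """True if two half-open character spans (start, end) intersect."""
    return g[0] < p[1] and p[0] < g[1]

def span_f1(gold, pred, lenient=False):
    """Micro F1 over spans; lenient mode accepts any overlapping match."""
    match = spans_overlap if lenient else (lambda g, p: g == p)
    tp_p = sum(any(match(g, p) for g in gold) for p in pred)  # matched predictions
    tp_g = sum(any(match(g, p) for p in pred) for g in gold)  # matched gold spans
    prec = tp_p / len(pred) if pred else 0.0
    rec = tp_g / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = [(0, 5), (10, 18)]
pred = [(0, 5), (11, 20)]  # second prediction disagrees only on boundaries
print(span_f1(gold, pred), span_f1(gold, pred, lenient=True))  # → 0.5 1.0
```

The boundary disagreement that costs half the strict score is forgiven under lenient matching, which is the trade-off the study quantifies.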

  • Article type: Journal Article
    BACKGROUND: Communication is fundamental to effective surgical coaching. This can be challenging for training during image-guided procedures where coaches and trainees need to articulate technical details on a monitor. Telestration devices that annotate on monitors remotely could potentially overcome these limitations and enhance the coaching experience. This study aims to evaluate the value of a novel telestration device in surgical coaching.
    METHODS: A randomized controlled trial was designed. All participants watched a video demonstrating the task, followed by a baseline performance assessment and randomization into either a control group (conventional verbal coaching without telestration) or a telestration group (verbal coaching with telestration). Coaching for a simulated laparoscopic small bowel anastomosis on a dry lab model was done by a faculty surgeon. Following the coaching session, participants underwent a post-coaching performance assessment of the same task. Assessments were recorded and rated by blinded reviewers using a modified Global Rating Scale of the Objective Structured Assessment of Technical Skills (OSATS). Coaching sessions were also recorded and compared in terms of mentoring moments, guidance misinterpretations, questions/clarifications by trainees, and task completion time. A 5-point Likert scale was administered to obtain feedback.
    RESULTS: Twenty-four residents participated (control group 13, telestration group 11). Improvements in some elements of the OSATS scale were noted in the telestration arm, but the overall score did not differ significantly between the two groups. Mentoring moments were more frequent in the telestration group. Among the telestration group, 55% felt comfortable that they could perform the task independently, compared to only 8% in the control group, and 82% would recommend the use of telestration tools in this setting.
    CONCLUSIONS: This novel telestration device demonstrated educational value mainly in the non-technical aspects of the interaction, enhancing the coaching experience through improved communication and more mentoring moments between coach and trainee.

  • Article type: Journal Article
    BACKGROUND: Large language models (LLMs) have the potential to support promising new applications in health informatics. However, practical data on sample size considerations for fine-tuning LLMs to perform specific tasks in biomedical and health policy contexts are lacking.
    OBJECTIVE: This study aims to evaluate sample size and sample selection techniques for fine-tuning LLMs to support improved named entity recognition (NER) for a custom data set of conflicts of interest disclosure statements.
    METHODS: A random sample of 200 disclosure statements was prepared for annotation. All "PERSON" and "ORG" entities were identified by each of the 2 raters, and once appropriate agreement was established, the annotators independently annotated an additional 290 disclosure statements. From the 490 annotated documents, 2500 stratified random samples in different size ranges were drawn. The 2500 training set subsamples were used to fine-tune a selection of language models across 2 model architectures (Bidirectional Encoder Representations from Transformers [BERT] and Generative Pre-trained Transformer [GPT]) for improved NER, and multiple regression was used to assess the relationship between sample size (sentences), entity density (entities per sentence [EPS]), and trained model performance (F1-score). Additionally, single-predictor threshold regression models were used to evaluate the possibility of diminishing marginal returns from increased sample size or entity density.
    RESULTS: Fine-tuned models ranged in topline NER performance from F1-score=0.79 to F1-score=0.96 across architectures. Two-predictor multiple linear regression models were statistically significant with multiple R2 ranging from 0.6057 to 0.7896 (all P<.001). EPS and the number of sentences were significant predictors of F1-scores in all cases (P<.001), except for the GPT-2_large model, where EPS was not a significant predictor (P=.184). Model thresholds indicate points of diminishing marginal return from increased training data set sample size measured by the number of sentences, with point estimates ranging from 439 sentences for RoBERTa_large to 527 sentences for GPT-2_large. Likewise, the threshold regression models indicate a diminishing marginal return for EPS with point estimates between 1.36 and 1.38.
    CONCLUSIONS: Relatively modest sample sizes can be used to fine-tune LLMs for NER tasks applied to biomedical text, and training data entity density should representatively approximate entity density in production data. Training data quality and a model architecture's intended use (text generation vs text processing or classification) may be as important as, or more important than, training data volume and model parameter size.
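The abstract does not give the functional form of its single-predictor threshold regression; one simple choice is a hinge model y ≈ a + b·min(x, t), linear up to a breakpoint t and flat beyond it, with t found by grid search. A sketch under that assumption (purely illustrative, not the paper's fitted model):

```python
def fit_threshold_regression(xs, ys, candidates):
    """Fit y ~ a + b*min(x, t): linear gain up to threshold t, flat afterwards.
    Grid-search over candidate thresholds; closed-form least squares at each."""
    def sse_for(u, v):
        n = len(u)
        mu, mv = sum(u) / n, sum(v) / n
        sxx = sum((x - mu) ** 2 for x in u)
        b = sum((x - mu) * (y - mv) for x, y in zip(u, v)) / sxx if sxx else 0.0
        a = mv - b * mu
        return sum((y - (a + b * x)) ** 2 for x, y in zip(u, v))
    best_t, best_sse = None, None
    for t in candidates:
        sse = sse_for([min(x, t) for x in xs], ys)
        if best_sse is None or sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

# Synthetic curve: F1 grows with sentence count, plateauing near 500 sentences
xs = list(range(100, 1001, 50))
ys = [min(x, 500) * 0.001 + 0.3 for x in xs]
print(fit_threshold_regression(xs, ys, range(200, 901, 50)))  # → 500
```

The recovered breakpoint is the "diminishing marginal returns" point the study reports (439-527 sentences, depending on architecture).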

  • Article type: Journal Article
    This study explores the potential of using large language models to assist content analysis by conducting a case study to identify adverse events (AEs) in social media posts. The case study compares ChatGPT's performance with human annotators' in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Using the identical instructions given to human annotators, ChatGPT closely approximated human results, with a high degree of agreement noted: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT has the potential to replicate human annotation accurately and efficiently. The study recognizes possible limitations, including concerns about the generalizability due to ChatGPT's training data, and prompts further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research.
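The abstract reports percent agreement alongside Fleiss κ; for a two-rater comparison, a chance-corrected statistic such as Cohen's κ is computed in the same spirit. A minimal sketch of both metrics (the toy labels are illustrative, not the study data):

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Fraction of items on which two raters assign the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected two-rater agreement (Cohen's kappa)."""
    n = len(r1)
    po = percent_agreement(r1, r2)                       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

# Toy labels: 1 = post mentions an adverse event, 0 = it does not
human = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
model = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(percent_agreement(human, model))  # → 0.9
```

Chance correction matters here because AE-negative posts dominate, so raw percent agreement alone can overstate reliability.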

  • Article type: Journal Article
    Acute myeloid leukemia (AML) is hallmarked by the clonal proliferation of myeloid blasts. Mutations that result in the constitutive activation of the fms-like tyrosine kinase 3 (FLT3) gene, coding for a class III receptor tyrosine kinase, are significantly associated with this heterogeneous hematologic malignancy. The fms-related tyrosine kinase 3 ligand binds to the extracellular domain of the FLT3 receptor, inducing homodimer formation in the plasma membrane, leading to autophosphorylation and activation of apoptosis, proliferation, and differentiation of hematopoietic cells in bone marrow. In the present study, we evaluated the association of FLT3 as a significant biomarker for AML and tried to comprehend the effects of specific variations on the FLT3 protein's structure and function. We also examined the effects of I836 variants on binding affinity to sorafenib using molecular docking. We integrated multiple bioinformatics tools, databases, and resources such as OncoDB, UniProt, COSMIC, UALCAN, PyMOL, ProSA, Missense3D, InterProScan, SIFT, PolyPhen, and PredictSNP to annotate the structural, functional, and phenotypic impact of the known variations associated with FLT3. Twenty-nine FLT3 variants were analyzed using in silico approaches such as DynaMut, CUPSAT, AutoDock, and Discovery Studio for their impact on protein stability, flexibility, function, and binding affinity. The OncoDB and UALCAN portals confirmed the association of FLT3 gene expression and its mutational status with AML. A computational structural analysis of the deleterious variants of FLT3 revealed I863F mutants as destabilizers of the protein structure, possibly leading to functional changes. Many single-nucleotide variations in FLT3 have an impact on its structure and function. Thus, the annotation of FLT3 SNVs and the prediction of their deleterious pathogenic impact will facilitate an insight into the tumorigenesis process and guide experimental studies and clinical implications.

  • Article type: Journal Article
    The use of whole-genome sequence (WGS) data is expected to improve genomic prediction (GP) power of complex traits because it may contain mutations that are in strong linkage disequilibrium with causal mutations. However, several previous studies have shown no or only small improvement in prediction accuracy using WGS data. Incorporating prior biological information into GP seems to be an attractive strategy that might improve prediction accuracy. In this study, a total of 6334 pigs were genotyped using 50K chips and subsequently imputed to the WGS level. This cohort includes two prior discovery populations that comprise 294 Landrace pigs and 186 Duroc pigs, as well as two validation populations that consist of 3770 American Duroc pigs and 2084 Canadian Duroc pigs. Then we used annotation information and genome-wide association study (GWAS) results from the WGS data to make GP for six growth traits in two Duroc pig populations. Based on variant annotation, we partitioned different genomic classes, such as intron, intergenic, and untranslated regions, for imputed WGS data. Based on GWAS results of WGS data, we obtained trait-associated single-nucleotide polymorphisms (SNPs). We then applied the genomic feature best linear unbiased prediction (GFBLUP) and genomic best linear unbiased prediction (GBLUP) models to estimate the genomic estimated breeding values for growth traits with these different variant panels, including six genomic classes and trait-associated SNPs. Compared with 50K chip data, GBLUP with imputed WGS data had no increase in prediction accuracy. Using only annotations resulted in no increase in prediction accuracy compared to GBLUP with 50K, but adding annotation information into the GFBLUP model with imputed WGS data could improve the prediction accuracy with increases of 0.00%-2.82%. In conclusion, a GFBLUP model that incorporates prior biological information might increase the advantage of using imputed WGS data for GP.

  • Article type: Journal Article
    BACKGROUND: Information regarding opioid use disorder (OUD) status and severity is important for patient care. Clinical notes provide valuable information for detecting and characterizing problematic opioid use, necessitating development of natural language processing (NLP) tools, which in turn requires reliably labeled OUD-relevant text and understanding of documentation patterns.
    OBJECTIVE: To inform automated NLP methods, we aimed to develop and evaluate an annotation schema for characterizing OUD and its severity, and to document patterns of OUD-relevant information within clinical notes of heterogeneous patient cohorts.
    METHODS: We developed an annotation schema to characterize OUD severity based on criteria from the Diagnostic and Statistical Manual of Mental Disorders, 5th edition. In total, 2 annotators reviewed clinical notes from key encounters of 100 adult patients with varied evidence of OUD, including patients with and those without chronic pain, with and without medication treatment for OUD, and a control group. We completed annotations at the sentence level. We calculated severity scores based on annotation of note text with 18 classes aligned with criteria for OUD severity and determined positive predictive values for OUD severity.
    RESULTS: The annotation schema contained 27 classes. We annotated 1436 sentences from 82 patients; notes of 18 patients (11 of whom were controls) contained no relevant information. Interannotator agreement was above 70% for 11 of 15 batches of reviewed notes. Severity scores for control group patients were all 0. Among noncontrol patients, the mean severity score was 5.1 (SD 3.2), indicating moderate OUD, and the positive predictive value for detecting moderate or severe OUD was 0.71. Progress notes and notes from emergency department and outpatient settings contained the most information and the greatest diversity of information. Substance misuse and psychiatric classes were the most prevalent, were highly correlated across note types, and frequently co-occurred across patients.
    CONCLUSIONS: Implementation of the annotation schema demonstrated strong potential for inferring OUD severity based on key information in a small set of clinical notes and highlighting where such information is documented. These advancements will facilitate NLP tool development to improve OUD prevention, diagnosis, and treatment.
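The severity scores above follow the DSM-5 convention for substance use disorders (2-3 criteria met: mild; 4-5: moderate; 6 or more: severe); how the schema's classes map onto individual criteria is not detailed in the abstract. A hedged sketch of the final scoring step (criterion names are hypothetical):

```python
def oud_severity(criteria_met):
    """Map the set of distinct DSM-5 OUD criteria met to a severity category.
    DSM-5 thresholds: 2-3 criteria mild, 4-5 moderate, 6+ severe."""
    n = len(set(criteria_met))
    if n >= 6:
        return "severe"
    if n >= 4:
        return "moderate"
    if n >= 2:
        return "mild"
    return "no diagnosis"

# Hypothetical criteria flagged across one patient's annotated sentences
flagged = {"craving", "tolerance", "withdrawal", "hazardous use", "social impairment"}
print(oud_severity(flagged))  # → moderate
```

A mean severity score of 5.1 among noncontrol patients therefore lands in the moderate band, as the results state.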

  • Article type: Journal Article
    Optical coherence tomography (OCT) enables in vivo diagnostics of individual retinal layers in the living human eye. However, improved imaging resolution could aid diagnosis and monitoring of retinal diseases and identify potential new imaging biomarkers. The investigational high-resolution OCT platform (High-Res OCT; 853 nm central wavelength, 3 µm axial resolution) achieves an improved axial resolution by shifting the central wavelength and increasing the light source bandwidth compared to a conventional OCT device (880 nm central wavelength, 7 µm axial resolution). To assess the possible benefit of a higher resolution, we compared the retest reliability of retinal layer annotation from conventional and High-Res OCT, evaluated the use of High-Res OCT in patients with age-related macular degeneration (AMD), and assessed differences between both devices in subjective image quality. Thirty eyes of 30 patients with early/intermediate AMD (iAMD; mean age 75 ± 8 years) and 30 eyes of 30 age-similar subjects without macular changes (62 ± 17 years) underwent identical OCT imaging on both devices. Inter- and intra-reader reliability were analyzed for manual retinal layer annotation using EyeLab. Central OCT B-scans were graded for image quality by two graders and a mean opinion score (MOS) was formed and evaluated. Inter- and intra-reader reliability were higher for High-Res OCT (greatest benefit for inter-reader reliability: ganglion cell layer; for intra-reader reliability: retinal nerve fiber layer). High-Res OCT was significantly associated with an improved MOS (MOS 9/8, Z-value = 5.4, p < 0.01), mainly due to improved subjective resolution (9/7, Z-value = 6.2, p < 0.01). The retinal pigment epithelium drusen complex showed a trend towards improved retest reliability in High-Res OCT in iAMD eyes, but without statistical significance. The improved axial resolution of High-Res OCT benefits the retest reliability of retinal layer annotation and improves perceived image quality and resolution. Automated image analysis algorithms could also benefit from the increased image resolution.

  • Article type: Journal Article
    BACKGROUND: Since the advent of the COVID-19 pandemic, individuals of Asian descent (colloquial usage prevalent in North America, where "Asian" is used to refer to people from East Asia, particularly China) have been the subject of stigma and hate speech in both offline and online communities. One of the major venues for encountering such unfair attacks is social networks, such as Twitter. As the research community seeks to understand, analyze, and implement detection techniques, high-quality data sets are becoming immensely important.
    OBJECTIVE: In this study, we introduce a manually labeled data set of tweets containing anti-Asian stigmatizing content.
    METHODS: We sampled over 668 million tweets posted on Twitter from January to July 2020 and used an iterative data construction approach that included 3 different stages of algorithm-driven data selection. Finally, volunteers manually annotated the tweets to arrive at a high-quality data set and a second, smaller subsample with higher-quality labels from multiple annotators. We present this final high-quality Twitter data set on stigma toward Chinese people during the COVID-19 pandemic. The data set and instructions for labeling can be viewed in the Github repository. Furthermore, we implemented some state-of-the-art models to detect stigmatizing tweets to set initial benchmarks for our data set.
    RESULTS: Our primary contributions are labeled data sets. Data Set v3.0 contained 11,263 tweets with primary labels (unknown/irrelevant, not-stigmatizing, stigmatizing-low, stigmatizing-medium, stigmatizing-high) and tweet subtopics (eg, wet market and eating habits, COVID-19 cases, bioweapon). Data Set v3.1 contained 4998 (44.4%) tweets randomly sampled from Data Set v3.0, where a second annotator labeled them only on the primary labels and then a third annotator resolved conflicts between the first and second annotators. To demonstrate the usefulness of our data set, preliminary experiments showed that the Bidirectional Encoder Representations from Transformers (BERT) model achieved the highest accuracy of 79% when detecting stigma on unseen data, with traditional models such as a support vector machine (SVM) performing at 73% accuracy.
    CONCLUSIONS: Our data set can be used as a benchmark for further qualitative and quantitative research and analysis around the issue. It first reaffirms the existence and significance of widespread discrimination and stigma toward the Asian population worldwide. Moreover, our data set and subsequent arguments should assist other researchers from various domains, including psychologists, public policy authorities, and sociologists, to analyze the complex economic, political, historical, and cultural underlying roots of anti-Asian stigmatization and hateful behaviors. A manually annotated data set is of paramount importance for developing algorithms that can be used to detect stigma or problematic text, particularly on social media. We believe this contribution will help predict and subsequently design interventions that will significantly help reduce stigma, hate, and discrimination against marginalized populations during future crises like COVID-19.