Evaluations

  • Article type: Journal Article
    BACKGROUND: Foreign body (FB) inhalation, ingestion, and insertion account for 11% of emergency admissions for ear, nose, and throat conditions. Children are disproportionately affected, and urgent intervention may be needed to maintain airway patency and prevent blood vessel occlusion. High-quality, readable online information could help reduce poor outcomes from FBs.
    OBJECTIVE: We aim to evaluate the quality and readability of available online health information relating to FBs.
    METHODS: In total, 6 search phrases were queried using the Google search engine. For each search term, the first 30 results were captured. Websites in the English language and displaying health information were included. The provider and country of origin were recorded. The modified 36-item Ensuring Quality Information for Patients tool was used to assess information quality. Readability was assessed using a combination of tools: Flesch Reading Ease score, Flesch-Kincaid Grade Level, Gunning-Fog Index, and Simple Measure of Gobbledygook.
    RESULTS: After the removal of duplicates, 73 websites were assessed, with the majority originating from the United States (n=46, 63%). Overall, content quality was moderate, with a median Ensuring Quality Information for Patients score of 21 (IQR 18-25, maximum 29) out of a possible 36. Precautionary measures were not mentioned on 41% (n=30) of websites and 30% (n=22) did not identify disk batteries as a risky FB. Red flags necessitating urgent care were identified on 95% (n=69) of websites, with 89% (n=65) advising patients to seek medical attention and 38% (n=28) advising on safe FB removal. Readability scores (Flesch Reading Ease score=12.4, Flesch-Kincaid Grade Level=6.2, Gunning-Fog Index=6.5, and Simple Measure of Gobbledygook=5.9 years) showed most websites (56%) were below the recommended sixth-grade level.
    CONCLUSIONS: The current quality and readability of information regarding FBs is inadequate. More than half of the websites were above the recommended sixth-grade reading level, and important information regarding high-risk FBs such as disk batteries and magnets was frequently excluded. Strategies should be developed to improve access to high-quality information that informs patients and parents about risks and when to seek medical help. Strategies to promote high-quality websites in search results also have the potential to improve outcomes.
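    Two of the readability measures named in the methods have simple closed forms. The sketch below uses the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas; the function names and the idea of passing pre-computed word, sentence, and syllable counts are illustrative assumptions, not the tooling the study used.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def exceeds_grade(words: int, sentences: int, syllables: int,
                  target_grade: float = 6.0) -> bool:
    """True if the text reads above the target grade (sixth grade by default)."""
    return flesch_kincaid_grade(words, sentences, syllables) > target_grade


if __name__ == "__main__":
    # Hypothetical counts for one web page, used only for illustration.
    counts = dict(words=100, sentences=10, syllables=150)
    print(round(flesch_reading_ease(**counts), 2))
    print(round(flesch_kincaid_grade(**counts), 2))
    print(exceeds_grade(**counts))
```

    Counting syllables reliably is the hard part in practice, which is why the counts are taken as inputs here rather than estimated from raw text.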

  • Article type: Journal Article
    BACKGROUND: Thyroid thermoablative techniques have been developed as an alternative to surgery for the treatment of symptomatic benign nodes, with the advantage of being minimally invasive, repeatable, with a minimal complication rate, and avoiding the perioperative risks and irreversible consequences of surgery.
    METHODS: A series of patients undergoing laser thermoablation by the same team of endocrinologists operating in 2 centers from May 2016 to December 2022 is presented. The procedure was performed in patients with compressive symptomatology determined by a thyroid node typed TIR 2, or TIR 3A with rejection of the surgical option. We analyzed cases that required surgery because of persistent compressive symptoms, reduction failure, or increase in volume.
    RESULTS: From May 2016 to December 2022, 207 thermoablative procedures were performed (187 patients). A single ablative session did not resolve the condition in 28 patients (15%). Among these patients, the proportion of nodes with TIR 3A cytology was significantly higher (21.4% vs 7.0%), and the volumetric reduction 6 months after the procedure was significantly lower (5.7% vs 50%). Ten patients (5.3%) underwent surgery, and histology demonstrated malignancy in 50% of these cases.
    CONCLUSIONS: Data show the importance of timely re-evaluation with referral to surgical treatment for patients presenting a less than 20% volume reduction 6 months after thermoablation, in order to ensure promptly effective treatment of misrecognized malignancies.

  • Article type: Journal Article
    A robust Health Technology Assessment (HTA) framework is crucial to address the rising burden of healthcare costs and to inform decision-making that promotes high-quality health systems. This research aims to describe the HTA methods and mechanisms for the successful implementation of HTA in the WHO South-East Asia region, and to contextualize the synthesized evidence for Indian settings.
    Realist review involves developing a program theory by conducting a systematic search strategy, screening, study selection, data extraction, and data synthesis. A systematic literature search will be conducted on PubMed (NCBI), EMBASE (Elsevier), Scopus (Elsevier), Web of Science (Clarivate), and ProQuest Central to identify the methods used for HTA of health technology interventions. Stakeholder consultations will be conducted to develop a program theory following the Context-Mechanism-Outcome configurations (CMOcs) framework. Searches for primary evidence will be conducted iteratively. Data will be extracted and tested against the program theory. The proposed realist review will be reported as per the Realist and MEta-narrative Evidence Syntheses: Evolving Standards [RAMESES II] guidelines.
    To our knowledge, no comprehensive review has been conducted to understand the mechanisms of HTA methods in the WHO South-East Asia region. The findings from the realist review will help us understand the mechanisms through which HTA could work in WHO South-East Asian countries. We will then contextualize the findings obtained from evidence to Indian settings, based on program theory development through stakeholder consultation. A framework will be developed that policymakers and HTA experts in India can use for effective HTA implementation.

  • Article type: Letter
    No abstract available.

  • Article type: Journal Article
    The European and global dairy breeding industry has benefited enormously from collaboration and sharing of data. The new era of genomics has disrupted the information flow due to the requirement to protect commercial investments. New trait phenotypes, evaluation models, and breeding goals continue to evolve and will impact the way national and proprietary data are shared and presented to the dairy industry. The global nature of cattle breeding will, however, continue to require some form of collaboration, even under the new ways of working.

  • Article type: Journal Article
    BACKGROUND: Deep learning (DL) CT denoising models have the potential to improve image quality for lower radiation dose exams. These models are generally trained with large quantities of adult patient image data. However, CT, and increasingly DL denoising methods, are used in both adult and pediatric populations. Pediatric body habitus and size can differ significantly from adults and vary dramatically from newborns to adolescents. Ensuring that pediatric subgroups of different body sizes are not disadvantaged by DL methods requires evaluations capable of assessing performance in each subgroup.
    OBJECTIVE: To assess DL CT denoising in pediatric and adult-sized patients, we built a framework of computer simulated image quality (IQ) control phantoms and evaluation methodology.
    METHODS: The computer simulated IQ phantoms in the framework featured pediatric-sized versions of standard CatPhan 600 and MITA-LCD phantoms with a range of diameters matching the mean effective diameters of pediatric patients ranging from newborns to 18 years old. These phantoms were used in simulating CT images that were then inputs for a DL denoiser to evaluate performance in different sized patients. Adult CT test images were simulated using standard-sized phantoms scanned with adult scan protocols. Pediatric CT test images were simulated with pediatric-sized phantoms and adjusted pediatric protocols. The framework's evaluation methodology consisted of denoising both adult and pediatric test images then assessing changes in image quality, including noise, image sharpness, CT number accuracy, and low contrast detectability. To demonstrate the use of the framework, a REDCNN denoising model trained on adult patient images was evaluated. To validate that the DL model performance measured with the proposed pediatric IQ phantoms was representative of performance in more realistic patient anatomy, anthropomorphic pediatric XCAT phantoms of the same age range were also used to compare noise reduction performance.
    RESULTS: Using the proposed pediatric-sized IQ phantom framework, size differences between adult and pediatric-sized phantoms were observed to substantially influence the adult trained DL denoising model's performance. When applied to adult images, the DL model achieved a 60% reduction in noise standard deviation without substantial loss in sharpness in mid or high spatial frequencies. However, in smaller phantoms the denoising performance dropped due to different image noise textures resulting from the smaller field of view (FOV) between adult and pediatric protocols. In the validation study, noise reduction trends in the pediatric-sized IQ phantoms were found to be consistent with those found in anthropomorphic phantoms.
    CONCLUSIONS: We developed a framework of using pediatric-sized IQ phantoms for pediatric subgroup evaluation of DL denoising models. Using the framework, we found the performance of an adult trained DL denoiser did not generalize well in the smaller diameter phantoms corresponding to younger pediatric patient sizes. Our work suggests noise texture differences from FOV changes between adult and pediatric protocols can contribute to poor generalizability in DL denoising and that the proposed framework is an effective means to identify these performance disparities for a given model.
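    The headline metric in the results, percent reduction in noise standard deviation, can be sketched in a few lines. This is a minimal illustration that assumes the pixel values come from a uniform phantom region, so that all variation is noise; the function name and the sample values are hypothetical and are not taken from the authors' framework.

```python
from statistics import pstdev


def noise_reduction_percent(input_roi, denoised_roi):
    """Percent drop in pixel standard deviation within a uniform region.

    Both arguments are flat sequences of pixel values (for example, CT
    numbers in HU) sampled from the same uniform region of interest
    before and after denoising.
    """
    noise_before = pstdev(input_roi)
    noise_after = pstdev(denoised_roi)
    if noise_before == 0:
        raise ValueError("input region has no noise to reduce")
    return 100.0 * (1.0 - noise_after / noise_before)


if __name__ == "__main__":
    # Hypothetical pixel samples, used only for illustration.
    noisy = [-10, 10, -10, 10]
    denoised = [-4, 4, -4, 4]
    print(noise_reduction_percent(noisy, denoised))
```

    Standard deviation alone says nothing about noise texture or sharpness, which is why the abstract pairs it with sharpness, CT number accuracy, and low contrast detectability.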

  • Article type: Journal Article
    This article was migrated. The article was marked as recommended. There has long been an underrepresentation of women in medicine and biomedical research; this is often described as a "leaky pipeline," as the more senior the level of rank, the fewer women are appointed. In evaluating a faculty member for promotion, student evaluations of faculty teaching are often considered; therefore, if gender biases are reinforced by student evaluations of teaching, the gender gap in faculty promotion could remain or increase. In this study, we examine student evaluations of faculty teaching in a graduate biomedical research training program, using data gathered during two academic years. While female faculty received higher quantitative ratings of teaching, subtle gender differences in language existed in the student comments, indicating that implicit biases about women may be present in student evaluations of faculty teaching.

  • Article type: Journal Article
    OBJECTIVE: Useful feedback and evaluation are critical to a medical trainee's development. While most academic physicians understand that giving feedback to learners is essential, many do not consider the components of feedback to be truly useful, and there are barriers to implementation. We sought to use a quick reader (QR) system to solicit feedback for trainees in two pediatric subspecialties (pediatric critical care and neonatal-perinatal medicine) at one institution to increase the quality and quantity of feedback received.
    METHODS: New evaluations were modified from the existing evaluations and imported into online systems with QR code capability. Each fellow was given a QR code linking to evaluations and encouraged to solicit feedback and evaluations in a variety of clinical settings and scenarios. Evaluation numbers and quality of evaluations were assessed and compared both pre- and post-intervention.
    RESULTS: There were increases in the number of evaluations completed for both the pediatric critical care fellows and the neonatal-perinatal medicine fellows. There was no overall change in the quality of written evaluations received. Satisfaction with the evaluation system improved for both faculty and fellows of both training programs.
    CONCLUSIONS: In our critical care units, we were successfully able to implement a QR code-driven evaluation for our fellows that improved access for the faculty and offered the ability of the learner to solicit evaluations, without compromising the number or quality of evaluations. What's new: Quick reader (QR) codes can be used by learners to solicit evaluations and feedback from faculty. They can increase the quantity of written evaluations received without affecting their quality.

  • Article type: Journal Article
    Although evaluative judgments are a central component of everyday decision making, little is known about the temporal dynamics of the processes used to make them. The present study used the high temporal resolution of event-related brain potentials (ERPs) to test Cunningham and Zelazo's (2007) posited differences in the timing of attitude tag retrieval relative to stimulus categorization for 'attitudes' and 'evaluations,' as well as tenets of their Iterative Reprocessing (IR) loop model. Participants made agree/disagree decisions about their attitudes and You/Not You decisions about their autobiographical memories in separate reaction time (RT) tasks while brain activity was recorded from 32 scalp sites. A median-split analysis on RT was used to separate fast and slow decisions. Decisions about autobiographical stimuli produced the typical results in which retrieval and stimulus categorization occurred together just before the response regardless of decision difficulty. By contrast, the relative timing of tag retrieval and categorization differed with difficulty for attitude decisions as predicted by the model. Fast attitude decisions were processed similarly to fast You decisions with retrieval and categorization timing coupled to the response. Slow attitude decisions, however, differed because, while tag retrieval timing was the same as for fast attitude decisions, post-retrieval processing delayed stimulus categorization and a response by 450 msec. ERP activity over dorsolateral prefrontal cortex (DLPFC) in the pre-response interval was asymmetrical, with greater activity for attitude and autobiographical decisions over left and right hemispheres, respectively, while amplitude and duration increased with decision difficulty for both. Slow attitude decisions alone elicited a reduced pre-response positivity, a correlate of goal-directed response selection. The results provide empirical support for key aspects of Cunningham and Zelazo's (2007) attitude-evaluation dichotomy and the timing of the posited component processes in their IR model as well as novel information about the roles of stored tags and reflective processes in different attitude decisions.
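    The median-split step that separates fast from slow decisions can be sketched as follows. The abstract does not say how trials exactly at the median were handled; assigning them to the fast group is an assumption made here, and the function name and sample reaction times are hypothetical.

```python
from statistics import median


def median_split(reaction_times_ms):
    """Split trials into fast and slow groups around the median RT.

    Trials exactly at the median go to the fast group (an assumption;
    the source does not state its tie-handling rule).
    """
    cutoff = median(reaction_times_ms)
    fast = [rt for rt in reaction_times_ms if rt <= cutoff]
    slow = [rt for rt in reaction_times_ms if rt > cutoff]
    return cutoff, fast, slow


if __name__ == "__main__":
    # Hypothetical reaction times in milliseconds.
    cutoff, fast, slow = median_split([400, 900, 650, 1200])
    print(cutoff, fast, slow)
```

    In an ERP analysis the split would typically be applied per participant and per task, so that "fast" and "slow" are defined relative to each person's own response speed.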

  • Article type: Journal Article
    Background: Little research to date has examined the quality of data obtained from resident performance evaluations. This study sought to address this need and compared inter-rater reliability obtained from norm-referenced and criterion-referenced evaluation scaling approaches for faculty completing resident performance evaluations.
    Methods: Resident performance evaluation data were examined from 2 institutions (3 programs, 2 internal medicine and 1 surgery; 426 residents in total), with 4 evaluation forms: 2 criterion-referenced (1 with an additional norm-referenced item) and 2 norm-referenced. Faculty inter-rater reliability was calculated with intraclass correlation coefficients (ICCs) (1,10) for each competency area within the form. ICCs were transformed to z-scores, and 95% CIs were computed. Reliabilities for each evaluation form and competency, averages within competency, and averages within scaling type were examined.
    Results: Inter-rater reliability averages were higher for all competencies that used criterion-referenced scaling relative to those that used norm-referenced scaling. Aggregate scores of all independent categories (competencies and the items assessing overall competence) for criterion-referenced scaling demonstrated higher reliability (z=1.37, CI 1.26-1.48) than norm-referenced scaling (z=0.88, CI 0.77-0.99). Moreover, examination of the distributions of composite scores (average of all competencies and raters for each individual being rated) suggested that the criterion-referenced evaluations better represented the performance continuum.
    Conclusion: Criterion-referenced evaluation approaches appear to provide superior inter-rater reliability relative to norm-referenced evaluation scaling approaches. Although more research is needed to identify resident evaluation best practices, using criterion-referenced scaling may provide more valid data than norm-referenced scaling.
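    The abstract reports reliabilities on a z scale but does not name the transformation. The sketch below assumes the common Fisher transformation (z = atanh(r)), which lets correlation-type coefficients be averaged on an approximately normal scale and then mapped back; the function names and example values are illustrative assumptions, and the study may have used a variant specific to ICCs.

```python
import math


def icc_to_z(icc: float) -> float:
    """Fisher transformation of a correlation-type coefficient (assumed)."""
    return math.atanh(icc)


def z_to_icc(z: float) -> float:
    """Inverse Fisher transformation back to the coefficient scale."""
    return math.tanh(z)


def mean_icc_via_z(iccs):
    """Average coefficients in z space, then back-transform the mean."""
    zs = [icc_to_z(value) for value in iccs]
    return z_to_icc(sum(zs) / len(zs))


if __name__ == "__main__":
    # Hypothetical per-competency ICCs for one evaluation form.
    print(round(mean_icc_via_z([0.70, 0.80, 0.85]), 3))
```

    Averaging in z space rather than on the raw coefficients avoids the compression that occurs as values approach 1.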