Testing
  • Article type: Journal Article
    This paper analyzes the nature and perceived effects of mid-stakes testing (known as the EQAO) in Ontario, Canada. Ontario's mid-stakes tests were meant to ensure accountability and transparency, and assure system-wide improvement, while avoiding the negative effects and perverse incentives of their high-stakes counterparts. The paper provides new evidence from two projects covering almost a 10-year time span in 10 of Ontario's 72 school districts. It shows that even though mid-stakes testing is milder in its manifestations and effects than high-stakes testing, concerns remain about the need for and side effects of such testing. The findings concern two periods of Ontario educational reform. In the first period, with a specific focus on improving performance in literacy and mathematics, administrators and special education support staff felt that the assessments raised teachers' expectations and sense of urgency, leading to steady improvements in measured achievement, but that there was also evidence of negative effects, especially undue attention paid to "bubble" students just below the threshold for minimum proficiency. In the second reform period, focused on broad excellence, well-being, and equity as inclusion, mid-stakes tests were perceived as having more widespread negative effects. These included teaching to the test, cultural bias, avoidance of innovation, dilemmas about whether to include highly vulnerable students in the testing process, and emotional ill-being among students and teachers. The paper concludes that Ontario's twentieth-century system of large-scale, mid-stakes assessment has not kept pace with its twenty-first-century commitments to deeper learning and stronger well-being.

  • Article type: Journal Article
    BACKGROUND: Although indicator condition (IC)-guided HIV testing (IC-HIVT) is effective at facilitating timely HIV diagnosis, research on IC categories and the related HIV risk in Taiwan is limited. To improve the adoption and spread of IC-HIVT in Taiwan, this study compared the IC categories of people living with HIV (PLWH) and non-HIV controls and investigated delays in the diagnosis of HIV infection.
    METHODS: This nationwide, retrospective, 1:10-matched case-control study analyzed data from the Notifiable Diseases Surveillance System and National Health Insurance Research Database to evaluate 42 ICs for the 5-year period preceding a matched HIV diagnostic date from 2009 to 2015. The ICs were divided into category 1 ICs (AIDS-defining opportunistic illnesses [AOIs]), category 2 ICs (diseases associated with impaired immunity or malignancy but not AOIs), category 3 ICs (ICs associated with sexual behaviors), and category 4 ICs (mononucleosis or mononucleosis-like syndrome). Logistic regression was used to evaluate the HIV risk associated with each IC category (at the overall and annual levels) before the index date. Wilcoxon rank-sum test was performed to assess changes in diagnostic delays following an incident IC category by HIV transmission routes.
    RESULTS: Fourteen thousand three hundred forty-seven PLWH were matched with 143,470 non-HIV controls. The prevalence results for all ICs and category 1-4 ICs were, respectively, 42.59%, 11.16%, 15.68%, 26.48%, and 0.97% among PLWH and 8.73%, 1.05%, 4.53%, 3.69%, and 0.02% among non-HIV controls (all P < 0.001). Each IC category posed a significantly higher risk of HIV infection overall and annually. The median (interquartile range) potential delay in HIV diagnosis was 15 (7-44), 324.5 (36-947), 234 (13-976), and 74 (33-476) days for category 1-4 ICs, respectively. Except for category 1 for men who have sex with men, these values remained stable across 2009-2015, regardless of the HIV transmission route.
    CONCLUSIONS: Given the ongoing HIV diagnostic delay, IC-HIVT should be upgraded and adapted to each IC category to enhance early HIV diagnosis.
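As a rough illustration of how the prevalence figures in the RESULTS above relate to an association measure, the sketch below computes a crude (unmatched, unadjusted) odds ratio for each IC category. This is not the study's matched logistic regression, and its output will not reproduce the study's estimates; the function name `crude_odds_ratio` is a hypothetical helper introduced here.

```python
def crude_odds_ratio(p_cases: float, p_controls: float) -> float:
    """Crude odds ratio from the exposure prevalence in cases and in controls.

    Ignores the 1:10 matching and any covariate adjustment (an assumption
    made only to keep the illustration short).
    """
    if not (0 < p_cases < 1 and 0 < p_controls < 1):
        raise ValueError("prevalences must lie strictly between 0 and 1")
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls


# Prevalences reported in the abstract: (PLWH, non-HIV controls).
reported = {
    "any IC": (0.4259, 0.0873),
    "category 1": (0.1116, 0.0105),
    "category 2": (0.1568, 0.0453),
    "category 3": (0.2648, 0.0369),
    "category 4": (0.0097, 0.0002),
}

for label, (p_plwh, p_ctrl) in reported.items():
    print(f"{label}: crude OR = {crude_odds_ratio(p_plwh, p_ctrl):.1f}")
```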

  • Article type: Editorial
    No abstract available.

  • Article type: Randomized Controlled Trial
    BACKGROUND: Digital health tools may facilitate the continuity of care. Enhancement of digital aid is imperative to prevent information gaps or redundancies, as well as to facilitate support of flexible care plans.
    OBJECTIVE: This study presents Health Circuit, an adaptive case management approach that empowers health care professionals and patients to implement personalized evidence-based interventions through dynamic communication channels and patient-centered service workflows; analyzes its health care impact; and determines its usability and acceptability among health care professionals and patients.
    METHODS: From September 2019 to March 2020, the health impact, usability (measured with the system usability scale; SUS), and acceptability (measured with the net promoter score; NPS) of an initial prototype of Health Circuit were tested in a cluster randomized clinical pilot (n=100) in patients with high risk for hospitalization (study 1). From July 2020 to July 2021, a premarket pilot study of usability (with the SUS) and acceptability (with the NPS) was conducted among 104 high-risk patients undergoing prehabilitation before major surgery (study 2).
    RESULTS: In study 1, Health Circuit resulted in a reduction of emergency room visits (4/7, 13% vs 7/16, 44%), enhanced patients' empowerment (P<.001), and showed good acceptability and usability scores (NPS: 31; SUS: 54/100). In study 2, the NPS was 40 and the SUS was 85/100. The acceptance rate was also high (mean score of 8.4/10).
    CONCLUSIONS: Health Circuit showed potential for health care value generation and good acceptability and usability despite being a prototype system, prompting the need for testing a completed system in real-world scenarios.
    TRIAL REGISTRATION: ClinicalTrials.gov NCT04056663; https://clinicaltrials.gov/ct2/show/NCT04056663.
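For readers unfamiliar with the two questionnaire metrics reported above, the following minimal sketch shows the conventional scoring rules for the SUS (10 items rated 1-5, rescaled to 0-100) and the NPS (percentage of promoters minus percentage of detractors on a 0-10 scale). The function names and the sample responses are hypothetical; the abstract does not describe the study's own scoring code.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) for one respondent.

    `responses` holds the ten item ratings, each from 1 to 5, in
    questionnaire order. Odd-numbered items contribute (rating - 1),
    even-numbered items contribute (5 - rating); the sum is scaled by 2.5.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly ten ratings between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


def net_promoter_score(ratings):
    """Net promoter score (-100 to 100) from 0-10 recommendation ratings.

    Promoters rate 9 or 10, detractors rate 0 to 6.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical responses, for illustration only.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
print(net_promoter_score([10, 9, 8, 7, 6]))
```

A study-level SUS such as the 85/100 above is normally the mean of the per-respondent scores.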

  • Article type: Journal Article
    Border control mitigates local infections but bears a heavy economic cost, especially for tourism-reliant countries. While studies have supported the efficacy of border control in suppressing cross-border transmission, the trade-off between costs from imported and secondary cases and from lost economic activities has not been studied. This case study of Singapore during the COVID-19 pandemic aims to understand the impacts of varying quarantine length and testing strategies on the economy and health system. Additionally, we explored the impact of permitting unvaccinated travelers to address emerging equity concerns. We assumed that community transmission is stable and vaccination rates are high enough that inbound travelers are not dissuaded from traveling.
    The number of travelers was predicted considering that longer quarantine reduces willingness to travel. A micro-simulation model predicted the number of COVID-19 cases among travelers, the resultant secondary cases, and the probability of being symptomatic in each group. The incremental net monetary benefit (INB) of Singapore was quantified under each border-opening policy compared to pre-opening status, based on tourism receipts, cost/profit from testing and quarantine, and cost and health loss due to COVID-19 cases.
    Compared to polymerase chain reaction (PCR), rapid antigen test (ART) detects fewer imported cases but results in fewer secondary cases. Longer quarantine results in fewer cases but lower INB due to reduced tourism receipts. Assuming the proportion of unvaccinated travelers is small (8% locally and 24% globally), allowing unvaccinated travelers will accrue higher INB without exceeding the intensive care unit (ICU) capacity. The highest monthly INB from all travelers is $2,236.24 m, with 46.69 ICU cases per month, achieved with ARTs at pre-departure and on arrival without quarantine. The optimal policy in terms of highest INB is robust under changes to various model assumptions. Among all cost-benefit components, the top driver for INB is tourism receipts.
    With high vaccination rates locally and globally alongside stable community transmission, opening borders to travelers regardless of vaccination status will increase economic growth in the destination country. The caseloads remain manageable without exceeding ICU capacity, and costs of cases are offset by the economic value generated from travelers.
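The incremental net monetary benefit described above can be sketched as a simple difference between a policy's net benefit and that of the pre-opening baseline. The component names follow the abstract (tourism receipts, cost or profit from testing and quarantine, cost and health loss due to cases), but the exact formula, the class `PolicyOutcome`, and every number below are assumptions made for illustration, not values or code from the study.

```python
from dataclasses import dataclass


@dataclass
class PolicyOutcome:
    """Monthly monetary components of one border policy (hypothetical units)."""
    tourism_receipts: float
    testing_quarantine_net: float  # profit is positive, cost is negative
    case_costs: float              # treatment cost of imported and secondary cases
    health_loss: float             # monetized health loss from those cases

    def net_benefit(self) -> float:
        return (self.tourism_receipts + self.testing_quarantine_net
                - self.case_costs - self.health_loss)


def incremental_net_benefit(policy: PolicyOutcome, baseline: PolicyOutcome) -> float:
    """INB of a policy relative to the baseline (here, borders closed)."""
    return policy.net_benefit() - baseline.net_benefit()


# Made-up figures: no quarantine attracts more travelers but more cases.
closed = PolicyOutcome(0.0, 0.0, 0.0, 0.0)
no_quarantine = PolicyOutcome(1000.0, 20.0, 30.0, 15.0)
seven_day_quarantine = PolicyOutcome(400.0, -10.0, 10.0, 5.0)

for name, policy in [("no quarantine", no_quarantine),
                     ("7-day quarantine", seven_day_quarantine)]:
    print(name, incremental_net_benefit(policy, closed))
```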

  • Article type: Journal Article
    BACKGROUND: The COVID-19 pandemic heavily impacted many low- and middle-income countries (LMICs), such as Peru, overwhelming their health systems. Rapid antigen detection self-tests for SARS-CoV-2, the virus that causes COVID-19, have been proposed as a portable, safe, affordable, and easy-to-perform approach to improve early detection and surveillance of SARS-CoV-2 in resource-constrained populations where there are gaps in access to health care.
    OBJECTIVE: This study aims to explore decision makers' values and attitudes around SARS-CoV-2 self-testing.
    METHODS: In 2021, we conducted a qualitative study in 2 areas of Peru (urban Lima and rural Valle del Mantaro). Purposive sampling was used to identify representatives of civil society groups (RSCs), health care workers (HCWs), and potential implementers (PIs) to act as informants whose voices would provide a proxy for the public's attitudes around self-testing.
    RESULTS: In total, 30 informants participated in individual, semistructured interviews (SSIs) and 29 informants participated in 5 focus group discussions (FGDs). Self-tests were considered to represent an approach to increase access to testing that both the rural and urban public in Peru would accept. Results showed that the public would prefer saliva-based self-tests and would prefer to access them in their community pharmacies. In addition, information about how to perform a self-test should be clear for each population subgroup in Peru. The tests should be of high quality and low cost. Health-informed communication strategies must also accompany any introduction of self-testing.
    CONCLUSIONS: In Peru, decision makers consider that the public would be willing to accept SARS-CoV-2 self-tests if they are accurate, safe to use, easily available, and affordable. Adequate information about the self-tests' features and instructions, as well as about postuse access to counseling and care, must be made available through the Ministry of Health in Peru.

  • Article type: Journal Article
    OBJECTIVE: To compare the accuracy in the estimation of the Smith machine bench press 1-repetition maximum (1RM) when using a novel minimum velocity threshold (MVT) called optimal MVT (MVT that minimizes the differences between the actual and predicted 1RM in a preliminary session) with respect to using the 2 standard MVTs (general and individual MVTs).
    METHODS: A total of 126 young men (Smith machine bench press 1RM = 80.7 [13.6] kg) completed 2 identical sessions consisting of an incremental loading test until reaching the 1RM load. Four individual load-velocity relationships were modeled in each session considering all loading conditions until reaching the load that showed the closest mean velocity to 0.60, 0.50, 0.40, and 0.30 m·s⁻¹. The first testing session was used to determine the preindividual MVT and 4 optimal MVTs (1 for each final test velocity), while the second testing session was used to estimate the 1RM using 4 types of MVT (general MVT, preindividual MVT, actual-individual MVT, and optimal MVT).
    RESULTS: The absolute errors in the prediction of the 1RM were significantly lower for the optimal MVT (2.94 [2.40] kg) compared to the general MVT (3.66 [2.99] kg), preindividual MVT (3.80 [3.15] kg), and actual-individual MVT (4.02 [3.21] kg). The optimal MVT (intraclass correlation coefficient [ICC] ranged from .56 to .62) was always more reliable than the individual MVT (ICC = .34).
    CONCLUSIONS: The optimal MVT provides more accurate estimates of the Smith machine bench press 1RM than the standard MVTs previously used in scientific research (general and individual MVTs).
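The estimation procedure summarized above can be sketched as follows: fit a straight line to the submaximal load-velocity points, then read off the load at the chosen MVT. The sketch regresses load on velocity with ordinary least squares, which is one common convention and an assumption here, and it interprets the "optimal MVT" as the velocity at which a preliminary session's fitted line passes through the measured 1RM. The function names and data points are hypothetical.

```python
def fit_load_velocity(velocities, loads):
    """Least-squares line load = intercept + slope * velocity."""
    n = len(velocities)
    if n < 2 or n != len(loads):
        raise ValueError("need at least two (velocity, load) pairs")
    mean_v = sum(velocities) / n
    mean_l = sum(loads) / n
    sxx = sum((v - mean_v) ** 2 for v in velocities)
    sxy = sum((v - mean_v) * (l - mean_l) for v, l in zip(velocities, loads))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_v
    return intercept, slope


def predict_1rm(intercept, slope, mvt):
    """Estimated 1RM: the load the fitted line assigns to the MVT."""
    return intercept + slope * mvt


def optimal_mvt(intercept, slope, actual_1rm):
    """Velocity at which the preliminary-session line equals the measured 1RM."""
    return (actual_1rm - intercept) / slope


# Hypothetical session 1: loads in kg, mean velocities in m/s.
velocities = [1.30, 0.90, 0.50]
loads = [20.0, 40.0, 60.0]
intercept, slope = fit_load_velocity(velocities, loads)

general_mvt = 0.17  # illustrative value, not taken from the abstract
print(predict_1rm(intercept, slope, general_mvt))
print(optimal_mvt(intercept, slope, actual_1rm=80.0))
```

In the study design, the optimal MVT derived from the first session would then be applied to the second session's load-velocity line to predict that session's 1RM.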

  • Article type: Journal Article
    BACKGROUND: Coronavirus disease 2019 (COVID-19) became a pandemic within a matter of months. Analysis of the first year of the pandemic has since brought data and surveillance gaps to the surface. Yet policy decisions, and the public's trust in their country's strategies for combating COVID-19, rely on case numbers, death numbers, and other unfamiliar metrics. There are many limitations on COVID-19 case counts internationally, which make cross-country comparisons of raw data and policy responses difficult.
    UNASSIGNED: This paper presents and describes steps in the testing and reporting process, with examples from a number of countries of barriers encountered in each step, all of which create an undercount of COVID-19 cases. This work raises factors to consider in COVID-19 data and provides recommendations to inform the current situation with COVID-19 as well as issues to be aware of in future pandemics.

  • Article type: Journal Article
    The COVID-19 pandemic highlighted the impact of social inequalities in health (SIH). Various studies have shown significant inequalities in mortality and morbidity associated with COVID-19 and the influence of social determinants of health. The objective of this qualitative case study was to analyze the consideration of SIH in the design of two key COVID-19 prevention and control interventions in France: testing and contact tracing. Interviews were conducted with 36 key informants involved in the design of the intervention and/or the government response to the pandemic, and relevant documents (n = 15) were reviewed. We applied data triangulation and a hybrid deductive and inductive analysis to analyze the data. Findings revealed divergent understandings of and perspectives on SIH, as well as the challenges of taking SIH into account in the early stages of the pandemic. Despite a shared concern for SIH among the participants, an epidemiological frame of reference dominated the design of the intervention. It resulted in a model in which consideration for SIH appeared as a complement to the clinical goal of the intervention: breaking the chain of COVID-19 transmission. Although the COVID-19 health crisis highlighted the importance of SIH, it did not appear to be an opportunity to further their consideration in response efforts. This article provides original insights, based on a qualitative investigation, into the consideration of SIH in the design of testing and contact-tracing interventions.
    The COVID-19 pandemic has highlighted the importance of social inequalities in health (SIH) and the disproportionate burden of the pandemic and its consequences related to socioeconomic status, ethnicity, and race, among other determinants of health. Public health interventions are likely to increase SIH when SIH are not considered in the design phase. Through a qualitative case study, we analyzed the design of one of the first local initiatives offering testing and contact tracing to the general population in the Île-de-France region (Paris region, France) in response to the COVID-19 pandemic. This article discusses the uncertainty and challenges associated with consideration for SIH in the intervention design. It explores the diverse understandings of SIH among the actors and the complexities of cross-sectoral partnerships addressing SIH in times of health crisis. Despite a consensual concern for this issue among the respondents, an epidemiological frame of reference dominated the intervention design. It resulted in a model in which consideration for SIH appeared as a complement to the clinical goal of the intervention: breaking the chain of COVID-19 transmission.

  • Article type: Journal Article
    We distinguish two general modes of testing for Deep Neural Networks (DNNs): offline testing, where DNNs are tested as individual units based on test datasets obtained without involving the DNNs under test, and online testing, where DNNs are embedded into a specific application environment and tested in a closed-loop mode in interaction with the application environment. Typically, DNNs are subjected to both types of testing during their development life cycle: offline testing is applied immediately after DNN training, and online testing follows offline testing, once a DNN is deployed within a specific application environment. In this paper, we study the relationship between offline and online testing. Our goal is to determine how offline testing and online testing differ from or complement one another, and whether offline testing results can be used to help reduce the cost of online testing. Though these questions are generally relevant to all autonomous systems, we study them in the context of automated driving systems where, as study subjects, we use DNNs automating end-to-end controls of steering functions of self-driving vehicles. Our results show that offline testing is less effective than online testing, as many safety violations identified by online testing could not be identified by offline testing, while large prediction errors generated by offline testing always led to severe safety violations detectable by online testing. Further, we cannot exploit offline testing results to reduce the cost of online testing in practice, since we are not able to identify specific situations where offline testing could be as accurate as online testing in identifying safety requirement violations.