Generalizability

  • Article type: Journal Article
    Depression is an eminently treatable disorder that responds to psychotherapy or medications; the efficacy of each has been established in hundreds of controlled trials. Nonetheless, the prevalence of depression has increased in recent years despite the existence of efficacious treatments, a phenomenon known as the treatment-prevalence paradox. We consider several possible explanations for this paradox, ranging from a misunderstanding of the very nature of depression to inflated efficacy of the established treatments and a lack of access to efficacious delivery of treatments. We find support for each of these explanations, but especially for the notion that large segments of the population lack access to efficacious treatments implemented as intended. We conclude by describing the potential of lay therapists and digital technologies to overcome this lack of access, reach historically underserved populations, and simultaneously guarantee the quality of the interventions delivered.

  • Article type: Journal Article
    OBJECTIVE: Traditional approaches to designing clinical trials for heart failure (HF) have historically relied on expertise and past practices. However, the evolving landscape of healthcare, marked by the advent of novel data science applications and increased data availability, offers a compelling opportunity to transition towards a data-driven paradigm in trial design. This research aims to evaluate the scope and determinants of disparities between clinical trials and registries by leveraging natural language processing for the analysis of trial eligibility criteria. The findings contribute to the establishment of a robust design framework for guiding future HF trials.
    RESULTS: Interventional phase III trials registered for HF on ClinicalTrials.gov as of the end of 2021 were identified. Natural language processing was used to extract and structure the eligibility criteria for quantitative analysis. The most common criteria for HF with reduced ejection fraction (HFrEF) were applied to estimate patient eligibility as the proportion of registry patients in the ASIAN-HF (N = 4868) and BIOSTAT-CHF (N = 2545) registries. Of the 375 phase III trials for HF, 163 were HFrEF trials. In these trials, the most frequent inclusion criteria were New York Heart Association (NYHA) functional class (69%), worsening HF (23%), and natriuretic peptides (18%), whereas the most frequent comorbidity-based exclusion criteria were acute coronary syndrome (64%), renal disease (55%), and valvular heart disease (47%). On average, 20% of registry patients were eligible for HFrEF trials. Eligibility distributions did not differ (P = 0.18) between the Asian [median eligibility 0.20, interquartile range (IQR) 0.08-0.43] and European registry populations (median 0.17, IQR 0.06-0.39). Over time, HFrEF trials became more restrictive, with patient eligibility declining from 0.40 in 1985-2005 to 0.19 in 2016-2022 (P = 0.03). When their frequency among trials was taken into account, the most restrictive eligibility criteria were prior myocardial infarction, NYHA class, age, and prior HF hospitalization.
    CONCLUSIONS: Based on 14 trial criteria, only one-fifth of registry patients were eligible for phase III HFrEF trials. Overall eligibility rates did not differ between the Asian and European patient cohorts.
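    The eligibility estimates above rest on a simple operation: apply a trial's structured criteria to each registry patient and count who passes all of them. A minimal sketch follows, assuming the criteria have already been extracted (the natural language processing step is not shown) and encoded as predicates; all field names, thresholds, trials, and patients are hypothetical and do not come from ASIAN-HF, BIOSTAT-CHF, or any registered trial.

```python
from statistics import median

# Hypothetical registry patients; field names and values are illustrative only.
registry = [
    {"age": 62, "nyha": 2, "lvef": 30, "prior_mi": False, "acs": False},
    {"age": 81, "nyha": 3, "lvef": 25, "prior_mi": True, "acs": False},
    {"age": 55, "nyha": 1, "lvef": 35, "prior_mi": False, "acs": False},
    {"age": 78, "nyha": 3, "lvef": 20, "prior_mi": False, "acs": True},
]

# Hypothetical trials: each criterion is a predicate that must hold for a patient
# to be eligible. Exclusion criteria are written in negated form ("not excluded").
trials = {
    "trial_A": [
        lambda p: p["nyha"] >= 2,   # inclusion: NYHA class II or higher
        lambda p: p["lvef"] <= 35,  # inclusion: reduced ejection fraction
        lambda p: not p["acs"],     # exclusion: acute coronary syndrome
    ],
    "trial_B": [
        lambda p: p["nyha"] >= 2,     # inclusion: NYHA class II or higher
        lambda p: p["age"] <= 75,     # inclusion: upper age limit
        lambda p: not p["prior_mi"],  # exclusion: prior myocardial infarction
    ],
}

def eligible_fraction(criteria, patients):
    """Share of registry patients who satisfy every criterion of one trial."""
    n_eligible = sum(all(c(p) for c in criteria) for p in patients)
    return n_eligible / len(patients)

fractions = {name: eligible_fraction(crit, registry) for name, crit in trials.items()}
print(fractions)                   # {'trial_A': 0.5, 'trial_B': 0.25}
print(median(fractions.values()))  # 0.375, the median eligibility across trials
```

    Dropping one predicate at a time and recomputing `eligible_fraction` is one simple way to see how restrictive each criterion is; the study's own ranking method may differ.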

  • Article type: Journal Article
    Measuring socioeconomic status (SES) as an independent variable is challenging, especially in epidemiological and social studies. This issue is more critical in large-scale national studies. The present study aimed to comprehensively evaluate the validity and reliability of the Iranian SES questionnaire.
    This psychometric cross-sectional study was conducted on 3000 households selected via random cluster sampling from various areas of East Azerbaijan province and Tehran, Iran. In addition, 250 students from Tabriz University of Medical Sciences were recruited as interviewers to collect data from 40 districts in Iran. The construct validity and internal consistency of the SES questionnaire were assessed using exploratory and confirmatory factor analyses and Cronbach's alpha. Data were analyzed in SPSS and AMOS.
    The complete Iranian version of the SES questionnaire consists of 5 factors. Cronbach's alpha was 0.79, 0.94, 0.66, 0.69, and 0.48 for occupation, self-evaluation of economic capacity, house and furniture, wealth, and health expenditure, respectively. In addition, the confirmatory factor analysis indicated that the data were compatible with the 5-factor model (comparative fit index = 0.96; goodness of fit index = 0.95; incremental fit index = 0.96; root mean square error of approximation = 0.05).
    These results confirm the validity and reliability of the tool, indicating that the Iranian version of the SES questionnaire can be used with the same structure on a large scale and applied to measure SES in broader populations.
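    Cronbach's alpha, used above to assess internal consistency, compares the sum of the item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / variance of total score), where k is the number of items in a factor. A minimal standard-library sketch follows; the toy score matrices are hypothetical and are not the questionnaire data (the study itself used SPSS).

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for one factor.

    items: list of k columns, one per questionnaire item, each holding the
    scores of the same n respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item must have one score per respondent")
    # Total score per respondent across the k items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    # Sample variances (n - 1 denominator), used consistently in both terms.
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Two items that agree perfectly give alpha = 1; two unrelated items give alpha = 0.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))  # 1.0
print(cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]]))  # 0.0
```

    Because alpha grows with both inter-item correlation and the number of items, a low value such as the 0.48 reported for health expenditure may partly reflect a short subscale rather than unrelated items alone.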

  • Article type: Journal Article
    Prior studies report heterogeneous age-standardized dementia incidence rates across US Asian American, Native Hawaiian, and Pacific Islander (AANHPI) groups, but no population-representative estimates of dementia incidence exist because longitudinal probability samples of AANHPI adults are lacking. Using the California Health Interview Survey, we compared harmonized characteristics between AANHPI Kaiser Permanente Northern California members (KPNC cohort) and the target population of AANHPI adults aged 60+ with private or Medicare insurance. We used stabilized inverse odds of selection weights (sIOSW) to estimate ethnicity-specific crude and age-standardized dementia incidence rates and cumulative risk by age 90 in the target population. Differences between the KPNC cohort and the target population varied by ethnicity. sIOSW eliminated most differences in larger ethnic groups; some differences remained in smaller groups. Crude dementia incidence rates estimated with sIOSW were similar to unweighted rates for Chinese, Filipino, Pacific Islander, and Vietnamese individuals and higher for Japanese, Korean, and South Asian individuals. Unweighted and weighted age-standardized incidence rates differed for South Asians. Unweighted and weighted cumulative risks were similar for all groups. We estimated the first population-representative dementia incidence rates and cumulative risks for AANHPI ethnic groups. We encountered some estimation problems, and the weighted estimates were imprecise, highlighting the challenges of using weighting to extend inferences to target populations.
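    Stabilized inverse odds of selection weights reweight study members so that their covariate distribution matches a sample from the target population. The sketch below shows the arithmetic for one discrete covariate, where a saturated logistic selection model reproduces the stratum proportions, so the weight for a study member in stratum x is (n0(x) / n1(x)) * (N1 / N0), with n1 and n0 the study and target-sample counts in x and N1 and N0 the overall sizes. This is a simplified illustration, not the study's model: it assumes a single stratifying variable, ignores the survey weights that a design such as the California Health Interview Survey would require, and uses hypothetical age bands and data.

```python
from collections import Counter

def siosw(study_strata, target_strata):
    """Stabilized inverse odds of selection weights for study members.

    study_strata: stratum label of each study member (selected, S = 1).
    target_strata: stratum label of each target-population sample member (S = 0).
    Returns one weight per study member, in the order of study_strata.
    """
    n1 = Counter(study_strata)
    n0 = Counter(target_strata)
    N1, N0 = len(study_strata), len(target_strata)
    unmatched = set(n0) - set(n1)
    if unmatched:
        # Positivity violation: these target strata have no study members to reweight.
        raise ValueError(f"no study members in target strata: {sorted(unmatched)}")
    # Inverse odds P(S=0|x)/P(S=1|x) = n0(x)/n1(x), stabilized by P(S=1)/P(S=0) = N1/N0.
    # Study strata absent from the target sample get weight 0 (Counter returns 0).
    return [(n0[x] / n1[x]) * (N1 / N0) for x in study_strata]

def weighted_rate(events, person_years, weights):
    """Weighted crude incidence rate: weighted events per weighted person-year."""
    num = sum(w * e for w, e in zip(weights, events))
    den = sum(w * t for w, t in zip(weights, person_years))
    return num / den

# Hypothetical data: the study over-represents the 60-69 band relative to the target.
study = ["60-69"] * 6 + ["70+"] * 2
target = ["60-69"] * 2 + ["70+"] * 2
w = siosw(study, target)
events = [0, 0, 0, 0, 0, 1, 1, 1]         # incident dementia per study member
person_years = [5, 5, 5, 5, 5, 5, 4, 4]   # follow-up time per study member
print(w)
print(weighted_rate(events, person_years, w))
```

    Stabilization makes the weights sum to the study size (8 here), and the weighted share of the 70+ band becomes one half, matching the hypothetical target sample. Small strata receive large weights, which is one way weighted estimates become imprecise.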

  • Article type: Journal Article
    Deep learning classification models for medical image analysis often perform well on data from the scanners used to acquire the training data. However, when these models are applied to data from different vendors, their performance tends to drop substantially. Artifacts that occur only in scans from specific scanners are major causes of this poor generalizability. We aimed to enhance the reliability of deep learning classification models using a novel method called Uncertainty-Based Instance eXclusion (UBIX). UBIX is an inference-time module that can be employed in multiple-instance learning (MIL) settings. MIL is a paradigm in which the instances (generally crops or slices) of a bag (generally an image) contribute to a bag-level output. Instead of assuming that all instances contribute equally to the bag-level output, UBIX uses uncertainty estimation to detect, on the fly, instances corrupted by local artifacts, and reduces or fully ignores their contributions before MIL pooling. In our experiments, instances are 2D slices and bags are volumetric images, but alternative definitions are also possible. Although UBIX is generally applicable to diverse classification tasks, we focused on the staging of age-related macular degeneration in optical coherence tomography. Our models were trained on data from a single scanner and tested on external datasets from different vendors, which included vendor-specific artifacts. When applied to images from different vendors containing artifacts, UBIX behaved reliably, with only a slight decrease in performance (the quadratic weighted kappa (κw) dropped from 0.861 to 0.708), whereas a state-of-the-art 3D neural network without UBIX suffered a severe loss of performance (κw from 0.852 to 0.084) on the same test set. We showed that instances with unseen artifacts can be identified with out-of-distribution (OOD) detection. UBIX can reduce their contribution to the bag-level predictions, improving reliability without retraining on new data. This potentially increases the applicability of artificial intelligence models to data from scanners other than the ones for which they were developed. The source code for UBIX, including trained model weights, is publicly available through https://github.com/qurAI-amsterdam/ubix-for-reliable-classification.
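    The core idea described above, down-weighting or excluding uncertain instances before MIL pooling, can be sketched as follows. This is a hypothetical illustration, not the code in the linked repository: it assumes per-slice class probabilities and a scalar uncertainty per slice are already available (for example from Monte Carlo dropout or an ensemble), uses a user-chosen threshold, and applies simple weighted mean pooling. The function name `uncertainty_weighted_pool` is invented here, and the abstract does not specify UBIX's actual uncertainty measure or weighting function.

```python
from typing import List, Sequence

def uncertainty_weighted_pool(
    instance_probs: Sequence[Sequence[float]],
    uncertainties: Sequence[float],
    threshold: float,
    soft: bool = False,
) -> List[float]:
    """Pool per-instance class probabilities into one bag-level vector.

    Hard mode gives weight 0 to instances whose uncertainty exceeds `threshold`
    and weight 1 otherwise. Soft mode lowers the weight linearly from 1 at zero
    uncertainty to 0 at `threshold`.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if soft:
        weights = [max(0.0, 1.0 - u / threshold) for u in uncertainties]
    else:
        weights = [1.0 if u <= threshold else 0.0 for u in uncertainties]
    total = sum(weights)
    if total == 0.0:
        # Every instance looks corrupted: fall back to plain mean pooling.
        weights = [1.0] * len(instance_probs)
        total = float(len(instance_probs))
    n_classes = len(instance_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, instance_probs)) / total
        for c in range(n_classes)
    ]

# Hypothetical bag of four slices; the last two carry an unseen artifact.
probs = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
unc = [0.1, 0.2, 0.9, 0.8]
plain = uncertainty_weighted_pool(probs, [0.0] * 4, threshold=1.0)  # all weights 1
hard = uncertainty_weighted_pool(probs, unc, threshold=0.5)
print(plain, hard)
```

    With plain mean pooling the two corrupted slices flip the bag-level prediction to class 1, while hard exclusion keeps it at class 0. The hard mode corresponds to fully ignoring an instance and the soft mode to reducing its contribution; which works better would depend on how well calibrated the uncertainty scores are.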

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    A common research protocol in cognitive neuroscience is to train subjects to perform deliberately designed experiments while their brain activity is recorded, with the aim of understanding the brain mechanisms underlying cognition. However, how the results of this research protocol can be applied in technology is seldom discussed. Here, I review studies on time processing in the brain as examples of this research protocol, as well as two main application areas of neuroscience (neuroengineering and brain-inspired artificial intelligence). Time processing is a fundamental dimension of cognition, and time is also an indispensable dimension of any real-world signal to be processed in technology. Therefore, one might expect studies of time processing in cognition to profoundly influence brain-related technology. Surprisingly, I found that the results of cognitive studies on time processing are of little help in solving practical problems. This awkward situation may be due to the limited generalizability of the results of cognitive studies, which are obtained under well-controlled laboratory conditions, to real-life situations. This lack of generalizability may be rooted in the fundamental unknowability of the world (including cognition). Overall, this paper questions and criticizes the usefulness and prospects of the research protocol described above. I then give three suggestions for future research. First, to improve the generalizability of research, it is better to study brain activity under real-life conditions than in well-controlled laboratory experiments. Second, to overcome the unknowability of the world, we can engineer an easily accessible surrogate of the object under investigation, so that we can predict the object's behavior by experimenting on the surrogate. Third, the paper calls for technology-oriented research aimed at creating technology rather than discovering knowledge.

  • Article type: Journal Article
    To summarize recent literature on selection bias in disparities research addressing either descriptive or causal questions, with examples from dementia research.
    Defining a clear estimand, including the target population, is essential to assess whether generalizability bias or collider-stratification bias threatens inferences. Selection bias in disparities research can arise from sampling strategies, differential inclusion pipelines, loss to follow-up, and competing events. When competing events occur, several potentially relevant estimands can be estimated under different assumptions, each with a different interpretation. The apparent magnitude of a disparity can differ substantially depending on the chosen estimand. Both randomized and observational studies may misrepresent health disparities or heterogeneity in treatment effects if they are not based on a known sampling scheme.
    Researchers have recently made substantial progress in the conceptualization of, and methods for, selection bias. This progress will improve the relevance of both descriptive and causal health disparities research.
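    How selection can distort a disparity estimate can be checked with exact arithmetic. The sketch below uses hypothetical probabilities, not results from the reviewed literature. It computes the risk difference in an outcome Y between two groups G in the target population and in the analytic sample, where the probability of entering the sample (S = 1) depends on both G and Y, as with a differential inclusion pipeline or loss to follow-up. Restricting to S = 1 conditions on a common effect (collider) of G and Y.

```python
from fractions import Fraction as F

def risk_differences(p_y, p_sel):
    """Population vs selected-sample risk difference for a binary group G.

    p_y[g]: P(Y=1 | G=g) in the target population.
    p_sel[(g, y)]: P(S=1 | G=g, Y=y), probability of entering the analytic sample.
    Returns (population RD, selected-sample RD), each defined as
    P(Y=1 | G=1) - P(Y=1 | G=0), the second computed among S = 1.
    """
    def risk_among_selected(g):
        # Bayes' rule: P(Y=1 | G=g, S=1)
        selected_cases = p_y[g] * p_sel[(g, 1)]
        selected_noncases = (1 - p_y[g]) * p_sel[(g, 0)]
        return selected_cases / (selected_cases + selected_noncases)

    return p_y[1] - p_y[0], risk_among_selected(1) - risk_among_selected(0)

# No true disparity: both groups have a 20% risk.
p_y = {0: F(1, 5), 1: F(1, 5)}
# Hypothetical selection: affected members of group 1 are recruited half as often.
p_sel = {(0, 0): F(4, 5), (0, 1): F(4, 5), (1, 0): F(4, 5), (1, 1): F(2, 5)}
print(risk_differences(p_y, p_sel))  # (Fraction(0, 1), Fraction(-4, 45))
```

    The analytic sample shows a spurious disparity of about -8.9 percentage points although none exists in the target population. Not every selection mechanism that depends on both G and Y biases every estimand: if the selection probability factors as a(g) * b(y), the odds ratio between G and Y is preserved in the selected sample, which is one reason stating the estimand matters.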

  • Article type: Journal Article
    To assess the external validity of randomized controlled trials (RCTs) of bariatric surgery for diabetes control.
    Multisite RCTs provide the strongest evidence supporting clinical treatments and have the greatest internal validity. However, the characteristics of trial participants may not be representative of patients receiving treatment in the real world. There is a need to assess how well the results of RCTs generalize to contemporary patient populations undergoing these treatments.
    The baseline characteristics, weight change, and diabetes control of all patients undergoing sleeve gastrectomy at the University of California Los Angeles (UCLA) between January 8, 2018 and May 19, 2023 were compared with those of patients enrolled in two RCTs of the effect of bariatric surgery on diabetes control: Surgical Treatment and Medications Potentially Eradicate Diabetes Efficiently (STAMPEDE) and the Diabetes Surgery Study (DSS). Weight loss and diabetes control were compared between UCLA patients who did and did not meet the entry criteria for these RCTs.
    Only 65 (17%) of 387 patients with diabetes fulfilled the eligibility criteria for STAMPEDE, and only 29 (7.5%) fulfilled the criteria for DSS; ineligibility was due to older age, higher body mass index, and lower HbA1c. UCLA patients experienced slightly less weight loss than patients in the RCTs but had similar diabetes control. The 313 (81%) patients not eligible for either RCT had long-term diabetes control similar to that of patients who were eligible.
    Even though only a very small proportion of patients undergoing bariatric surgery met the eligibility criteria for the 2 major RCTs, most patients in this contemporary cohort had similar outcomes. Diabetes outcomes from STAMPEDE and DSS generalize to most patients undergoing bariatric surgery for diabetes control.

  • Article type: Journal Article
    When analyzing a selected sample from a general population, selection bias can arise relative to the causal average treatment effect (ATE) for the general population, and also relative to the ATE for the selected sample itself. We provide simple graphical rules that indicate (1) whether a selected-sample analysis will be unbiased for each ATE and (2) whether adjusting for certain covariates could eliminate selection bias. The rules can easily be checked in a standard single-world intervention graph. When the treatment could affect selection, a third estimand of potential scientific interest is the "net treatment difference", namely the net change in outcomes that would occur for the selected sample if all members of the general population were treated versus not treated, including any effects of the treatment on which individuals are in the selected sample. We provide graphical rules for this estimand as well. We decompose bias in a selected-sample analysis relative to the general-population ATE into (1) "internal bias" relative to the net treatment difference and (2) "net-external bias", a discrepancy between the net treatment difference and the general-population ATE. Each bias can be assessed unambiguously via a distinct graphical rule, providing new conceptual insight into the mechanisms by which certain causal structures produce selection bias.