Selection Bias

  • Article type: Journal Article
    Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the outcome may often be available in patients' health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper, we propose a two-phase sampling design that leverages available information on auxiliary covariates in EHR data. A key challenge in using EHR data for multiphase sampling is the potential selection bias, because EHR data are not necessarily representative of the target population. Extending existing literature on two-phase sampling design, we derive an optimal two-phase sampling method that improves efficiency over random sampling while accounting for the potential selection bias in EHR data. We demonstrate the efficiency gain from our sampling design via simulation studies and an application evaluating the prevalence of hypertension among U.S. adults leveraging data from the Michigan Genomics Initiative, a longitudinal biorepository in Michigan Medicine.
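    As a rough illustration of why auxiliary covariates help in two-phase sampling, the sketch below implements classical Neyman allocation, which assigns more phase-two samples to strata that are larger or more variable. This is a textbook stand-in, not the optimal design derived in the paper, and it does not account for selection bias in the EHR data. The function name and the stratum sizes and standard deviations in the usage note are hypothetical.

```python
import math


def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Split n_total phase-two samples across strata in proportion to N_h * S_h.

    stratum_sizes: number of phase-one records in each stratum (N_h)
    stratum_sds:   assumed outcome standard deviation in each stratum (S_h)
    """
    weights = [n * s for n, s in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one stratum needs positive size and SD")

    raw = [n_total * w / total for w in weights]

    # Round down, then hand out leftover units by largest fractional part.
    alloc = [math.floor(r) for r in raw]
    leftover = n_total - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1

    # Never request more units than a stratum contains
    # (the total can then fall short of n_total).
    return [min(a, n) for a, n in zip(alloc, stratum_sizes)]
```

    For example, `neyman_allocation([5000, 3000, 2000], [1.0, 2.0, 4.0], 500)` gives the smallest but most variable stratum the largest share of the 500 samples.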

  • Article type: Journal Article
    BACKGROUND: Disease latency is defined as the time from disease initiation to disease diagnosis. Disease latency bias (DLB) can arise in epidemiological studies that examine latent outcomes, since the exact timing of the disease inception is unknown and might occur before exposure initiation, potentially leading to bias. Although DLB can affect epidemiological studies that examine different types of chronic disease (e.g. Alzheimer's disease, cancer, etc.), the manner by which DLB can introduce bias into these studies has not been previously elucidated. Information on the specific types of bias, and their structure, that can arise secondary to DLB is critical for researchers, to enable better understanding of and control for DLB.
    METHODS: Here we describe four scenarios by which DLB can introduce bias (through different structures) into epidemiological studies that address latent outcomes, using directed acyclic graphs (DAGs). We also discuss potential strategies to better understand, examine and control for DLB in these studies.
    RESULTS: Using causal diagrams, we show that disease latency bias can affect results of epidemiological studies through: (i) unmeasured confounding; (ii) reverse causality; (iii) selection bias; (iv) bias through a mediator.
    CONCLUSIONS: Disease latency bias is an important bias that can affect a number of epidemiological studies that address latent outcomes. Causal diagrams can help researchers better identify and control for this bias.

  • Article type: Journal Article
    Selection bias has long been central in methodological discussions across epidemiology and other fields. In epidemiology, the concept of selection bias has been continually evolving over time. In this issue of the Journal, Mathur and Shpitser (Am J Epidemiol. XXXX;XXX(XX):XXXX-XXXX) present simple graphical rules for using a Single World Intervention Graph (SWIG) to assess the presence of selection bias when estimating treatment effects in both the general population and a selected sample. Notably, the authors examine the setting in which the treatment affects selection, an issue not well-addressed in the existing literature on selection bias. To place the work by Mathur and Shpitser in context, we review the evolution of the concept of selection bias in epidemiology, with a primary focus on the developments in the last 20-30 years since the introduction of causal directed acyclic graphs (DAGs) to epidemiologic research.

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    Our study aimed to establish the risk of selection bias in randomized controlled trials (RCTs) that were overall rated as having "low bias" risk according to Cochrane's Risk of Bias, version 2 (RoB 2) tool. A systematic literature search of current systematic reviews of RCTs was conducted. From the identified reviews, RCTs with overall "high bias" and "low bias" RoB 2 risk ratings were extracted. All RCTs were statistically tested for selection bias risk. From the test results, true positive, true negative, false positive, or false negative ratings were established, and the false omission rate (FOR) with a 95% confidence interval (CI) was computed. Subgroup analysis was conducted by computing the negative likelihood ratio (-LR) concerning RoB 2 domain 1 ratings: bias arising from the randomization process. A total of 1070 published RCTs (median publication year: 2018; interquartile range: 2013-2020) were identified and tested. We found that 7.61% of all "low bias" (RoB 2)-rated RCTs were of high selection bias risk (FOR 7.61%; 95% CI: 6.31%-9.14%) and that the likelihood for high selection bias risk in "low bias" (RoB 2 domain 1)-rated RCTs was 6% higher than that for low selection bias risk (-LR: 1.06; 95% CI: 0.98-1.15). These findings raise issues about the validity of "low bias" risk ratings using Cochrane's RoB 2 tool as well as about the validity of some of the results from recently published RCTs. Our results also suggest that the likelihood of a "low bias" risk-rated body of clinical evidence being actually bias-free is low, and that generalization based on a limited, pre-specified set of appraisal criteria may not justify a high level of confidence that such evidence reflects the true treatment effect.
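    The two summary measures named in this abstract have standard definitions that can be sketched in a few lines. Here a "positive" is taken to mean a "high bias" rating and the reference standard is the statistical test for selection bias risk. The counts in the usage note are hypothetical, and the Wilson score interval is an assumption of this sketch, since the abstract does not say how its confidence interval was computed.

```python
import math


def false_omission_rate(fn, tn):
    """Share of negative ratings that are wrong: FN / (FN + TN)."""
    return fn / (fn + tn)


def negative_likelihood_ratio(tp, fn, fp, tn):
    """-LR = (1 - sensitivity) / specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (1 - sensitivity) / specificity


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half
```

    With hypothetical counts TP=30, FN=10, FP=60, TN=90, `false_omission_rate(10, 90)` is 0.1 and `wilson_interval(10, 100)` brackets it. A -LR close to 1, as reported in the abstract, means a "low bias" rating barely changes the odds of high selection bias risk.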

  • Article type: Journal Article
    Observational studies are rarely representative of their target population because there are known and unknown factors that affect an individual's choice to participate (the selection mechanism). Selection can cause bias in a given analysis if the outcome is related to selection (conditional on the other variables in the model). Detecting and adjusting for selection bias in practice typically requires access to data on nonselected individuals. Here, we propose methods to detect selection bias in genetic studies by comparing correlations among genetic variants in the selected sample to those expected under no selection. We examine the use of four hypothesis tests to identify induced associations between genetic variants in the selected sample. We evaluate these approaches in Monte Carlo simulations. Finally, we use these approaches in an applied example using data from the UK Biobank (UKBB). The proposed tests suggested an association between alcohol consumption and selection into UKBB. Hence, UKBB analyses with alcohol consumption as the exposure or outcome may be biased by this selection.
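    The idea behind the detection approach, that selection can induce correlation between variables that are independent in the full population, can be illustrated with a small simulation. This is only a sketch of the underlying collider mechanism, not the authors' hypothesis tests. The two continuous "genetic scores", the selection threshold, and all function names are hypothetical simplifications (real variants are discrete genotypes).

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_selection(n=20000, seed=0):
    """Return (correlation in full population, correlation among the selected)."""
    rng = random.Random(seed)
    g1 = [rng.gauss(0, 1) for _ in range(n)]  # independent score 1
    g2 = [rng.gauss(0, 1) for _ in range(n)]  # independent score 2

    # Participation depends on both scores plus noise, making it a collider.
    selected = [i for i in range(n) if g1[i] + g2[i] + rng.gauss(0, 1) > 1.0]

    r_full = correlation(g1, g2)
    r_selected = correlation([g1[i] for i in selected], [g2[i] for i in selected])
    return r_full, r_selected
```

    In the full population the two scores are close to uncorrelated, while among the selected individuals a clear negative correlation appears. A departure of this kind from the correlation expected under no selection is what the proposed tests look for.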

  • Article type: Journal Article
    BACKGROUND: Randomized controlled trials are the gold standard for determining treatment efficacy in medicine. To deter harmful practices such as p-hacking and hypothesizing after the results are known, any analysis of subgroups and secondary outcomes must be documented and pre-specified. However, they can still introduce bias (and routinely do) if they are not treated with the same consideration as the primary analysis.
    METHODS: We describe several sources of bias that affect subgroup and secondary outcome analyses using published randomized trials and causal directed acyclic graphs (DAGs).
    RESULTS: We use the RECOVERY and START trials to elucidate sources of bias in analyses of subgroups and secondary outcomes. Chance imbalance can occur if the distribution of prognostic variables is not checked for a given subgroup analysis as it is for the main analysis. This differential distribution of prognostic variables can also occur in analyses of secondary outcomes. Selection bias can occur if the subgroup variable is causally related to staying in the trial. Because loss to follow-up is not normally addressed in subgroups, attrition bias can pass unnoticed in these cases. In every case, the solution is to take the same considerations for these analyses as we do for primary analyses.
    CONCLUSIONS: Approval of treatments and clinical decisions can occur based on results from subgroup or secondary outcome analyses. Thus, it is important to give them the same treatment as primary analyses to avoid preventable biases.

  • Article type: Journal Article
    Selection bias is a common concern in epidemiologic studies. In the literature, selection bias is often viewed as a missing data problem. Popular approaches to adjust for bias due to missing data, such as inverse probability weighting, rely on the assumption that data are missing at random and can yield biased results if this assumption is violated. In observational studies with outcome data missing not at random, Heckman's sample selection model can be used to adjust for bias due to missing data. In this paper, we review Heckman's method and a similar approach proposed by Tchetgen Tchetgen and Wirth (2017). We then discuss how to apply these methods to Mendelian randomization analyses using individual-level data, with missing data for either the exposure or outcome or both. We explore whether genetic variants associated with participation can be used as instruments for selection. We then describe how to obtain missingness-adjusted Wald ratio, two-stage least squares and inverse variance weighted estimates. The two methods are evaluated and compared in simulations, with results suggesting that they can both mitigate selection bias but may yield parameter estimates with large standard errors in some settings. In an illustrative real-data application, we investigate the effects of body mass index on smoking using data from the Avon Longitudinal Study of Parents and Children.
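    For orientation, the standard (unadjusted) forms of two of the Mendelian randomization estimators mentioned here can be written directly. The sketch below shows only those textbook forms. The missingness-adjusted versions described in the abstract would replace the input associations with selection-corrected ones and are not reproduced. Function and argument names are hypothetical.

```python
def wald_ratio(beta_gy, beta_gx):
    """Single-variant estimate: variant-outcome over variant-exposure association."""
    return beta_gy / beta_gx


def ivw_estimate(beta_gx, beta_gy, se_gy):
    """Fixed-effect inverse variance weighted estimate across several variants.

    beta_gx: variant-exposure associations
    beta_gy: variant-outcome associations
    se_gy:   standard errors of the variant-outcome associations
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in zip(beta_gx, beta_gy, se_gy))
    denominator = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_gx, se_gy))
    return numerator / denominator
```

    When every variant has the same Wald ratio, for instance `ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])`, the weighted estimate coincides with that common ratio (0.5 here); otherwise it is a precision-weighted average of the per-variant ratios.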

  • Article type: Journal Article
    Consent bias is a type of selection bias in biomedical research where those consenting to the research differ systematically from those not consenting. It is particularly relevant in precision medicine research because the complexity of these studies prevents certain subgroups from understanding, trusting, and consenting to the research. Because consent bias distorts research findings and causes inequitable distribution of research benefits, scholars propose two types of schemes to reduce consent bias: reforming existing consent models and removing the consent requirement altogether. This study explores the possibility of waiving consent in observational studies using existing data, because they involve fewer risks to participants than clinical trials if privacy safeguards are strengthened. It suggests that data protection mechanisms such as security enhancement and data protection impact assessment should be conducted to protect data privacy of participants in observational studies without consent.

  • Article type: Journal Article
    BACKGROUND: In population-based research, pregnancy may be a repeated event. Despite published guidance on how to address repeated pregnancies to the same individual, a variety of approaches are observed in perinatal epidemiological studies. While some of these approaches are supported by the chosen research question, others are consequences of constraints inherent to a given dataset (eg, missing parity information). These decisions determine how appropriately a given research question can be answered and overall generalizability.
    OBJECTIVE: To compare common cohort selection and analytic approaches used for perinatal epidemiological research by assessing the prevalence of two perinatal outcomes and their association with a clinical and a social independent variable.
    METHODS: Using vital records linked to maternal hospital discharge records for singleton births, we created four cohorts: (1) all-births (2) randomly selected one birth per individual (3) first-observed birth per individual (4) primiparous-births (parity 1). Sampling of births was not conditional on cluster (ie, we did not sample all births by a given mother, but rather sampled individual births). Study outcomes were severe maternal morbidity (SMM) and preeclampsia/eclampsia, and the independent variables were self-reported race/ethnicity (as a social factor) and systemic lupus erythematosus. Comparing the four cohorts, we assessed the distribution of maternal characteristics, the prevalence of outcomes, overall and stratified by parity, and risk ratios (RR) for the associations of outcomes with independent variables. Among all-births, we then compared RR from three analytic strategies: with standard inference that assumes independently sampled births to the same mother in the model, with cluster-robust inference, and adjusting for parity.
    RESULTS: We observed minor differences in the population characteristics between the all-birth (N=2,736,693), random-selection, and first-observed birth cohorts (both N=2,284,660), with more substantial differences between these cohorts and the primiparous-births cohort (N=1,054,684). Outcome prevalence was consistently lowest among all-births and highest among primiparous-births (eg, SMM 18.9 per 1000 births among primiparous-births vs 16.6 per 1000 births among all-births). When stratified by parity, outcome prevalence was always the lowest in births of parity 2 and highest among births of parity 1 for both outcomes. RR differed for study outcomes across all four cohorts, with the most pronounced differences between the primiparous-birth cohort and other cohorts. Among all-births, robust inference minimally impacted the confidence bounds of estimates, compared to the standard inference, that is, crude estimates (eg, lupus-SMM association: 4.01, 95% confidence intervals [CI] 3.54-4.55 vs 4.01, 95% CI 3.53-4.56 for crude estimate), while adjusting for parity slightly shifted estimates, toward the null for SMM and away from the null for preeclampsia/eclampsia.
    CONCLUSIONS: Researchers should consider the alignment between the methods they use, their sampling strategy, and their research question. This could include refining the research question to better match inference possible for available data, considering alternative data sources, and appropriately noting data limitations and resulting bias, as well as the generalizability of findings. If parity is an established effect modifier, stratified results should be presented.
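    The crude risk ratios compared in this abstract follow the usual two-by-two calculation. The sketch below uses the standard log-scale (Katz) confidence interval, which assumes independent births; it therefore corresponds to the "standard inference" case and not to the cluster-robust analysis. The counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total, z=1.96):
    """Crude risk ratio with a log-scale confidence interval (z=1.96 for ~95%)."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR), treating all births as independent.
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper
```

    For instance, `risk_ratio(40, 1000, 20, 2000)` returns a risk ratio of 4.0 with its interval. Repeated births to the same mother violate the independence assumption, which is why the abstract also reports cluster-robust bounds for comparison.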