Partial identification

  • Article type: Journal Article
    In causal mediation analysis, nonparametric identification of the natural indirect effect typically relies on, in addition to no unobserved pre-exposure confounding, two fundamental assumptions: (i) so-called "cross-world counterfactuals" independence and (ii) no exposure-induced confounding. When the mediator is binary, bounds for partial identification have been given when neither assumption is made, or alternatively when assuming only (ii). We extend existing bounds to the case of a polytomous mediator, and provide bounds for the case assuming only (i). We apply these bounds to data from the Harvard PEPFAR program in Nigeria, where we evaluate the extent to which the effects of antiretroviral therapy on virological failure are mediated by a patient's adherence, and show that inference on this effect is somewhat sensitive to model assumptions.

  • Article type: Journal Article
    We use data from the National Longitudinal Study of Adolescent to Adult Health to investigate whether the quality of tertiary education (measured by college selectivity) causally affects obesity prevalence in the medium run (by age 24-34) and in the longer run (about 10 years later). We use partial identification methods, which allow us, while relying on weak assumptions, to overcome the potential endogeneity of college selectivity as well as the potential violation of the stable unit treatment value assumption due to students interacting with each other, and to obtain informative identification regions for the average treatment effect of college selectivity on obesity. We find that attending a more selective college causally reduces obesity, both in the medium and in the longer run. We provide evidence that the mechanisms through which college selectivity affects obesity include an increase in income and reductions in physical inactivity and in the consumption of fast food and sweetened drinks.

  • Article type: Journal Article
    HIV estimation using data from the Demographic and Health Surveys (DHS) is limited by the presence of non-response and test refusals. Conventional adjustments such as imputation require the data to be missing at random. Methods that use instrumental variables allow the possibility that prevalence is different between the respondents and non-respondents, but their performance depends critically on the validity of the instrument. Using Manski's partial identification approach, we form instrumental variable bounds for HIV prevalence from a pool of candidate instruments. Our method does not require all candidate instruments to be valid. We use a simulation study to evaluate and compare our method against its competitors. We illustrate the proposed method using DHS data from Zambia, Malawi and Kenya. Our simulations show that imputation leads to seriously biased results even under mild violations of the missing-at-random assumption. Using worst-case identification bounds that make no assumptions about the non-response mechanism is robust but not informative. Taking the union of instrumental variable bounds balances the informativeness of the bounds against robustness to the inclusion of some invalid instruments. Non-response and refusals are ubiquitous in population-based HIV data such as those collected under the DHS. Partial identification bounds provide a robust solution to HIV prevalence estimation without strong assumptions. Union bounds are significantly more informative than the worst-case bounds without sacrificing credibility.
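    The bounding logic described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example counts, and the choice to report the union as a single enclosing interval are assumptions made for illustration, and sampling variability is ignored. Worst-case bounds treat every untested person as negative (lower bound) or positive (upper bound); for one candidate instrument, the bounds are intersected across its levels; the union is then taken across instruments.

```python
def worst_case_bounds(n_positive, n_tested, n_sampled):
    """Worst-case bounds on prevalence when some sampled people are untested."""
    if not (0 < n_tested <= n_sampled and 0 <= n_positive <= n_tested):
        raise ValueError("inconsistent counts")
    response_rate = n_tested / n_sampled
    prevalence_tested = n_positive / n_tested
    lower = prevalence_tested * response_rate   # every untested person negative
    upper = lower + (1.0 - response_rate)       # every untested person positive
    return lower, upper


def iv_bounds(strata):
    """Intersect worst-case bounds across the levels of one instrument.

    `strata` holds one (n_positive, n_tested, n_sampled) tuple per level.
    The intersection is justified only if prevalence does not vary with
    the instrument; lower > upper means the data refute that assumption.
    """
    bounds = [worst_case_bounds(*s) for s in strata]
    return max(b[0] for b in bounds), min(b[1] for b in bounds)


def union_bounds(instruments):
    """Smallest interval containing the IV bounds of all candidate instruments.

    Assumption for this sketch: refuted instruments (empty intervals) are
    dropped, and the union is reported as one enclosing interval.
    """
    intervals = [iv_bounds(strata) for strata in instruments]
    intervals = [(lo, hi) for lo, hi in intervals if lo <= hi]
    if not intervals:
        raise ValueError("every candidate instrument is refuted by the data")
    return min(lo for lo, _ in intervals), max(hi for _, hi in intervals)


# Hypothetical counts: (HIV positive, tested, sampled) per instrument level.
instrument_a = [(150, 800, 1000), (60, 450, 500)]
instrument_b = [(100, 500, 1000), (100, 900, 1000)]
print(worst_case_bounds(150, 800, 1000))
print(iv_bounds(instrument_a))
print(union_bounds([instrument_a, instrument_b]))
```

    The union is never wider than the worst-case bounds and remains valid as long as at least one candidate instrument is valid, which is the trade-off between informativeness and robustness the abstract describes.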

  • Article type: Journal Article
    Capture-recapture (CRC) surveys are used to estimate the size of a population whose members cannot be enumerated directly. CRC surveys have been used to estimate the number of Coronavirus Disease 2019 (COVID-19) infections, people who use drugs, sex workers, conflict casualties, and trafficking victims. When k capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a 2^k contingency table in which one element, the number of individuals appearing in none of the samples, remains unobserved. In the absence of additional assumptions, the population size is not identifiable (i.e., not point identified). Stringent assumptions about the dependence between samples are often used to achieve point identification. However, real-world CRC surveys often use convenience samples in which the assumed dependence cannot be guaranteed, and population size estimates under these assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a nontrivial confidence set for the true population size. We construct confidence sets under bounds on pairwise capture probabilities using two methods: test inversion bootstrap confidence intervals and profile likelihood confidence intervals. Simulation results demonstrate well-calibrated confidence sets for each method. In an extensive real-world study, we apply the new methodology to the problem of using heterogeneous survey data to estimate the number of people who inject drugs in Brussels, Belgium.
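    For the simplest case of two samples (k = 2), the missing-cell problem can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's method: the dependence between samples is summarized by an odds ratio assumed to lie in a user-supplied range, the function name and counts are hypothetical, and sampling variability is ignored, so the result is an identification interval and not a confidence set.

```python
def population_size_bounds(n11, n10, n01, or_low, or_high):
    """Bounds on population size N from a two-sample capture-recapture table.

    n11: captured in both samples; n10: first sample only; n01: second only.
    The cell n00 (captured in neither sample) is unobserved. If the odds
    ratio OR = n11 * n00 / (n10 * n01) lies in [or_low, or_high], then
    n00 = OR * n10 * n01 / n11 is bounded, and so is N.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    if not 0 <= or_low <= or_high:
        raise ValueError("need 0 <= or_low <= or_high")
    n_observed = n11 + n10 + n01
    scale = n10 * n01 / n11          # value of n00 under independence (OR = 1)
    return n_observed + or_low * scale, n_observed + or_high * scale


# Hypothetical counts. OR = 1 recovers the Lincoln-Petersen estimate (600).
print(population_size_bounds(50, 150, 100, 1.0, 1.0))   # (600.0, 600.0)
print(population_size_bounds(50, 150, 100, 0.5, 2.0))   # (450.0, 900.0)
```

    Fixing the odds ratio at 1 corresponds to the independence assumption that yields point identification; widening the range trades precision for credibility.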

  • Article type: Journal Article
    Although mental health conditions are known to be associated with socioeconomic hardships, their causal effects remain largely unexplored. Using a sample of low-income families in the National Health Interview Survey (NHIS), we assess causal effects of serious mental illness (SMI) and related mental health conditions on family food security. We apply partial identification methods to account for fundamental endogeneity and measurement identification problems in a unified framework. To implement these methods, we combine a proxy measure of SMI in the NHIS with an estimate of the true rate of SMI from the Substance Abuse and Mental Health Services Administration. We also develop an innovative approach to approximate true prevalence rates when only self-reported prevalence rates are available. Applying relatively weak monotonicity assumptions on latent food security outcomes, we find that alleviating SMI would improve the food security rate by at least 9.5 percentage points, or 15%. JEL codes: C21, I10, I38.

  • Article type: Journal Article
    Expanded access to health care often leads to new diagnoses for previously undetected conditions. New diagnoses make it difficult to identify the causal effect of expanding health insurance on individuals with particular diagnoses: the newly diagnosed in the treatment group are likely to differ in unobserved ways from the control group. This paper provides two methods for dealing with this problem, depending on the data available to the researcher and on diagnosis-specific knowledge. If there is no panel dimension to the data, then the causal effect for the subgroup of interest can be bounded from either above or below, depending on the condition in question. If panel data are available, then the newly diagnosed can be identified, and their treated outcomes subtracted from the overall effect of interest. I apply these methods to find that the difference-in-discontinuities estimator underestimates the effect of Medicare prescription drug coverage on the uptake of insulin by first-time users by 20%.

  • Article type: Journal Article
    Many partial identification problems can be characterized by the optimal value of a function over a set, where both the function and the set need to be estimated from empirical data. Despite some progress for convex problems, statistical inference in this general setting remains to be developed. To address this, we derive an asymptotically valid confidence interval for the optimal value through an appropriate relaxation of the estimated set. We then apply this general result to the problem of selection bias in population-based cohort studies. We show that existing sensitivity analyses, which are often conservative and difficult to implement, can be formulated in our framework and made significantly more informative via auxiliary information on the population. We conduct a simulation study to evaluate the finite sample performance of our inference procedure, and conclude with a substantive motivating example on the causal effect of education on income in the highly selected UK Biobank cohort. We demonstrate that our method can produce informative bounds using plausible population-level auxiliary constraints. We implement this method in the [Formula: see text] package [Formula: see text].

  • Article type: Journal Article
    The polychoric correlation is a popular measure of association for ordinal data. It estimates a latent correlation, i.e., the correlation of a latent vector. This vector is assumed to be bivariate normal, an assumption that cannot always be justified. When bivariate normality does not hold, the polychoric correlation will not necessarily approximate the true latent correlation, even when the observed variables have many categories. We calculate the sets of possible values of the latent correlation when latent bivariate normality is not necessarily true, but at least the latent marginals are known. The resulting sets are called partial identification sets, and are shown to shrink to the true latent correlation as the number of categories increases. Moreover, we investigate partial identification under the additional assumption that the latent copula is symmetric, and calculate the partial identification set when one variable is ordinal and another is continuous. We show that little can be said about latent correlations, unless we have impractically many categories or we know a great deal about the distribution of the latent vector. An open-source R package is available for applying our results.

  • Article type: Journal Article
    This paper provides a general framework for analyzing self-confirming policies. We study self-confirming equilibria in recurrent decision problems with incomplete information about the true stochastic model. We characterize stationary monetary policies in a linear-quadratic setting.

  • Article type: Journal Article
    As a fundamental component of health care, disease screening is highly important. Often, two screening tests for a specific disease are compared in order to determine an optimal screening policy, for example, the digital rectal examination (DRE) and serum prostate-specific antigen (PSA) level for screening prostate cancer. Ideally, if a gold standard test is given to each individual being screened to establish their true disease status, the difference in accuracy measures between two tests can be evaluated. In practice, however, it is common that only individuals who test positive on at least one screening test receive gold standard tests, which are often invasive and, for ethical reasons, cannot be applied to those with negative results on both tests. Under such circumstances, the differences in accuracy measures between two tests cannot be point estimated, so the inference problem within this framework is challenging. In this article, using sensitivity and specificity as measures of test accuracy, we show that the difference in each measure between two tests is interval-identified, with estimable sharp bounds. We establish asymptotic normality for the estimators of the bounds and construct confidence intervals for the difference using methods for inference on partially identified parameters. The performance of the constructed confidence intervals for the difference and for its sharp bounds is evaluated via simulation studies. We also apply the proposed method to the prostate cancer example to compare the accuracy of DRE and PSA.
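    Why the sensitivity difference is only interval-identified can be seen from a short Python sketch. It is an elementary derivation under stated assumptions, not a reproduction of the paper's estimators: everyone positive on at least one test is assumed to receive the gold standard, counts are treated as population quantities, and the function name and example numbers are hypothetical.

```python
def sensitivity_difference_bounds(d11, d10, d01, n00):
    """Bounds on Se1 - Se2 when double-negatives are never verified.

    d11, d10, d01: verified diseased counts with (test 1, test 2) results
                   (+,+), (+,-), (-,+).
    n00: number of people negative on both tests (disease status unknown).

    Se1 - Se2 = (d10 - d01) / (d11 + d10 + d01 + m), where m, the number
    of diseased people among the double-negatives, lies between 0 and n00.
    """
    diseased_verified = d11 + d10 + d01
    if diseased_verified <= 0:
        raise ValueError("need at least one verified diseased individual")
    numerator = d10 - d01                              # identified from the data
    at_m_zero = numerator / diseased_verified          # no missed cases
    at_m_max = numerator / (diseased_verified + n00)   # all double-negatives diseased
    return min(at_m_zero, at_m_max), max(at_m_zero, at_m_max)


# Hypothetical counts for illustration only.
low, high = sensitivity_difference_bounds(d11=40, d10=20, d01=10, n00=930)
print(f"Se1 - Se2 lies in [{low:.4f}, {high:.4f}]")
```

    The sign of the difference is identified because the numerator does not depend on the unverified cell; only its magnitude is uncertain.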