directed acyclic graphs

  • Article type: Journal Article
    BACKGROUND: With growing interest in causal inference and machine learning among epidemiologists, there is increasing discussion of causal discovery algorithms for guiding covariate selection. We present a case study of novice application of causal discovery tools and attempt to validate the results against a well-established causal relationship.
    METHODS: As a case study, we attempted causal discovery of relationships relevant to the effect of adherence on mortality in the placebo arm of the Coronary Drug Project (CDP) dataset. We used four algorithms available as existing software implementations and varied several model inputs.
    RESULTS: We identified 15 adjustment sets from 17 model parameterizations. When applied to a baseline covariate adjustment analysis, these 15 adjustment sets returned effect estimates with bias of similar magnitude and direction to prior published results. When using methods to control for time-varying confounding, there was generally more residual bias than with expert-selected adjustment sets.
    CONCLUSIONS: Although causal discovery algorithms can perform on par with expert knowledge, we do not recommend novice use of causal discovery without the input of experts in causal discovery. Expert support is recommended to aid in choosing the algorithm, selecting input parameters, assessing underlying assumptions, and finalizing selection of the adjustment variables.

  • Article type: Journal Article
    Causal discovery with prior knowledge is important for improving performance. We consider the incorporation of marginal causal relations, which correspond to the presence or absence of directed paths in a causal model. We propose the Marginal Prior Causal Knowledge PC (MPPC) algorithm to incorporate marginal causal relations into a constraint-based structure learning algorithm. We provide theorems on conditional independence properties obtained by combining observational data and marginal causal relations. We compare the MPPC algorithm with other structure learning methods in both simulation studies and real-world networks. The results indicate that, compared with other constraint-based structure learning methods, the MPPC algorithm can incorporate marginal causal relations and is more effective and more efficient.
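    A marginal causal relation, as described above, is a statement about whether a directed path exists between two variables. The following is a minimal sketch of checking such a statement against a candidate DAG. It is not the MPPC algorithm; the example graph, the variable names, and the helper `has_directed_path` are hypothetical.

```python
from collections import deque


def has_directed_path(graph, source, target):
    """Return True if a directed path of length >= 1 leads from source to target.

    `graph` maps each node to the list of its children (assumed representation).
    """
    queue = deque(graph.get(source, ()))
    seen = set()
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return False


# Hypothetical candidate DAG: X -> Y -> Z and X -> W
dag = {"X": ["Y", "W"], "Y": ["Z"], "Z": [], "W": []}

# Prior knowledge "X is a cause of Z" is compatible with this graph,
# while "Z is a cause of X" is not.
print(has_directed_path(dag, "X", "Z"))
print(has_directed_path(dag, "Z", "X"))
```

    A structure learning procedure could use a check of this kind to discard candidate graphs that contradict the stated prior relations.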

  • Article type: Journal Article
    No abstract available.

  • Article type: Journal Article
    The global health burden associated with exposure to heat is a grave concern and is projected to further increase under climate change. While physiological studies have demonstrated the role of humidity alongside temperature in exacerbating heat stress for humans, epidemiological findings remain conflicted. Understanding the intricate relationships between heat, humidity, and health outcomes is crucial to inform adaptation and drive increased global climate change mitigation efforts. This article introduces 'directed acyclic graphs' (DAGs) as causal models to elucidate the analytical complexity in observational epidemiological studies that focus on humid-heat-related health impacts. DAGs are employed to delineate implicit assumptions often overlooked in such studies, depicting humidity as a confounder, mediator, or an effect modifier. We also discuss complexities arising from using composite indices, such as wet-bulb temperature. DAGs representing the health impacts associated with wet-bulb temperature help to understand the limitations in separating the individual effect of humidity from the perceived effect of wet-bulb temperature on health. General examples for regression models corresponding to each of the causal assumptions are also discussed. Our goal is not to prioritize one causal model but to discuss the causal models suitable for representing humid-heat health impacts and highlight the implications of selecting one model over another. We anticipate that the article will pave the way for future quantitative studies on the topic and motivate researchers to explicitly characterize the assumptions underlying their models with DAGs, facilitating accurate interpretations of the findings. This methodology is applicable to similarly complex compound events.
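    As a rough illustration of how a causal assumption about humidity can change the regression specification, the three roles named above could correspond to generic forms like the ones below. These are textbook forms, not taken from the article; the symbols (Y for the health outcome, T for temperature, H for humidity, g for a link function) are assumptions made here.

```latex
% Humidity as a confounder of the temperature effect: adjust for H.
g\big(\mathbb{E}[Y \mid T, H]\big) = \beta_0 + \beta_T T + \beta_H H

% Humidity as a mediator (T -> H -> Y): the total effect of T omits H.
g\big(\mathbb{E}[Y \mid T]\big) = \alpha_0 + \alpha_T T

% Humidity as an effect modifier: include a product (interaction) term.
g\big(\mathbb{E}[Y \mid T, H]\big) = \gamma_0 + \gamma_T T + \gamma_H H + \gamma_{TH}\, T H
```

    Under the mediator assumption, adding H to the model would block part of the effect of T, so the coefficient on T would no longer estimate the total effect.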

  • Article type: Journal Article
    Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG), and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modeled as a mixed-integer program with an objective function composed of a convex quadratic loss function and a regularization penalty subject to linear constraints. The optimal solution to this mathematical program is known to have desirable statistical properties under certain conditions. However, the state-of-the-art optimization solvers are not able to obtain provably optimal solutions to the existing mathematical formulations for medium-size problems within reasonable computational times. To address this difficulty, we tackle the problem from both computational and statistical perspectives. On the one hand, we propose a concrete early stopping criterion to terminate the branch-and-bound process in order to obtain a near-optimal solution to the mixed-integer program, and establish the consistency of this approximate solution. On the other hand, we improve the existing formulations by replacing the linear "big-M" constraints that represent the relationship between the continuous and binary indicator variables with second-order conic constraints. Our numerical results demonstrate the effectiveness of the proposed approaches.
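    To make the contrast between the two constraint types concrete, here is one common way of writing them. This is a sketch under assumed notation, not necessarily the formulation used in the paper: \beta_{jk} is the continuous weight of a candidate edge from j to k, g_{jk} is its binary indicator, M is a large constant, and s_{jk} is an auxiliary variable introduced here.

```latex
% Linear big-M link: the weight can be nonzero only if the indicator is 1.
-M\, g_{jk} \;\le\; \beta_{jk} \;\le\; M\, g_{jk}, \qquad g_{jk} \in \{0, 1\}

% Conic alternative: a rotated second-order cone constraint with s_{jk} \ge 0.
\beta_{jk}^{2} \;\le\; s_{jk}\, g_{jk}, \qquad s_{jk} \ge 0, \quad g_{jk} \in \{0, 1\}
```

    In both versions g_{jk} = 0 forces \beta_{jk} = 0. The conic form needs no constant M, which is the usual motivation for it: a loose M tends to weaken the continuous relaxation that branch-and-bound relies on.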

  • Article type: Journal Article
    BACKGROUND: Empirical evaluation of inverse probability weighting (IPW) for self-selection bias correction is inaccessible without the full source population. We aimed to: (i) investigate how self-selection biases frequency and association measures and (ii) assess self-selection bias correction using IPW in a cohort with register linkage.
    METHODS: The source population included 17 936 individuals invited to the Copenhagen Aging and Midlife Biobank during 2009-11 (ages 49-63 years), of whom 7185 (40.1%) participated. Register data were obtained for every invited person from 7 years before invitation to the end of 2020. The association between education and mortality was estimated using Cox regression models among participants, IPW participants and the source population.
    RESULTS: Participants had higher socioeconomic position and fewer hospital contacts before baseline than the source population. Frequency measures of participants approached those of the source population after IPW. Compared with primary/lower secondary education, upper secondary, short tertiary, bachelor and master/doctoral education were associated with reduced risk of death among participants (adjusted hazard ratio [95% CI]: 0.60 [0.46; 0.77], 0.68 [0.42; 1.11], 0.37 [0.25; 0.54], 0.28 [0.18; 0.46], respectively). IPW changed the estimates marginally (0.59 [0.45; 0.77], 0.57 [0.34; 0.93], 0.34 [0.23; 0.50], 0.24 [0.15; 0.39]), but not consistently towards those of the source population (0.57 [0.51; 0.64], 0.43 [0.32; 0.60], 0.38 [0.32; 0.47], 0.22 [0.16; 0.29]).
    CONCLUSIONS: Frequency measures of study participants may not reflect the source population in the presence of self-selection, but the impact on association measures can be limited. IPW may be useful for (self-)selection bias correction, but the returned results can still reflect residual or other biases and random errors.
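    The weighting idea behind IPW can be sketched on toy data. The example below is an illustration only: the invited population is invented, participation is assumed to depend on education group alone, and the outcome is a simple mortality proportion rather than the Cox models used in the study. All names are hypothetical.

```python
def rows(group, participated, died, count):
    """Build `count` identical records of (education group, participated, died)."""
    return [(group, participated, died)] * count


# Invented invited population: 8 people per education group.
invited = (
    rows("low", True, True, 1) + rows("low", True, False, 1)
    + rows("low", False, True, 3) + rows("low", False, False, 3)
    + rows("high", True, True, 1) + rows("high", True, False, 3)
    + rows("high", False, True, 1) + rows("high", False, False, 3)
)


def participation_probability(population):
    """Estimate P(participation | education group) from everyone invited."""
    totals, joined = {}, {}
    for group, participated, _ in population:
        totals[group] = totals.get(group, 0) + 1
        joined[group] = joined.get(group, 0) + int(participated)
    return {group: joined[group] / totals[group] for group in totals}


def ipw_mortality(population):
    """Mortality among participants, each weighted by 1 / P(participation)."""
    prob = participation_probability(population)
    weighted_deaths = 0.0
    total_weight = 0.0
    for group, participated, died in population:
        if not participated:
            continue
        weight = 1.0 / prob[group]
        weighted_deaths += weight * int(died)
        total_weight += weight
    return weighted_deaths / total_weight


participants = [row for row in invited if row[1]]
source_mortality = sum(died for _, _, died in invited) / len(invited)
naive_mortality = sum(died for _, _, died in participants) / len(participants)
weighted_mortality = ipw_mortality(invited)

print(source_mortality, naive_mortality, weighted_mortality)
```

    In this constructed case the weights recover the source-population proportion because participation depends only on the variable used to build them. When participation also depends on unmeasured factors, residual bias of the kind the conclusions mention would remain.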

  • Article type: Journal Article
    The causal structure of a system imposes constraints on the joint probability distribution of variables that can be generated by the system. Archetypal constraints consist of conditional independencies between variables. However, particularly in the presence of hidden variables, many causal structures are compatible with the same set of independencies inferred from the marginal distributions of observed variables. Additional constraints allow further testing for the compatibility of data with specific causal structures. An existing family of causally informative inequalities compares the information about a set of target variables contained in a collection of variables, with a sum of the information contained in different groups defined as subsets of that collection. While procedures to identify the form of these groups-decomposition inequalities have been previously derived, we substantially enlarge the applicability of the framework. We derive groups-decomposition inequalities subject to weaker independence conditions, with weaker requirements in the configuration of the groups, and additionally allowing for conditioning sets. Furthermore, we show how constraints with higher inferential power may be derived with collections that include hidden variables, and then converted into testable constraints using data processing inequalities. For this purpose, we apply the standard data processing inequality of conditional mutual information and derive an analogous property for a measure of conditional unique information recently introduced to separate redundant, synergistic, and unique contributions to the information that a set of variables has about a target.

  • Article type: Journal Article
    Deterministic variables are variables that are functionally determined by one or more parent variables. They commonly arise when a variable has been functionally created from one or more parent variables, as with derived variables, and in compositional data, where the 'whole' variable is determined from its 'parts'. This article introduces how deterministic variables may be depicted within directed acyclic graphs (DAGs) to help with identifying and interpreting causal effects involving derived variables and/or compositional data. We propose a two-step approach in which all variables are initially considered, and a choice is made whether to focus on the deterministic variable or its determining parents. Depicting deterministic variables within DAGs brings several benefits. It is easier to identify and avoid misinterpreting tautological associations, i.e., self-fulfilling associations between deterministic variables and their parents, or between sibling variables with shared parents. In compositional data, it is easier to understand the consequences of conditioning on the 'whole' variable, and correctly identify total and relative causal effects. For derived variables, it encourages greater consideration of the target estimand and greater scrutiny of the consistency and exchangeability assumptions. DAGs with deterministic variables are a useful aid for planning and interpreting analyses involving derived variables and/or compositional data.
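    A tautological association of the kind described above can be reproduced in a small simulation. This is a sketch under assumed conditions: two independent standard-normal 'parts' and a 'whole' defined as their sum. The variable names and the `pearson` helper are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
part_a = [rng.gauss(0.0, 1.0) for _ in range(20000)]
part_b = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# The 'whole' is a deterministic function of its parents.
whole = [a + b for a, b in zip(part_a, part_b)]

r_parts = pearson(part_a, part_b)  # parts are independent by construction
r_whole = pearson(part_a, whole)   # association created purely by the definition

print(round(r_parts, 3), round(r_whole, 3))
```

    The parts are uncorrelated, yet each part correlates strongly with the whole (in theory 1/sqrt(2), about 0.71, for equal variances) simply because the whole contains it. Reading that correlation as a causal finding would be the misinterpretation the abstract warns about.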

  • Article type: Journal Article
    Prevalent Gene Regulatory Network (GRN) construction methods rely on generalized correlation analysis. However, in biological systems, regulation is essentially a causal relationship that cannot be adequately captured solely through correlation. Therefore, it is more reasonable to infer GRNs from a causal perspective. Existing causal discovery algorithms typically rely on Directed Acyclic Graphs (DAGs) to model causal relationships, but this often requires traversing the entire network, which causes computational demands to skyrocket as the number of nodes grows and makes causal discovery algorithms suitable only for small networks with one or two hundred nodes or fewer. In this study, we propose the SLIVER (cauSaL dIscovery Via dimEnsionality Reduction) algorithm, which integrates a causal structural equation model and graph decomposition. SLIVER introduces a set of factor nodes, serving as abstractions of different functional modules, to integrate the regulatory relationships between genes based on their respective functions or pathways, thus reducing the GRN to the product of two low-dimensional matrices. Subsequently, we employ the structural causal model (SCM) to learn the GRN within the gene node space, enforce the DAG constraint in the low-dimensional space, and guide each factor to aggregate various functions through cosine similarity. We evaluate the performance of the SLIVER algorithm on 12 real single-cell transcriptomic datasets and demonstrate that it outperforms 12 other widely used methods in both GRN inference performance and computational resource usage. The analysis of the gene information integrated by factor nodes also demonstrates the biological interpretability of factor nodes in GRNs. We apply it to scRNA-seq data of Type 2 diabetes mellitus to capture the transcriptional regulatory structural changes of β cells under high insulin demand.
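    The two ideas named above, a network written as the product of two low-dimensional matrices and a DAG constraint on the result, can be illustrated with a toy example. This is not the SLIVER algorithm: the matrices are invented, acyclicity is checked after the fact with Kahn's topological sort rather than enforced during learning, and the helpers `matmul` and `is_acyclic` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def is_acyclic(weights, tol=1e-12):
    """Kahn's algorithm: True if the nonzero entries of `weights` form a DAG."""
    n = len(weights)
    children = [
        [j for j in range(n) if abs(weights[i][j]) > tol] for i in range(n)
    ]
    indegree = [0] * n
    for i in range(n):
        for j in children[i]:
            indegree[j] += 1
    ready = [i for i in range(n) if indegree[i] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return visited == n  # every node was ordered, so there is no cycle


# 4 genes, 2 factor nodes (invented values).
gene_to_factor = [[1, 0], [0, 0], [0, 1], [0, 0]]   # 4 x 2
factor_to_gene = [[0, 1, 1, 0], [0, 0, 0, 1]]        # 2 x 4
network = matmul(gene_to_factor, factor_to_gene)      # 4 x 4 gene network

# Letting the second factor also regulate gene 0 closes the loop 0 -> 2 -> 0.
cyclic_factor_to_gene = [[0, 1, 1, 0], [1, 0, 0, 1]]
cyclic_network = matmul(gene_to_factor, cyclic_factor_to_gene)

print(is_acyclic(network), is_acyclic(cyclic_network))
```

    With n genes and k factors, the two factor matrices hold 2nk entries instead of the n squared entries of the full network, which is the kind of saving a low-dimensional representation is meant to provide.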

  • Article type: Journal Article
    How do we construct our causal directed acyclic graphs (DAGs), for example for life-course modeling and analysis? In this commentary, I review how the data-driven construction of causal DAGs (causal discovery) has evolved, what promises it holds, and what limitations or caveats must be considered. I find that expert- or theory-driven model-building might benefit from more checking against the data and that causal discovery could bring new ideas to old theories.