variable selection

  • Article type: Journal Article
    This manuscript summarizes a presentation delivered by the first author at the 2024 symposium for the Calvin Schwabe Award for Lifetime Achievement in Veterinary Epidemiology and Preventive Medicine, which was awarded to Dr. Jan Sargeant. Epidemiologic research plays a crucial role in understanding the complex relationships between exposures and health outcomes. However, the accuracy of the conclusions drawn from these investigations relies upon the meticulous selection and measurement of exposure variables. Appropriate exposure variable selection is crucial for understanding disease etiologies, but it is often the case that we are not able to directly measure the exposure variable of interest and use proxy measures to assess exposures instead. Inappropriate use of proxy measures can lead to erroneous conclusions being made about the true exposure of interest. These errors may lead to biased estimates of associations between exposures and outcomes. The consequences of such biases extend beyond research concerns as health decisions can be made based on flawed evidence. Recognizing and mitigating these biases are essential for producing reliable evidence that informs health policies and interventions, ultimately contributing to improved population health outcomes. To address these challenges, researchers must adopt rigorous methodologies for exposure variable selection and validation studies to minimize measurement errors.
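
    A tiny simulation makes the bias mechanism described above concrete: when a noisy proxy stands in for the true exposure, the estimated exposure-outcome association is attenuated toward the null. The sketch below is illustrative only; the variable names, noise levels, and linear model are assumptions, not taken from the presentation.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    true_beta = 1.0

    exposure = rng.normal(size=n)                      # true exposure of interest
    outcome = true_beta * exposure + rng.normal(size=n)
    proxy = exposure + rng.normal(size=n)              # proxy with classical measurement error

    slope_true = np.polyfit(exposure, outcome, 1)[0]   # close to 1.0: unbiased
    slope_proxy = np.polyfit(proxy, outcome, 1)[0]     # close to 0.5: attenuated toward the null
    print(f"slope with true exposure: {slope_true:.2f}")
    print(f"slope with proxy measure: {slope_proxy:.2f}")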

  • Article type: Journal Article
    Neural demyelination and brain damage accumulated in white matter appear as hyperintense areas on T2-weighted MRI scans in the form of lesions. Modeling binary images at the population level, where each voxel represents the existence of a lesion, plays an important role in understanding aging and inflammatory diseases. We propose a scalable hierarchical Bayesian spatial model, called BLESS, capable of handling binary responses by placing continuous spike-and-slab mixture priors on spatially-varying parameters and enforcing spatial dependency on the parameter dictating the amount of sparsity within the probability of inclusion. The use of mean-field variational inference with dynamic posterior exploration, which is an annealing-like strategy that improves optimization, allows our method to scale to large sample sizes. Our method also accounts for underestimation of posterior variance due to variational inference by providing an approximate posterior sampling approach based on Bayesian bootstrap ideas and spike-and-slab priors with random shrinkage targets. Besides accurate uncertainty quantification, this approach is capable of producing novel cluster size based imaging statistics, such as credible intervals of cluster size, and measures of reliability of cluster occurrence. Lastly, we validate our results via simulation studies and an application to the UK Biobank, a large-scale lesion mapping study with a sample size of 40,000 subjects.
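
    As a concrete illustration of the prior structure mentioned above, the sketch below evaluates the inclusion probability implied by a continuous spike-and-slab mixture of two zero-mean normals for a voxel-level coefficient. The mixture form and the variance values are illustrative assumptions, not the BLESS implementation.

    import numpy as np
    from scipy import stats

    def posterior_inclusion_prob(beta, pi_incl, spike_sd=0.01, slab_sd=1.0):
        """Probability that beta comes from the slab of the mixture
        pi_incl * N(0, slab_sd^2) + (1 - pi_incl) * N(0, spike_sd^2)."""
        slab = pi_incl * stats.norm.pdf(beta, scale=slab_sd)
        spike = (1.0 - pi_incl) * stats.norm.pdf(beta, scale=spike_sd)
        return slab / (slab + spike)

    # A coefficient near zero is attributed to the spike (no lesion effect),
    # while a large coefficient is attributed to the slab (effect present).
    print(posterior_inclusion_prob(np.array([0.005, 0.8]), pi_incl=0.1))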

  • Article type: Journal Article
    Statistical regression models are used for predicting outcomes based on the values of some predictor variables or for describing the association of an outcome with predictors. With a data set at hand, a regression model can be easily fit with standard software packages. This bears the risk that data analysts may rush to perform sophisticated analyses without sufficient knowledge of basic properties of, associations in, and errors of their data, leading to incorrect interpretation and unclear presentation of the modeling results. Ignorance of special features of the data, such as redundancies or particular distributions, may even invalidate the chosen analysis strategy. Initial data analysis (IDA) is a prerequisite to regression analysis, as it provides the knowledge about the data needed to confirm the appropriateness of, or to refine, a chosen model-building strategy, to interpret the modeling results correctly, and to guide the presentation of modeling results. To facilitate reproducibility, IDA needs to be preplanned, an IDA plan should be included in the general statistical analysis plan of a research project, and results should be well documented. Biased statistical inference for the final regression model can be minimized if IDA abstains from evaluating associations of outcome and predictors, a key principle of IDA. We give advice on which aspects to consider in an IDA plan for data screening in the context of regression modeling to supplement the statistical analysis plan. We illustrate this IDA plan for data screening in an example of a typical diagnostic modeling project and give recommendations for data visualizations.
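
    A minimal data-screening sketch in the spirit of the IDA plan described above is shown below: it checks missingness, distributions, and redundancy among the predictors only, and deliberately leaves outcome-predictor associations untouched. The data frame and column names are hypothetical stand-ins.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({                                # hypothetical diagnostic-study data
        "age": rng.integers(30, 90, size=200),
        "bmi": rng.normal(27, 4, size=200),
        "crp": rng.lognormal(1.0, 0.8, size=200),
        "disease": rng.integers(0, 2, size=200),       # outcome, set aside during IDA
    })
    df.loc[rng.choice(200, size=15, replace=False), "crp"] = np.nan  # some missing lab values

    predictors = ["age", "bmi", "crp"]

    print(df[predictors].isna().mean())                # missingness per predictor
    print(df[predictors].describe())                   # ranges, skew, potential outliers
    print(df[predictors].corr(method="spearman"))      # redundancy among predictors only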

  • Article type: Journal Article
    This article introduces a causal discovery method to learn nonlinear relationships in a directed acyclic graph with correlated Gaussian errors due to confounding. First, we derive model identifiability under the sublinear growth assumption. Then, we propose a novel method, named the Deconfounded Functional Structure Estimation (DeFuSE), consisting of a deconfounding adjustment to remove the confounding effects and a sequential procedure to estimate the causal order of variables. We implement DeFuSE via feedforward neural networks for scalable computation. Moreover, we establish the consistency of DeFuSE under an assumption called the strong causal minimality. In simulations, DeFuSE compares favorably against state-of-the-art competitors that ignore confounding or nonlinearity. Finally, we demonstrate the utility and effectiveness of the proposed approach with an application to gene regulatory network analysis. The Python implementation is available at https://github.com/chunlinli/defuse.
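
    The repository above provides the authors' implementation. Purely to illustrate why a deconfounding adjustment is needed at all (this is not the DeFuSE procedure), the sketch below shows how a latent confounder with correlated errors induces a spurious association between two variables that share no direct causal link.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20_000
    u = rng.normal(size=n)                     # latent confounder
    x = u + rng.normal(size=n)                 # no direct causal edge between x and y
    y = u + rng.normal(size=n)

    naive_slope = np.polyfit(x, y, 1)[0]       # close to 0.5: spurious association
    # If the confounder could be observed or estimated, removing it eliminates the association.
    adjusted_slope = np.polyfit(x - u, y - u, 1)[0]    # close to 0.0
    print(f"naive: {naive_slope:.2f}, after removing the confounder: {adjusted_slope:.2f}")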

  • Article type: Journal Article
    We consider unsupervised classification by means of a latent multinomial variable which categorizes a scalar response into one of the L components of a mixture model that incorporates scalar and functional covariates. This process can be thought of as a hierarchical model, with the first level modelling a scalar response according to a mixture of parametric distributions and the second level modelling the mixture probabilities by means of a generalized linear model with functional and scalar covariates. The traditional approach of treating functional covariates as vectors not only suffers from the curse of dimensionality, since functional covariates can be measured at very small intervals, leading to a highly parametrized model, but also does not take into account the nature of the data. We use basis expansions to reduce the dimensionality and a Bayesian approach for estimating the parameters while providing predictions of the latent classification vector. The method is motivated by two data examples that are not easily handled by existing methods. The first example concerns identifying placebo responders in a clinical trial (normal mixture model) and the other concerns predicting illness in milking cows (zero-inflated mixture of Poisson models).
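
    The dimension-reduction step described above can be sketched as follows: each densely sampled functional covariate is represented by a handful of basis coefficients that then enter the model as scalar predictors. The Legendre basis and the number of basis functions used here are illustrative assumptions; the paper's basis choice may differ.

    import numpy as np

    def functional_to_coefficients(curves, n_basis=6):
        """curves: (n_subjects, n_timepoints) array holding one functional covariate."""
        n_subjects, n_time = curves.shape
        t = np.linspace(-1.0, 1.0, n_time)                        # rescaled measurement grid
        basis = np.polynomial.legendre.legvander(t, n_basis - 1)  # (n_time, n_basis)
        coefs, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)  # least-squares projection
        return coefs.T                                            # (n_subjects, n_basis)

    # A 200-point curve per subject is reduced to 6 scalar predictors.
    curves = np.random.default_rng(1).normal(size=(50, 200)).cumsum(axis=1)
    print(functional_to_coefficients(curves).shape)               # (50, 6)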

  • Article type: Journal Article
    Laser-induced breakdown spectroscopy (LIBS) and visible near-infrared spectroscopy (vis-NIRS) are spectroscopic techniques that offer promising alternatives to traditional laboratory methods for the rapid and cost-effective determination of soil properties on a large scale. Despite their individual limitations, combining LIBS and vis-NIRS has been shown to enhance the prediction accuracy for the determination of soil properties compared to single-sensor approaches. In this study, we used a comprehensive Danish national-scale soil dataset encompassing mostly sandy soils collected from various land uses and soil depths to evaluate the performance of LIBS and vis-NIRS, as well as their combined spectra, in predicting soil organic carbon (SOC) and texture. Firstly, partial least squares regression (PLSR) models were developed to correlate both LIBS and vis-NIRS spectra with the reference data. Subsequently, we merged LIBS and vis-NIRS data and developed PLSR models for the combined spectra. Finally, interval partial least squares regression (iPLSR) models were applied to assess the impact of variable selection on prediction accuracy for both LIBS and vis-NIRS. Despite being fundamentally different techniques, LIBS and vis-NIRS displayed comparable prediction performance for the investigated soil properties. LIBS achieved a root mean square error of prediction (RMSEP) of <7% for texture and 0.5% for SOC, while vis-NIRS achieved an RMSEP of <8% for texture and 0.5% for SOC. Combining LIBS and vis-NIRS spectra improved the prediction accuracy by 16% for clay, 6% for silt and sand, and 2% for SOC compared to single-sensor LIBS predictions. On the other hand, vis-NIRS single-sensor predictions were improved by 10% for clay, 17% for silt, 16% for sand, and 4% for SOC. Furthermore, applying iPLSR for variable selection improved prediction accuracy for both LIBS and vis-NIRS. Compared to LIBS PLSR predictions, iPLSR achieved reductions of 27% and 17% in RMSEP for clay and sand prediction, respectively, and an 8% reduction for silt and SOC prediction. Similarly, vis-NIRS iPLSR models demonstrated reductions of 6% and 4% in RMSEP for clay and SOC, respectively, and a 3% reduction for silt and sand. Interestingly, LIBS iPLSR models outperformed combined LIBS-vis-NIRS models in terms of prediction accuracy. Although combining LIBS and vis-NIRS improved the prediction accuracy of texture and SOC, LIBS coupled with variable selection had a greater benefit in terms of prediction accuracy. Future studies should investigate the influence of reference method uncertainty on prediction accuracy.
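
    A hedged sketch of the sensor-fusion modelling step is given below: the two spectral blocks are concatenated and a PLSR model is cross-validated on the combined matrix. The synthetic stand-in data, the number of latent components, and the variable names are assumptions for illustration, not values from the study.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(0)
    libs_spectra = rng.normal(size=(100, 500))        # stand-in for LIBS spectra
    visnir_spectra = rng.normal(size=(100, 300))      # stand-in for vis-NIRS spectra
    soc = libs_spectra[:, :5].sum(axis=1) + 0.5 * rng.normal(size=100)  # stand-in SOC reference

    X_combined = np.hstack([libs_spectra, visnir_spectra])   # sensor fusion by concatenation
    pls = PLSRegression(n_components=15)
    soc_pred = cross_val_predict(pls, X_combined, soc, cv=10).ravel()
    rmsep = np.sqrt(mean_squared_error(soc, soc_pred))
    print(f"cross-validated RMSEP for SOC (synthetic data): {rmsep:.3f}")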

  • Article type: Journal Article
    Background: The prediction of patients' outcomes is a key component in personalized medicine. Oftentimes, a prediction model is developed using a large number of candidate predictors, called high-dimensional data, including genomic data, lab tests, electronic health records, etc. Variable selection, also called dimension reduction, is a critical step in developing a prediction model using high-dimensional data. Methods: In this paper, we compare the variable selection and prediction performance of popular machine learning (ML) methods with our proposed method. LASSO is a popular ML method that selects variables by imposing an L1-norm penalty on the likelihood. With this approach, LASSO selects features based on the size of the regression estimates, rather than their statistical significance. As a result, LASSO can miss significant features while being known to over-select features. Elastic net (EN), another popular ML method, tends to select even more features than LASSO, since it uses a combination of L1- and L2-norm penalties that is less strict than an L1-norm penalty alone. Insignificant features included in a fitted prediction model act like white noise, so the fitted model loses prediction accuracy. Furthermore, for future use of a fitted prediction model, we have to collect data on all the features included in the model, which can be costly and may lower the accuracy of the data if the number of features is too large. Therefore, we propose an ML method, called repeated sieving, that extends standard regression methods with stepwise variable selection. By selecting features based on their statistical significance, it resolves the over-selection issue with high-dimensional data. Results: Through extensive numerical studies and real data examples, our results show that the repeated sieving method selects far fewer features than LASSO and EN, yet has higher prediction accuracy than the existing ML methods. Conclusions: We conclude that our repeated sieving method performs well in both variable selection and prediction, and it saves the cost of future investigation of the selected factors.
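
    The contrast described above can be sketched as follows: on synthetic high-dimensional data, count how many features LASSO and elastic net retain, and compare with a generic forward selection driven by p-values. The forward-selection routine is only an illustration in the spirit of selection by statistical significance; it is not the repeated sieving algorithm proposed in the paper.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNetCV, LassoCV

    X, y = make_regression(n_samples=200, n_features=500, n_informative=10,
                           noise=5.0, random_state=0)

    lasso = LassoCV(cv=5, random_state=0).fit(X, y)
    enet = ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0).fit(X, y)
    print("LASSO keeps", int(np.sum(lasso.coef_ != 0)), "features")
    print("Elastic net keeps", int(np.sum(enet.coef_ != 0)), "features")

    def forward_select_by_pvalue(X, y, alpha=0.001):
        selected = []
        while True:
            candidates = [j for j in range(X.shape[1]) if j not in selected]
            pvals = {}
            for j in candidates:
                design = sm.add_constant(X[:, selected + [j]])
                pvals[j] = sm.OLS(y, design).fit().pvalues[-1]  # p-value of the newest feature
            best = min(pvals, key=pvals.get)
            if pvals[best] >= alpha:
                return selected
            selected.append(best)

    print("Forward selection keeps", len(forward_select_by_pvalue(X, y)), "features")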

  • Article type: Journal Article
    With the development of machine learning and artificial intelligence (ML/AI) models, data-driven soft sensors, especially neural network-based ones, have been widely used for the prediction of key water quality indicators in wastewater treatment plants (WWTPs). However, recent research indicates that prediction performance and computational efficiency are greatly compromised by the time-varying, nonlinear and high-dimensional nature of the wastewater treatment process. This paper proposes a neural network-based soft sensor with double-errors parallel optimization to achieve timely and accurate prediction of effluent variables. Firstly, relying on the Activity Based Classification (ABC) principle, an ensemble variable selection method that combines the Pearson correlation coefficient (PCC) and mutual information (MI) is introduced to select the optimal process variables as auxiliary variables, thereby reducing the data dimensionality and simplifying the model complexity. Subsequently, a double-errors parallel optimization methodology that minimizes both the point prediction error and the distribution error simultaneously is proposed, aiming to enhance the training efficiency and the fitting quality of neural networks. Finally, the effectiveness is quantitatively assessed on two datasets collected from the Benchmark Simulation Model no. 1 (BMS1) and an actual oxidation ditch WWTP. The experimental results illustrate that the proposed soft sensor achieves precise effluent variable prediction, with RMSE, MAE and R2 values of 0.0606, 0.0486, 0.99930 and 0.06939, 0.05381, 0.98040, respectively. Consequently, this soft sensor can expedite the convergence of the neural network training process and enhance prediction performance, thereby contributing to the effective optimization and management of WWTPs.
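
    A minimal sketch of the ensemble variable-selection step is shown below: Pearson correlation and mutual information are each computed against the target effluent variable, rescaled, and averaged into a single ranking score. The equal weighting, the rescaling, and the stand-in data are assumptions for illustration, not the paper's exact recipe.

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def rank_auxiliary_variables(X, y, top_k=5):
        """X: (n_samples, n_process_variables); y: effluent variable to predict."""
        pcc = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        mi = mutual_info_regression(X, y, random_state=0)
        score = 0.5 * pcc / pcc.max() + 0.5 * mi / mi.max()  # simple equal-weight ensemble
        return np.argsort(score)[::-1][:top_k]               # indices of top-ranked variables

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 12))                                   # stand-in process variables
    y = X[:, 2] ** 2 + 0.5 * X[:, 7] + 0.1 * rng.normal(size=300)    # stand-in effluent variable
    print(rank_auxiliary_variables(X, y))   # variables 2 and 7 should rank near the top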

  • Article type: Journal Article
    Integrative analysis has emerged as a prominent tool in biomedical research, offering a solution to the "small n, large p" challenge. Leveraging the powerful capabilities of deep learning in extracting complex relationships between genes and diseases, our objective in this study is to incorporate deep learning into the framework of integrative analysis. Recognizing the redundancy within candidate features, we introduce a dedicated feature selection layer in the proposed integrative deep learning method. To further improve the performance of feature selection, the rich body of previous research is utilized by an ensemble learning method to identify "prior information". This leads to the proposed prior-assisted integrative deep learning (PANDA) method. We demonstrate the superiority of the PANDA method through a series of simulation studies, showing its clear advantages over competing approaches in both feature selection and outcome prediction. Finally, a skin cutaneous melanoma (SKCM) dataset is extensively analyzed with the PANDA method to show its practical application.
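
    The notion of a dedicated feature-selection layer can be sketched as a per-feature multiplicative gate trained with a sparsity penalty, so that uninformative inputs are driven toward zero. The PyTorch sketch below illustrates the concept only; it is not the PANDA architecture, and the layer sizes and penalty weight are arbitrary.

    import torch
    import torch.nn as nn

    class FeatureSelectionNet(nn.Module):
        def __init__(self, n_features, hidden=32):
            super().__init__()
            self.gates = nn.Parameter(torch.ones(n_features))   # one gate per input feature
            self.body = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 1))

        def forward(self, x):
            return self.body(x * self.gates)                    # gated inputs feed the network

        def selection_penalty(self):
            return self.gates.abs().sum()                       # L1 penalty encourages sparse gates

    model = FeatureSelectionNet(n_features=100)
    x, y = torch.randn(8, 100), torch.randn(8, 1)
    loss = nn.MSELoss()(model(x), y) + 1e-2 * model.selection_penalty()
    loss.backward()   # near-zero gates after training mark de-selected features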

  • Article type: Journal Article
    Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
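
    For orientation, the sketch below produces a standard point estimate of a sparse precision matrix with the graphical lasso; unlike the proposed framework, it uses no node-level auxiliary information and no hierarchical spike-and-slab structure, so it serves only as a baseline illustration of the first inference target listed above. The data are synthetic stand-ins.

    import numpy as np
    from sklearn.covariance import GraphicalLassoCV

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mean=np.zeros(10), cov=np.eye(10), size=500)  # stand-in expression data

    model = GraphicalLassoCV().fit(X)
    edges = np.abs(model.precision_) > 1e-6        # nonzero off-diagonal entries define the graph
    np.fill_diagonal(edges, False)
    print("estimated number of edges:", int(edges.sum()) // 2)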
