Multicollinearity

  • Article type: Journal Article
    Multicollinearity, characterized by significant co-expression patterns among genes, often occurs in high-throughput expression data and can undermine the reliability of predictive models. This study examined multicollinearity among closely related genes in RNA-Seq data obtained from embryoid bodies (EBs) exposed to 5-fluorouracil, with the aim of identifying genes associated with embryotoxicity. Six genes (Dppa5a, Gdf3, Zfp42, Meis1, Hoxa2, and Hoxb1) emerged as candidates based on domain knowledge and were validated by qPCR in EBs perturbed by 39 test substances. We conducted correlation analyses and used the variance inflation factor (VIF) to examine multicollinearity among the genes. Recursive feature elimination with cross-validation (RFECV) ranked Zfp42 and Hoxb1 as the top two of the seven features considered, identifying them as potential biomarkers for early embryotoxicity assessment. A t test assessing the statistical significance of this two-feature prediction model yielded a p value of 0.0044, confirming that RFECV successfully reduced redundancy and multicollinearity. Our study presents a systematic methodology for applying machine learning techniques to transcriptomics data analysis. It facilitates the discovery of potential reporter gene candidates for embryotoxicity screening and improves the predictive accuracy and feasibility of the model while reducing financial and time costs.
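    The VIF check mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the authors' pipeline. The "gene" columns are hypothetical placeholders, RFECV is not reproduced, and the common rule of thumb treating VIF values above roughly 5 to 10 as problematic is a convention rather than something stated in the abstract.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an OLS regression
    (with intercept) of column j on all remaining columns.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical expression data: gene_b is nearly a copy of gene_a,
# gene_c is independent of both.
rng = np.random.default_rng(0)
gene_a = rng.normal(size=200)
gene_b = gene_a + 0.1 * rng.normal(size=200)
gene_c = rng.normal(size=200)
X = np.column_stack([gene_a, gene_b, gene_c])
print(np.round(vif(X), 1))
```

    The two near-duplicate columns receive large VIFs while the independent column stays near 1, which is the pattern a feature-elimination step such as RFECV would then have to resolve.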

  • Article type: Journal Article
    The relationship between economic growth and CO2 emissions has been analyzed by testing the environmental Kuznets curve hypothesis, but traditional econometric methods may be flawed. An alternative method based on segmented-sample regressions is proposed and applied to 164 countries (98.34% of the world population) over different periods from 1822 to 2018. The results suggest that although the association between GDP per capita and CO2 emissions per capita is weakening over time, it remains positive globally, with only some high-income countries showing a reversed association in recent years. While 49 countries have decoupled emissions from economic growth, 115 have not. Most African, American, and Asian countries have not decoupled, whereas most European and Oceanian countries have. These findings highlight the urgency of effective climate policies: decoupling remains unachieved on a global scale, and we are moving away from, rather than approaching, the Paris Agreement goal of limiting the temperature increase to 1.5 °C above preindustrial levels.
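    The abstract does not spell out the regression specification. One plausible reading of a segmented-sample regression is to estimate the GDP-emissions slope separately within consecutive time segments and track how it changes; the sketch below does this for a single synthetic country. The segment edges, the log-log form, and the data-generating process are all assumptions for illustration, not the paper's method.

```python
import numpy as np

def segment_slopes(log_gdp, log_co2, years, edges):
    """OLS slope of log CO2 per capita on log GDP per capita within each
    time segment [edges[k], edges[k+1]). Hypothetical specification."""
    slopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (years >= lo) & (years < hi)
        slope, _intercept = np.polyfit(log_gdp[m], log_co2[m], 1)
        slopes.append(slope)
    return np.array(slopes)

# Synthetic single-country series (hypothetical): the GDP elasticity of
# emissions declines linearly from +1.2 to -0.3 over 1900-2019.
rng = np.random.default_rng(1)
years = np.arange(1900, 2020)
d_gdp = 0.02 + 0.03 * rng.normal(size=years.size)   # annual log-GDP growth
elasticity = np.linspace(1.2, -0.3, years.size)
log_gdp = np.cumsum(d_gdp)
log_co2 = np.cumsum(elasticity * d_gdp)
edges = [1900, 1930, 1960, 1990, 2020]
print(np.round(segment_slopes(log_gdp, log_co2, years, edges), 2))
```

    A positive slope that shrinks toward or below zero in later segments is the kind of weakening (and, eventually, decoupling) pattern the abstract describes.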

  • Article type: Journal Article
    The breeder's equation, $\Delta\bar{z} = G\beta$, allows us to understand how genetics (the genetic covariance matrix, G) and the vector of linear selection gradients, β, interact to generate evolutionary trajectories. Estimation of β by multiple regression of relative fitness on trait values revolutionized the way we study selection in laboratory and wild populations. However, multicollinearity (correlation among predictors) can produce very high variances of, and covariances between, elements of β, which complicates the interpretation of the parameter estimates. This is particularly relevant in the era of big data, where the number of predictors may approach or exceed the number of observations. A common approach to multicollinear predictors is to discard some of them, thereby losing any information that might be gained from those traits. Using simulations, we show how, on the one hand, multicollinearity can result in inaccurate estimates of selection and, on the other, how removing correlated phenotypes from the analyses can give a misleading view of the targets of selection. We show that regularized regression, which places data-validated constraints on the magnitudes of individual elements of β, can produce more accurate estimates of the total strength and direction of multivariate selection in the presence of multicollinearity and limited data, often at little cost when multicollinearity is low. We also compare standard and regularized regression estimates of selection in a reanalysis of three published case studies, showing that regularized regression can improve fitness predictions in independent data. Our results suggest that regularized regression is a valuable complement to traditional least-squares estimates of selection. In some cases, its use can improve predictions of individual fitness and estimates of the total strength and direction of multivariate selection.
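    A minimal sketch of the contrast described above: ordinary least squares versus ridge regression, one form of regularized regression, for estimating selection gradients when two traits are strongly correlated. Here "data-validated" is interpreted as choosing the ridge penalty by k-fold cross-validation. The penalty grid, fold count, and simulated traits are assumptions, not the authors' settings.

```python
import numpy as np

def ridge(Z, w, lam):
    """Ridge estimate of selection gradients: regress relative fitness w on
    traits Z (both centred). lam = 0 gives ordinary least squares."""
    Zc = Z - Z.mean(axis=0)
    wc = w - w.mean()
    return np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]), Zc.T @ wc)

def cv_lambda(Z, w, lams, k=5, seed=0):
    """Pick the penalty with the lowest k-fold cross-validated squared error."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(w))
    folds = np.array_split(rng.permutation(idx), k)
    errors = []
    for lam in lams:
        sse = 0.0
        for test in folds:
            train = np.setdiff1d(idx, test)
            b = ridge(Z[train], w[train], lam)
            b0 = w[train].mean() - Z[train].mean(axis=0) @ b  # intercept
            sse += np.sum((w[test] - (b0 + Z[test] @ b)) ** 2)
        errors.append(sse)
    return lams[int(np.argmin(errors))]

# Hypothetical data: two strongly correlated traits, only the first under selection.
rng = np.random.default_rng(1)
n = 40
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.95], [0.95, 1.0]], size=n)
fitness = np.exp(0.3 * Z[:, 0] + rng.normal(0.0, 0.5, n))  # absolute fitness > 0
w = fitness / fitness.mean()                                # relative fitness
lams = np.logspace(-3, 3, 25)
lam_cv = cv_lambda(Z, w, lams)
b_ols = ridge(Z, w, 0.0)
b_ridge = ridge(Z, w, lam_cv)
print("OLS:", np.round(b_ols, 3), "ridge:", np.round(b_ridge, 3))
```

    Ridge never yields a longer coefficient vector than OLS on the same data, because the penalty pulls the individual gradients toward zero. How well the summed or total selection is recovered has to be judged over many simulated replicates, as the study does, rather than from one draw.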

  • Article type: Journal Article
    Mixtures of probabilistic regression models are among the most common techniques for incorporating covariate information into the learning of population heterogeneity. Despite their flexibility, they can yield unreliable estimates when the covariates are multicollinear. In this paper, we develop Liu-type shrinkage methods through an unsupervised learning approach to estimate the model coefficients in the presence of multicollinearity. We evaluate the performance of the proposed methods via the classification and stochastic versions of the expectation-maximization (EM) algorithm. Numerical simulations show that the proposed methods outperform their ridge and maximum likelihood counterparts. Finally, we apply our methods to analyze bone mineral data from women aged 50 and older.
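    The paper embeds Liu-type shrinkage in a mixture of regressions fitted by EM, which is beyond a short sketch. The underlying single-model idea can be shown with the basic Liu estimator for ordinary linear regression. In that estimator, a parameter d in [0, 1] moves the estimate from ridge regression with k = 1 (d = 0) to ordinary least squares (d = 1). This is a deliberately simplified stand-in, not the authors' mixture estimator.

```python
import numpy as np

def liu_estimator(X, y, d):
    """Basic Liu estimator for a linear model y = X b + e:
    b_d = (X'X + I)^{-1} (X'y + d * b_ols), with 0 <= d <= 1.
    d = 1 recovers OLS; d = 0 gives ridge regression with k = 1."""
    p = X.shape[1]
    XtX = X.T @ X
    b_ols = np.linalg.solve(XtX, X.T @ y)
    return np.linalg.solve(XtX + np.eye(p), X.T @ y + d * b_ols)

# Hypothetical nearly collinear covariates.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
for d in (0.0, 0.5, 1.0):
    print(d, np.round(liu_estimator(X, y, d), 3))
```

    In the mixture setting, the same kind of shrinkage would be applied to each component's weighted regression inside the M-step. That extension is an assumption about how such methods are typically built, not a description of this paper.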

  • Article type: Journal Article
    In this article, we use kernel approximation to define a mixed predictor and a stochastic restricted ridge predictor for partially linear mixed measurement error models. Under the matrix mean square error criterion, we compare the relative superiority of linear combinations of the newly defined predictors. We then investigate their asymptotic normality and the case in which the covariance matrix of the measurement errors is unknown. Finally, the study concludes with a Monte Carlo simulation study and an application to COVID-19 data.
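    The paper's predictors are built for partially linear mixed measurement error models with a kernel step, which is too involved for a short sketch. The core "mixed" idea combines the sample with a stochastic linear restriction r = Rβ + u. For an ordinary linear model, that idea is the classical Theil-Goldberger mixed estimator, sketched below. It is an assumption-labeled simplification: there are no random effects, no measurement error, no kernel smoothing, and no ridge term.

```python
import numpy as np

def mixed_estimator(X, y, R, r, W):
    """Theil-Goldberger mixed estimator for y = X b + e, Var(e) = s^2 I,
    combined with the stochastic restriction r = R b + u, Var(u) = s^2 W:
    b_me = (X'X + R' W^{-1} R)^{-1} (X'y + R' W^{-1} r)."""
    W_inv = np.linalg.inv(W)
    A = X.T @ X + R.T @ W_inv @ R
    return np.linalg.solve(A, X.T @ y + R.T @ W_inv @ r)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=50)
R = np.eye(3)                    # hypothetical: restrict every coefficient
r = np.array([0.9, -0.4, 0.0])   # hypothetical prior values for the coefficients
print(np.round(mixed_estimator(X, y, R, r, np.eye(3)), 3))
```

    W controls how much the restriction is trusted. A very large W makes the estimate collapse to OLS, and a very small W pins it to r, which is why the estimator stabilizes coefficients when X'X alone is ill-conditioned.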

  • Article type: Journal Article
    Aeromagnetic surveys are widely used in geological exploration, mineral resource assessment, environmental monitoring, military reconnaissance, and other areas, and in all of these applications the magnetic interference generated by the platform must be compensated. In recent years, large unmanned aerial vehicles (UAVs) have become better suited to magnetic detection missions because of the heavier loads they can carry. This article proposes methods for the magnetic compensation of large multi-load UAVs. Because of interference from the large platform and instrument noise, the standard deviations (stds) of the compensation data used in this paper are relatively large. First, by using the traditional Tolles-Lawson (T-L) model, we avoid the weak resistance of triaxial fluxgate magnetometers to magnetic interference. The direction cosine information is obtained from an inertial navigation system, the Global Positioning System (GPS), and a triaxial fluxgate magnetometer. We then increase the amplitude of the maneuvers during the compensation flight; this mitigates the multicollinearity of the compensation matrix to a certain extent, but it also introduces greater magnetic field interference. Finally, we employ a Lasso-regularized Newton iteration method (LRNM). Compared with the traditional least squares (LS) and singular value decomposition (SVD) methods, LRNM provides improvements of 34% and 27%, respectively. In summary, this set of schemes can provide effective compensation for large multi-load UAVs, improving their practical usability and the accuracy of the aeromagnetic survey data they collect.
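    Why larger maneuvers reduce multicollinearity can be illustrated with the condition number of the compensation (design) matrix. The sketch below builds a simplified design from direction cosines, in the style of the T-L model but with permanent and induced terms only, and compares small and large attitude oscillations. The attitude pattern, nominal field direction, and term set are assumptions. The sketch does not reproduce LRNM.

```python
import numpy as np

def direction_cosines(amplitude_deg, n=2000):
    """Unit field vectors in a hypothetical body frame while the platform
    oscillates in two angles with the given amplitude (degrees).
    The nominal inclination (60 deg) and declination (10 deg) are assumptions."""
    t = np.linspace(0.0, 40.0 * np.pi, n)
    a = np.deg2rad(amplitude_deg)
    inc = np.deg2rad(60.0) + a * np.sin(t)
    dec = np.deg2rad(10.0) + a * np.sin(1.37 * t + 0.5)
    return np.cos(inc) * np.cos(dec), np.cos(inc) * np.sin(dec), np.sin(inc)

def tl_like_design(ux, uy, uz):
    """Permanent (3) and induced (5) terms in the style of the T-L model;
    eddy-current terms are omitted. uz**2 is left out, a common choice
    because uz**2 = 1 - ux**2 - uy**2."""
    return np.column_stack([ux, uy, uz,
                            ux * ux, ux * uy, ux * uz, uy * uy, uy * uz])

cond_small = np.linalg.cond(tl_like_design(*direction_cosines(2.0)))
cond_large = np.linalg.cond(tl_like_design(*direction_cosines(20.0)))
print(f"small maneuvers: {cond_small:.3g}, large maneuvers: {cond_large:.3g}")
```

    With small oscillations, every column barely departs from a constant, so the columns are nearly linearly dependent and plain least squares becomes unstable. Larger maneuvers spread the columns apart, at the cost, as the abstract notes, of extra interference. A sparsity-inducing penalty such as Lasso is one way to handle the collinearity that remains.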

  • Article type: Journal Article
    BACKGROUND: The risk of a woman dying as a result of pregnancy or childbirth during her lifetime is about one in six in the poorest parts of the world.
    OBJECTIVE: The present study aims to determine the prevalence of maternal risk and its influencing variables among ever-married women of reproductive age (15-49 years) in Birbhum district, West Bengal.
    METHODS: A cohort-based retrospective cross-sectional study was carried out on a sample of 229 respondents selected by purposive stratified random sampling, using a pre-designed semi-structured questionnaire. An ordinal logistic regression (OLR) model was used as the assessment tool. Before developing the proportional-odds OLR model, we checked for multicollinearity among the predictors and also evaluated first-order effect modifiers. Data were analyzed using SPSS version 26.
    RESULTS: Women who were illiterate (odds ratio [OR] = 2.81; 95% CI, 0.277 to 1.791), had a lower standard of living (OR = 1.14; 95% CI, -0.845 to 1.116), married before the age of 15 years (OR = 21.96; 95% CI, -0.55 to 6.73), or married between the ages of 15 and 18 years (OR = 24.51; 95% CI, -0.45 to 6.85) were more likely to be affected by a higher concentration of maternal risk. (The reported 95% CIs appear to be on the log-odds scale: each interval is centred on ln(OR) rather than on the OR.) Another important predictor is the timing of pregnancy registration. When transport and related en-route factors are considered, the results clearly show that distance and travel time are significant determinants of the concentration of maternal risk.
    CONCLUSIONS: The incidence of child marriage should be reduced. Eliminating the factors that influence an individual's decision to seek care would make an essential contribution to removing the dominant maternal risk factors.
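    The reported intervals cannot be confidence intervals for the odds ratios themselves: 2.81 lies outside 0.277 to 1.791, and an odds ratio cannot have a negative bound. The quick arithmetic check below assumes instead that the intervals are for the log-odds regression coefficients. It confirms that each interval is centred on ln(OR), and it exponentiates the bounds to obtain the corresponding OR-scale interval.

```python
import math

# (OR, lower, upper) exactly as reported in the abstract
reported = {
    "illiterate": (2.81, 0.277, 1.791),
    "lower standard of living": (1.14, -0.845, 1.116),
    "married before 15": (21.96, -0.55, 6.73),
    "married at 15-18": (24.51, -0.45, 6.85),
}

for name, (odds_ratio, lo, hi) in reported.items():
    midpoint = (lo + hi) / 2             # centre of the reported interval
    log_or = math.log(odds_ratio)        # coefficient implied by the OR
    or_ci = (math.exp(lo), math.exp(hi)) # interval mapped to the OR scale
    print(f"{name}: ln(OR)={log_or:.3f}, midpoint={midpoint:.3f}, "
          f"OR-scale 95% CI={or_ci[0]:.2f} to {or_ci[1]:.2f}")
```

    Reading the exponentiated intervals against OR = 1, rather than the raw bounds against the OR, is what the ordinal logistic model's output supports.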

  • Article type: Journal Article
    An important goal in systems neuroscience is to understand the structure of neuronal interactions, which is frequently approached by studying functional relations between recorded neuronal signals. Commonly used pairwise measures (e.g., the correlation coefficient) offer limited insight, addressing neither the specificity of estimated neuronal interactions nor potential synergistic coupling between neuronal signals. Tripartite measures, such as partial correlation, variance partitioning, and partial information decomposition, address these questions by disentangling functional relations into interpretable information atoms (unique, redundant, and synergistic). Here, we apply these tripartite measures to simulated neuronal recordings to investigate their sensitivity to noise. We find that the considered measures are mostly accurate and specific for signals with noiseless sources but are substantially biased for noisy sources. We show that permutation testing of such measures results in high false positive rates even for small noise fractions and large data sizes. We present a conservative null hypothesis for significance testing of tripartite measures, which significantly decreases the false positive rate at the tolerable cost of an increased false negative rate. We hope our study raises awareness of the potential pitfalls of significance testing and of interpreting functional relations, offering both conceptual and practical advice.
    Tripartite functional relation measures enable the study of interesting effects in neural recordings, such as redundancy, functional connection specificity, and synergistic coupling. However, estimators of such relations are commonly validated using noiseless signals, whereas neural recordings typically contain noise. Here we systematically study the performance of tripartite estimators using simulated noisy neural signals. We demonstrate that permutation testing is not a robust procedure for inferring ground-truth statistical relations from commonly used tripartite relation estimators. We develop an adjusted, conservative testing procedure that reduces the false positive rates of the studied estimators when applied to noisy data. Besides addressing significance testing, our results should aid the accurate interpretation of tripartite functional relations and functional connectivity.
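    The noise-induced bias is easy to reproduce for partial correlation, the simplest of the tripartite measures named above. In the hypothetical setup below, y and z both depend only on a common source s. Conditioning on s itself gives a partial correlation near zero. Conditioning on a noisy recording of s, however, leaves a spurious residual dependence, and a naive permutation test (shuffling z) flags it as significant. This illustrates the false-positive problem only; it is not the authors' conservative test.

```python
import numpy as np

def residualize(a, b):
    """Residuals of a after OLS regression on b (with intercept)."""
    B = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ coef

def partial_corr(y, z, x):
    """Correlation of y and z after removing the linear effect of x from both."""
    return np.corrcoef(residualize(y, x), residualize(z, x))[0, 1]

def perm_pvalue(y, z, x, n_perm=200, seed=0):
    """Naive permutation p-value: shuffle z and recompute |partial correlation|."""
    rng = np.random.default_rng(seed)
    observed = abs(partial_corr(y, z, x))
    hits = sum(abs(partial_corr(y, rng.permutation(z), x)) >= observed
               for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)

rng = np.random.default_rng(2)
n = 2000
s = rng.normal(size=n)                   # hypothetical common source
y = s + 0.5 * rng.normal(size=n)         # driven by s only
z = s + 0.5 * rng.normal(size=n)         # also driven by s only
x_clean = s                              # noiseless recording of the source
x_noisy = s + 0.5 * rng.normal(size=n)   # noisy recording of the same source

pc_clean = partial_corr(y, z, x_clean)
pc_noisy = partial_corr(y, z, x_noisy)
p_noisy = perm_pvalue(y, z, x_noisy)
print(round(pc_clean, 3), round(pc_noisy, 3), p_noisy)
```

    Shuffling z tests against complete independence, which is the wrong null here: y and z are expected to remain correlated through whatever part of s the noisy recording fails to capture. A conservative null has to account for that leftover shared variance.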

  • Article type: Journal Article
    In multilevel models, disaggregating predictors into level-specific parts (typically accomplished via centering) benefits parameter estimates and their interpretation. However, the importance of level specificity has been only sparsely addressed in the multilevel literature on collinearity. In this study, we develop novel insights into the interplay of centering and collinearity in multilevel models. After integrating the broad literatures on centering and collinearity, we review level-specific and conflated correlations in multilevel data. Next, by deriving formal relationships between predictor collinearity and multilevel model estimates, we demonstrate how the consequences of collinearity change across different centering specifications and identify data characteristics that may exacerbate or mitigate those consequences. We show that when all or some level-1 predictors are uncentered, slope estimates can be greatly biased by collinearity. Disaggregating all predictors eliminates the possibility that fixed effect estimates will be biased by collinearity alone; however, under some data conditions, collinearity is associated with biased standard errors and biased random effect (co)variance estimates. Finally, we illustrate the importance of disaggregation for diagnosing collinearity in multilevel data and provide recommendations for the use of level-specific collinearity diagnostics. Overall, this study clarifies, in novel ways, why disaggregation is necessary for identifying and managing the consequences of collinearity in multilevel models.
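    Disaggregation via group-mean centering splits each level-1 predictor into a within-cluster part and a cluster-mean (between) part. The sketch below uses hypothetical simulated clusters in which two predictors are strongly correlated between clusters but uncorrelated within them. The conflated (uncentered) correlation lands in between and describes neither level, which is why level-specific collinearity diagnostics matter. Cluster counts and variances are assumptions.

```python
import numpy as np

def group_means(x, group):
    """Mean of x within each cluster label 0..J-1."""
    return np.bincount(group, weights=x) / np.bincount(group)

rng = np.random.default_rng(0)
J, n_j = 100, 20                              # hypothetical: 100 clusters of 20
group = np.repeat(np.arange(J), n_j)
m1 = rng.normal(size=J)                       # cluster-level component of x1
m2 = m1 + 0.1 * rng.normal(size=J)            # nearly identical for x2
x1 = m1[group] + rng.normal(size=J * n_j)     # independent within-cluster parts
x2 = m2[group] + rng.normal(size=J * n_j)

gm1, gm2 = group_means(x1, group), group_means(x2, group)
w1, w2 = x1 - gm1[group], x2 - gm2[group]     # group-mean-centred (level-1) parts

conflated = np.corrcoef(x1, x2)[0, 1]
within = np.corrcoef(w1, w2)[0, 1]
between = np.corrcoef(gm1, gm2)[0, 1]
print(f"conflated={conflated:.2f}, within={within:.2f}, between={between:.2f}")
```

    Here collinearity is a level-2 problem only. A VIF computed on the uncentered predictors would blur that together with the within-cluster information.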

  • Article type: Journal Article
    Family cohesion and parental monitoring promote Latino adolescents' positive adjustment. For Latino immigrant families, these parenting processes tend to be interdependent because of shared roots in cultural values emphasizing family togetherness and parental authority. This covariance poses a significant methodological problem in the form of multicollinearity. The present article uses a novel technique, residual centering, to remove the variance shared between the family cohesion and parental monitoring constructs and, in turn, to identify how the unique variance of each is associated with Latino adolescent adjustment. Participants were 249 ninth and tenth graders in Mexican and Central American immigrant families. We compared findings from structural equation models in which the parenting constructs were examined simultaneously with those from residual-centered models, in which the variance shared among parenting constructs was removed for each parenting variable. Findings from the residual-centered models revealed that parents' monitoring of youth's daily activities was associated with less alcohol use and fewer depressive symptoms, and that parents' monitoring of youth's peer activities outside the home was associated with less marijuana use but more depressive symptoms. Family cohesion was unrelated to Latino youth outcomes in the residual-centered models. By isolating specific, "pure" parenting effects, residual centering can clarify how family cohesion and parental monitoring behaviors matter for Latino adolescents' adjustment.
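    Residual centering, as described above, replaces one construct with its residual after regressing it on the other, so that the residual carries only unique variance. A minimal sketch on observed, synthetic scores follows. The study applied the idea within structural equation models with latent constructs, which this does not reproduce, and the variable names and the 0.7 correlation are hypothetical.

```python
import numpy as np

def residual_center(target, other):
    """Return the part of `target` not linearly predictable from `other`
    (OLS residuals with an intercept)."""
    A = np.column_stack([np.ones_like(other), other])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

rng = np.random.default_rng(3)
n = 249  # sample size from the abstract; the scores themselves are synthetic
cohesion = rng.normal(size=n)
monitoring = 0.7 * cohesion + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=n)
monitoring_unique = residual_center(monitoring, cohesion)
print(round(np.corrcoef(monitoring, cohesion)[0, 1], 2),
      round(np.corrcoef(monitoring_unique, cohesion)[0, 1], 6))
```

    The residualized predictor is uncorrelated with cohesion by construction, so entering both in one model no longer produces multicollinearity. The cost is interpretive: the residual means "monitoring beyond what cohesion predicts", not monitoring as measured.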