compositional data

  • Article type: Journal Article
    Effective forecasting of the energy consumption structure is vital for China to reach its "dual carbon" objective. However, existing studies have paid little attention to the holistic nature and internal properties of the energy consumption structure. This paper therefore incorporates compositional data theory into the study of the energy consumption structure, which not only accounts for the specific internal features of the structure but also extracts more of the relative information it carries. Based on minimizing the squared Aitchison distance for compositional data, a combined model is constructed from three single models: the metabolism grey model (MGM), the back-propagation neural network (BPNN) model, and the autoregressive integrated moving average (ARIMA) model. The forecasts for 2023-2040 indicate that China's energy consumption structure will evolve towards a more diversified pattern, but that the proportions of natural gas and non-fossil energy have yet to meet the policy goals set by the government. The paper suggests that combined forecasting models for compositional data are highly applicable in the energy sector, and it has some theoretical significance for adapting and improving the energy consumption structure in China.
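
The Aitchison distance that the combined model minimizes can be sketched as follows. This is a minimal illustration, assuming strictly positive shares; the function names and the energy shares are hypothetical and are not taken from the paper.

```python
import numpy as np

def clr(x):
    """Centered log-ratio coefficients of a strictly positive composition."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coefficients."""
    return float(np.linalg.norm(clr(x) - clr(y)))

# Hypothetical shares (coal, oil, natural gas, non-fossil); illustration only.
observed = np.array([0.56, 0.18, 0.09, 0.17])
forecast = np.array([0.55, 0.19, 0.09, 0.17])
d = aitchison_distance(observed, forecast)
```

Because the distance depends only on ratios between parts, rescaling a composition (for example from proportions to percentages) leaves it unchanged, which is one reason it suits structure forecasts.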

  • Article type: Journal Article
    OBJECTIVE: This study aimed to explore the association between arm elevation and neck/shoulder pain, and between trunk forward bending and low back pain, among home care workers.
    METHODS: Home care workers (N = 116) from 11 home care units in Trondheim, Norway, completed pain assessment and working hours questionnaires and wore 3 accelerometers for up to 7 consecutive days. Work time was partitioned into upright awkward posture, upright nonawkward posture, and nonupright time, i.e. sitting. Within a compositional framework, posture time compositions were expressed as log-ratio coordinates for statistical analysis and modeling. Poisson generalized linear mixed models were used to analyze the relationship between arm elevation in upright postures and neck/shoulder pain, and between trunk forward bending in upright postures and low back pain. Isotemporal substitution analysis was used to investigate the association of pain assessment with the reallocation of time between postures.
    RESULTS: Time spent in awkward postures was modest, especially for the more extreme angles (60° and 90°). After adjusting for age, gender, and body mass index, the composition of time spent by home care workers in awkward postures was significantly associated with pain assessment (P < 0.01). Isotemporal substitution analysis showed that reallocating 5 min of upright time from arm elevation below to arm elevation above 60° and 90° was associated with a 6.8% and 19.9% increase in the neck/shoulder pain score, respectively. Reallocating 5 min of upright time from trunk forward bending below to above 30°, 60°, and 90° was associated with 1.8%, 3.5%, and 4.0% increases in low back pain, respectively.
    CONCLUSIONS: Although exposure to awkward postures was modest, our results showed an association between increased time spent in awkward postures and increased neck/shoulder pain and low back pain in home care workers. As musculoskeletal pain is the leading cause of sickness absence, these findings suggest that home care units could benefit from reorganizing work to avoid excessive arm elevation and trunk forward bending in workers.
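
The isotemporal substitution idea can be sketched as below: move a fixed number of minutes between parts, recompute the log-ratio coordinates, and read off the relative change in the expected score under a log-link model. This is a simplified sketch, not the study's mixed model; the minutes, the coefficient vector `beta`, and all function names are hypothetical.

```python
import numpy as np

def pivot_ilr(x):
    """Pivot-type ilr coordinates of a strictly positive composition."""
    x = np.asarray(x, dtype=float)
    D = x.size
    z = np.empty(D - 1)
    for j in range(D - 1):
        k = D - j - 1  # number of parts after part j
        gm = np.exp(np.mean(np.log(x[j + 1:])))  # geometric mean of later parts
        z[j] = np.sqrt(k / (k + 1)) * np.log(x[j] / gm)
    return z

def reallocate(minutes, src, dst, delta=5.0):
    """Move `delta` minutes from part `src` to part `dst`; total time is unchanged."""
    new = np.array(minutes, dtype=float)
    new[src] -= delta
    new[dst] += delta
    if np.any(new <= 0):
        raise ValueError("reallocation must leave every part strictly positive")
    return new

def relative_change(minutes, src, dst, beta, delta=5.0):
    """Relative change in the expected score for log(mu) = b0 + beta . ilr(x)."""
    diff = pivot_ilr(reallocate(minutes, src, dst, delta)) - pivot_ilr(minutes)
    return float(np.exp(np.asarray(beta) @ diff) - 1.0)

# Hypothetical work-time composition in minutes: awkward, nonawkward upright, sitting.
minutes = [30.0, 250.0, 200.0]
beta = np.array([0.4, 0.0])  # made-up coefficients, for illustration only
change = relative_change(minutes, src=1, dst=0, beta=beta)
```

The intercept cancels in the ratio of expected scores, so only the difference in coordinates and the slope coefficients matter.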

  • Article type: Journal Article
    Traditional process monitoring control charts (CCs) have focused on sampling with fixed sampling intervals (FSIs). The variable sampling interval (VSI) scheme, in which the sampling interval (SI) length varies according to the process monitoring statistic, is receiving increasing attention. A shorter SI is used when the process quality indicates a possible out-of-control (OOC) situation; otherwise, a longer SI is preferred. This study proposes the VSI multivariate exponentially weighted moving average CC for compositional data (VSI-MEWMACoDa), based on a coordinate representation using the isometric log-ratio (ilr) transformation. A methodology is proposed to obtain the optimal parameters by considering the zero-state (ZS) average time to signal (ZATS) and the steady-state (SS) average time to signal (SATS). The statistical performance of the proposed CC is evaluated with a continuous-time Markov chain (CTMC) method for both the ZS and SS cases, using a fixed value of the in-control (IC) ATS0. Simulation results show that the VSI-MEWMACoDa CC significantly reduces the OOC average time to signal (ATS) compared with the FSI-MEWMACoDa CC. Moreover, the number of variables (d) has a negative impact on the ATS of the VSI-MEWMACoDa CC, and the subgroup size (n) has a mildly positive impact. The SATS of the VSI-MEWMACoDa CC is smaller than its ZATS for all values of n and d. Under the steady state, the proposed VSI-MEWMACoDa CC performs effectively compared with its competitors, such as the FSI-MEWMACoDa CC, the VSI-T2CoDa CC, and the FSI-T2CoDa CC. An example of an industrial problem from a plant in Europe is also given to study the statistical significance of the VSI-MEWMACoDa CC.
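
The ilr coordinate representation mentioned above can be sketched as follows. This is one common choice of orthonormal basis (pivot-type contrasts), offered as an assumption; the abstract does not say which contrast matrix the authors use, and the function names are hypothetical.

```python
import numpy as np

def ilr_basis(D):
    """Orthonormal contrast matrix V of shape (D, D-1) for pivot-type ilr coordinates."""
    V = np.zeros((D, D - 1))
    for j in range(D - 1):
        k = D - j - 1  # number of parts after part j
        V[j, j] = np.sqrt(k / (k + 1))
        V[j + 1:, j] = -1.0 / np.sqrt(k * (k + 1))
    return V

def clr(x):
    """Centered log-ratio coefficients along the last axis."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean(axis=-1, keepdims=True)

def ilr(x):
    """Map compositions with D parts to D-1 real coordinates."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ ilr_basis(x.shape[-1])

def ilr_inv(z):
    """Map ilr coordinates back to compositions that sum to 1."""
    z = np.asarray(z, dtype=float)
    y = np.exp(z @ ilr_basis(z.shape[-1] + 1).T)
    return y / y.sum(axis=-1, keepdims=True)

x = np.array([0.1, 0.2, 0.3, 0.4])
z = ilr(x)
```

Once the subgroup compositions are expressed in these coordinates, a standard multivariate chart statistic can be computed on `z` in ordinary real space.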

  • Article type: Journal Article
    BACKGROUND: A compositional mediation model of survival outcomes was established to explore whether 24-h time-use behaviors mediate the relationship between depression and mortality.
    METHODS: A total of 4137 adults from the National Health and Nutrition Examination Survey (NHANES 2005-2006) were followed up to 2019. A Cox proportional hazards regression model was used to estimate the total effect of depression on mortality. Compositional data analysis was used to examine the relationship between 24-h time-use compositions and mortality. Furthermore, we constructed a compositional mediation model for survival outcomes to investigate the mediating effect of 24-h time-use behaviors on the relationship between depression and mortality.
    RESULTS: Compared with participants without depression, participants with depression had a significantly higher risk of overall mortality (HR = 1.49, 95% CI: 1.25, 1.79), cardiovascular disease (CVD)-specific mortality (HR = 1.89, 95% CI: 1.37, 2.63), and mortality from causes other than cardiovascular disease or cancer (HR = 1.62, 95% CI: 1.25, 2.08). Physical activity, especially moderate-to-vigorous physical activity, significantly mediated the relationship between depression and both all-cause and CVD-specific mortality.
    LIMITATIONS: Although this was a cohort study, the exposure and mediators were both measured at baseline. Further research is needed to establish a temporal order between the exposure and the mediating variables.
    CONCLUSIONS: Our findings indicate that 24-h time-use behaviors link depression to mortality. In particular, increasing the time spent on physical activity can reduce the risk of death in patients with depression. This finding points to potential interventions for reducing the risk of death in patients with depression.

  • Article type: Journal Article
    Rivers can be sinks for potentially toxic elements (PTEs) introduced into their systems by both natural and anthropic processes. Many indices have been proposed to assess the degree of sediment contamination and the environmental conditions of surficial water bodies. Among them, the enrichment factor (EF) is the most widely used tool, but also the most debated because of its limitations. The need for a reference element and for a background/baseline composition makes the EF method dependent on the researcher's expertise, which means that its repeatability cannot be taken for granted. Starting from the awareness that the geochemical processes that change the composition of environmental matrices involve multiple elements rather than individual variables, we developed a modified EF (mEF) based on elemental associations. Different multivariate statistical methods (i.e. Robust Principal Component Analysis and Fuzzy Clustering), applied from a compositional data analysis (CoDA) perspective, were used to set all the terms of the mEF. The mEF was applied to 101 sediment samples collected from a 2 m-long core, covering a sedimentation period of about 150 years (1850-2007), located in the lower Changjiang River (China). The method proved effective in recognizing most of the signals arising from the main natural and anthropogenic events that affected the lower river basin over the timespan considered. The largest geochemical variations recorded coincide with flooding events; in addition, the method captured the effects on the system of recent socio-economic development (following the end of the civil war in 1949 and the beginning of economic reforms in 1978) and of the start-up of the Three Gorges Dam (the world's largest power station since 2012). The proposed method is a step towards improving the effectiveness of the EF in discriminating geochemical anomalies that may be significant for assessing the historical human impact on the environment.
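
For orientation, the conventional enrichment factor that the modified version builds on is usually written as below. This is the standard textbook form, not the paper's mEF, and the symbols are notation chosen here: C_X is the concentration of the element of interest and C_ref that of the reference element.

```latex
\[
\mathrm{EF}_X \;=\;
\frac{\left( C_X / C_{\mathrm{ref}} \right)_{\text{sample}}}
     {\left( C_X / C_{\mathrm{ref}} \right)_{\text{background}}}
\]
```

Both the choice of reference element and the choice of background composition enter this ratio directly, which is the dependence on researcher judgment that the abstract criticizes.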

  • Article type: Journal Article
    The common approach to regression analysis with compositional variables is to express compositions in log-ratio coordinates (coefficients) and then perform standard statistical processing in real space. As in real space, the problem is that standard least squares regression fails when the total number of parts across all compositional covariates exceeds the number of observations. The aim of this study is to analyze in detail partial least squares (PLS) regression, which can deal with this problem. We focus on PLS regression between more than one compositional response variable and more than one compositional covariate. First, we give the PLS regression model with log-ratio coordinates of compositional variables; then we express the PLS model directly in the simplex. We also prove that the PLS model is invariant under a change of coordinate system, such as ilr coordinates with a different contrast matrix or clr coefficients. Moreover, we give the estimation and inference for parameters in the PLS model. Finally, the PLS model with clr coefficients is used to analyze the relationship between the chemical metabolites of Astragali Radix and the plasma metabolites of rats after administration of Astragali Radix.

  • Article type: Journal Article
    OBJECTIVE: In the field of nutritional epidemiology, principal component analysis (PCA) has been used extensively to identify dietary patterns. Recently, compositional data analysis (CoDA) has emerged as an alternative approach for obtaining dietary patterns. We aimed to directly compare and evaluate the ability of PCA and principal balances analysis (PBA), a data-driven method in CoDA, to identify dietary patterns and their associations with the risk of hypertension.
    METHODS: Cohort study. A 24-h dietary recall questionnaire was used to collect dietary data. Multivariate logistic regression analysis was used to analyse the association between dietary patterns and hypertension. Data came from the 2004 and 2009 China Health and Nutrition Survey. A total of 3892 study participants aged 18-60 years were included.
    RESULTS: PCA and PBA identified five patterns each. PCA patterns comprised a linear combination of all food groups, whereas PBA patterns included several food groups with zero loadings. The coarse cereals pattern identified by PBA was inversely associated with hypertension risk (highest quintile: OR = 0·74 (95 % CI 0·57, 0·95); P for trend = 0·037). None of the five PCA patterns was associated with hypertension. Compared with the PCA patterns, the PBA patterns were clearly interpretable and accounted for a higher percentage of variance in food intake.
    CONCLUSIONS: The findings suggest that PBA might be an appropriate and promising approach in dietary pattern analysis. Higher adherence to the coarse cereals dietary pattern was associated with a lower risk of hypertension. Nevertheless, the advantages of PBA over PCA should be confirmed in future studies.

  • Article type: Journal Article
    Zeros in compositional data are very common and can be classified into rounded and essential zeros. A rounded zero refers to a small proportion or a value below the detection limit, while an essential zero refers to the complete absence of the component from the composition. In this article, we propose a new framework for analyzing compositional data with zero entries by introducing a stochastic representation. In particular, a new distribution, the Dirichlet composition distribution, is developed to accommodate possible essential zeros in compositional data. We derive its distributional properties (e.g., its moments), propose computing maximum likelihood estimates via the Expectation-Maximization (EM) algorithm, and consider a regression model based on the new distribution. Simulation studies are conducted to evaluate the performance of the proposed methodologies. Finally, our method is used to analyze a dataset of fluorescence in situ hybridization (FISH) for chromosome detection.

  • Article type: Journal Article
    Dyslipidemia is associated with lifestyle behaviors, and several lifestyle behaviors tend to occur together in some populations. This study aims to identify lifestyle behavior clusters and their relations to dyslipidemia. This cross-sectional study was conducted in Wuhai City, China. Cluster analysis combined with compositional data analysis was conducted, with 24-h time use on daily activities and dietary patterns as input variables. Multiple logistic regression was used to compare dyslipidemia among clusters. A total of 4306 participants were included. A higher prevalence of newly diagnosed dyslipidemia was found among participants in cluster 1 (long sedentary behavior (SB), the shortest sleep, and a high-salt and high-oil diet) and cluster 5 (the longest SB and short sleep), relative to the other clusters in both age groups (<50 years and ≥50 years). In conclusion, unhealthy lifestyle behaviors may occur together in part of the population, suggesting that these people are potential targets of health education and behavior interventions. Future research should investigate the relative significance of specific lifestyle behaviors in relation to dyslipidemia.

  • Article type: Journal Article
    Microbial community analysis is drawing growing attention owing to the rapid development of high-throughput sequencing techniques. The observed data have the following typical characteristics: they are high-dimensional, compositional (lying in a simplex), and may even be leptokurtic and highly skewed because of overly abundant taxa, which makes conventional correlation analysis infeasible for studying co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming that the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known in the literature as the centered log-ratio covariance matrix. We construct a Median-of-Means (MOM) estimator for the centered log-ratio covariance matrix and propose a thresholding procedure that adapts to the variability of individual entries. By imposing a finite fourth moment condition, which is much weaker than the sub-Gaussianity condition in the literature, we derive the optimal rate of convergence under the spectral norm. We also provide theoretical guarantees on support recovery. The adaptive thresholding procedure for the MOM estimator is easy to implement and gains robustness when outliers or heavy tails are present. Thorough simulation studies show the advantages of the proposed procedure over some state-of-the-art methods. Finally, we apply the proposed method to a human gut microbiome dataset.
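
A rough sketch of the median-of-means idea for the centered log-ratio covariance matrix is given below. It is a simplification under stated assumptions: it takes the entrywise median of per-block sample covariances and applies a single universal hard threshold `tau`, whereas the abstract describes an entry-adaptive threshold whose exact form is not given here. Function names, the block count, and the simulated data are hypothetical.

```python
import numpy as np

def clr(X):
    """Row-wise centered log-ratio coefficients of strictly positive compositions."""
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=1, keepdims=True)

def mom_clr_covariance(X, n_blocks=5, seed=0):
    """Entrywise median of block-wise sample covariances of the clr coefficients."""
    rng = np.random.default_rng(seed)
    Z = clr(X)
    # Randomly partition the samples into roughly equal blocks
    blocks = np.array_split(rng.permutation(Z.shape[0]), n_blocks)
    covs = np.stack([np.cov(Z[idx], rowvar=False) for idx in blocks])
    return np.median(covs, axis=0)

def hard_threshold(S, tau):
    """Zero out off-diagonal entries with magnitude below tau; keep the diagonal."""
    T = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

# Simulated compositions for illustration only (200 samples, 6 taxa).
rng = np.random.default_rng(1)
X = rng.dirichlet(np.ones(6), size=200)
S = mom_clr_covariance(X, n_blocks=5)
T = hard_threshold(S, tau=0.05)
```

Taking a median across blocks limits the influence of any block contaminated by outliers or extreme abundances, which is the robustness property the abstract emphasizes.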