Keywords: Case-control studies; Conditional logistic regression; Multiple blocks of predictors/features; Stability selection

MeSH: Logistic Models; Software; Case-Control Studies; Humans; Genomics / methods; Computational Biology / methods

Source: DOI:10.1186/s12859-024-05850-2

Abstract:
BACKGROUND: The matched case-control design, until recently used mostly in epidemiological studies, is becoming customary in biomedical applications as well. For instance, omics studies commonly compare cancerous and healthy tissue from the same patient. Furthermore, researchers now routinely collect data from multiple heterogeneous sources that they wish to relate to case-control status. This highlights the need to develop and implement statistical methods that account for both the matched design and the multi-source nature of the data.
RESULTS: We present penalizedclr, an R package that implements penalized conditional logistic regression for analyzing matched case-control studies. It allows different penalties for different blocks of covariates and is therefore particularly useful for multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in this regression model.
CONCLUSIONS: The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression models that account for both the matched design and the block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case-control status. These variables can then be investigated further, either through functional interpretation or through validation in more targeted studies.
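
The ideas in the abstract (a conditional likelihood for matched data, block-specific penalties, and stability selection) can be illustrated with a minimal sketch. This is not the penalizedclr interface: it is plain Python, covers only the special case of 1:1 matched pairs with L1 penalties, and every function name in it (`soft_threshold`, `expand_block_penalties`, `fit_paired_clr_l1`, `stability_selection`) is hypothetical. It relies on the standard fact that, for 1:1 matching, the conditional likelihood of a pair depends only on the difference d = x_case - x_control and equals 1 / (1 + exp(-d·β)), so no intercept or pair-specific term is needed. The solver is a simple proximal gradient method chosen for readability, not for speed.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def expand_block_penalties(block_sizes, block_lambdas):
    """Repeat each block's penalty once per feature in that block."""
    return [lam for size, lam in zip(block_sizes, block_lambdas) for _ in range(size)]


def fit_paired_clr_l1(diffs, lambdas, n_iter=200):
    """L1-penalized conditional logistic regression for 1:1 matched pairs.

    diffs[i][j] is x_case - x_control for pair i and feature j.
    lambdas[j] is the L1 penalty on feature j (shared within a block).
    Minimizes mean_i log(1 + exp(-d_i . beta)) + sum_j lambdas[j] * |beta_j|
    by proximal gradient descent with a fixed step size.
    """
    n, p = len(diffs), len(diffs[0])
    # Trace bound on the Lipschitz constant of the gradient of the smooth part.
    lipschitz = sum(v * v for d in diffs for v in d) / (4.0 * n)
    step = 1.0 / max(lipschitz, 1e-12)
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for d in diffs:
            eta = sum(dj * bj for dj, bj in zip(d, beta))
            eta = max(-30.0, min(30.0, eta))  # clamp to avoid overflow in exp
            w = 1.0 / (1.0 + math.exp(eta))  # sigma(-eta)
            for j in range(p):
                grad[j] -= w * d[j] / n
        beta = [soft_threshold(beta[j] - step * grad[j], step * lambdas[j])
                for j in range(p)]
    return beta


def stability_selection(diffs, lambdas, n_subsamples=50, threshold=0.6, seed=0):
    """Refit on random half-samples of pairs and record how often each
    feature receives a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(diffs), len(diffs[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        # Subsample whole pairs so that the matching is never broken.
        idx = rng.sample(range(n), n // 2)
        beta = fit_paired_clr_l1([diffs[i] for i in idx], lambdas)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    selected = [j for j in range(p) if freqs[j] >= threshold]
    return freqs, selected


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic pair differences: feature 0 carries signal, the rest are noise.
    diffs = [[rng.gauss(1.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(3)]
             for _ in range(60)]
    # Two blocks of two features each, with a heavier penalty on the second block.
    lambdas = expand_block_penalties([2, 2], [0.05, 0.2])
    freqs, selected = stability_selection(diffs, lambdas, n_subsamples=20)
    print("selection frequencies:", freqs)
    print("selected features:", selected)
```

Giving each block its own penalty lets a data source with many weak or noisy features be shrunk harder than a small, informative one. Subsampling whole pairs, rather than individual observations, keeps the matched design intact in every refit.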