Advances in sequencing technologies have enabled the collection of massive genome-wide data, which has substantially advanced lung cancer diagnosis and prognosis. Identifying influential markers for clinical endpoints of interest is an indispensable component of the statistical analysis pipeline. However, classical variable selection methods are neither feasible nor reliable for high-throughput genetic data. Our objective is to propose a model-free gene screening procedure for high-throughput right-censored data and to use it to develop a predictive gene signature for lung squamous cell carcinoma (LUSC).
A gene screening procedure was developed based on a recently proposed independence measure. The procedure was then applied to The Cancer Genome Atlas (TCGA) data on LUSC, narrowing the set of potentially influential genes to 378 candidates. A penalized Cox model was fitted to this reduced set, which further identified a 6-gene signature for LUSC prognosis. The 6-gene signature was validated on datasets from the Gene Expression Omnibus.
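To make the screening step concrete, the following is a minimal sketch of marginal, model-free screening for right-censored outcomes. The abstract does not name the independence measure it uses, so this sketch swaps in Harrell's concordance index as a stand-in utility; it is not the authors' procedure. The function names (`concordance`, `screen_genes`), the gene labels, and the toy data are all hypothetical. Each gene is scored on its own by how far its concordance with survival departs from 0.5 (no association), and the top-ranked genes are kept for a second-stage model such as a penalized Cox regression, which is not shown here.

```python
def concordance(times, events, scores):
    """Harrell's C for right-censored data (stand-in utility, an assumption).

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. The pair is concordant when the subject who failed
    earlier has the higher score; tied scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5


def screen_genes(times, events, expression, keep):
    """Rank genes by |C - 0.5| and return the top `keep` names plus all utilities."""
    utility = {
        gene: abs(concordance(times, events, values) - 0.5)
        for gene, values in expression.items()
    }
    ranked = sorted(utility, key=utility.get, reverse=True)
    return ranked[:keep], utility


# Hypothetical toy data: five subjects, the third one is censored (event = 0).
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 1]
expression = {
    "gene_a": [5, 4, 3, 2, 1],  # higher expression, shorter survival
    "gene_b": [1, 3, 2, 3, 1],  # no marginal association in this toy set
    "gene_c": [1, 2, 3, 4, 5],  # higher expression, longer survival
}

selected, utility = screen_genes(times, events, expression, keep=2)
print(selected)
print(utility)
```

Because the utility is computed one gene at a time and needs no regression model, the cost grows linearly in the number of genes, which is what makes this kind of screening practical as a first pass before fitting a penalized model on the reduced set.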
Both the model-fitting and validation results indicate that our method selected influential genes that yield biologically sensible findings as well as better predictive performance than existing alternatives. In our multivariable Cox regression analysis, the 6-gene signature remained a significant prognostic factor (p-value < 0.001) after controlling for clinical covariates.
Gene screening, as a fast dimension reduction technique, plays an important role in analyzing high-throughput data. The main contributions of this paper are to introduce a fundamental yet pragmatic model-free gene screening approach that aids statistical analysis of right-censored cancer data, and to provide a side-by-side comparison with other available methods in the context of LUSC.