Keywords: adaptive lasso; cancer classification; gene selection; penalized logistic regression; shift-lasso

Source: DOI:10.1515/ijb-2022-0025

Abstract:
Cancer classification and gene selection are important applications of DNA microarray gene expression data analysis. Because DNA microarray data are high-dimensional, automatic gene selection methods are used to improve the classification performance of expert classifier systems. This paper discusses a new penalized logistic regression method that performs gene coefficient estimation and variable selection simultaneously in DNA microarray data. The method uses prior information about the gene coefficients to improve the classification accuracy of the underlying model. A coordinate descent algorithm with screening rules is given to compute the gene coefficient estimates efficiently. The method is evaluated on five high-dimensional cancer classification datasets using four measures: area under the curve, number of selected genes, misclassification rate, and F-score. The real data analysis indicates that the proposed method achieves good cancer classification performance, with a small misclassification rate and a large area under the curve and F-score, at the cost of some sparsity in the underlying model. Hence, the proposed method can be regarded as a reliable penalized logistic regression method for high-dimensional cancer classification.
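To make the idea of a prior-informed penalty concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the penalty, so the sketch assumes a lasso penalty on the shift away from a prior vector, lam * sum_j |beta_j - prior_j|, with an unpenalized intercept. It fits the model by coordinate descent on a reweighted least-squares approximation of the logistic loss. The function names, the `prior` argument, and the iteration limits are hypothetical, and the screening rules mentioned in the abstract are omitted.

```python
import numpy as np

def soft_threshold(v, t):
    # Scalar soft-thresholding operator: sign(v) * max(|v| - t, 0).
    return np.sign(v) * max(abs(v) - t, 0.0)

def shifted_lasso_logistic(X, y, prior, lam, n_outer=25, n_inner=50, tol=1e-8):
    """Logistic regression with the ASSUMED penalty lam * sum_j |beta_j - prior_j|.

    X: (n, p) expression matrix, y: 0/1 labels, prior: (p,) prior coefficients.
    Returns (intercept, beta). Illustrative only; no screening rules.
    """
    n, p = X.shape
    prior = np.asarray(prior, dtype=float)
    beta0, beta = 0.0, prior.copy()          # start at the prior
    for _ in range(n_outer):
        # Quadratic (IRLS) approximation of the logistic loss at the current fit.
        eta = beta0 + X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)   # working weights
        z = eta + (y - prob) / w                       # working response
        beta_before = beta.copy()
        for _ in range(n_inner):
            max_change = 0.0
            # Unpenalized intercept: weighted mean of the working residual.
            delta0 = np.sum(w * (z - eta)) / np.sum(w)
            beta0 += delta0
            eta = eta + delta0
            for j in range(p):
                xj = X[:, j]
                a = np.sum(w * xj * xj) / n
                if a == 0.0:
                    continue
                # Weighted correlation of x_j with the partial residual.
                c = np.sum(w * xj * (z - eta + xj * beta[j])) / n
                # Soft-threshold the shift from the prior, then add the prior back.
                new = prior[j] + soft_threshold(c - a * prior[j], lam) / a
                diff = new - beta[j]
                if diff != 0.0:
                    eta = eta + xj * diff
                    beta[j] = new
                max_change = max(max_change, abs(diff))
            if max_change < tol:
                break
        if np.max(np.abs(beta - beta_before)) < tol:
            break
    return beta0, beta
```

Under this assumed penalty, a coefficient stays exactly at its prior value unless the data push it away by more than the threshold, so a large `lam` returns the prior unchanged. Setting `prior` to zeros recovers an ordinary lasso-penalized logistic regression.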