Feature Selection for Unsupervised Machine Learning

Abstract:

Compared with supervised machine learning (ML), feature selection for unsupervised ML remains far less developed. To address this gap, this research proposes a stepwise feature selection approach for clustering methods, specialized to the Gaussian mixture model (GMM) and k-means. Whereas the existing GMM and k-means methods use all the features, the proposed method selects a subset of features on which to run each of the two methods. The research also finds that the existing GMM and k-means methods give better results when they are modified to use good initializations. Experiments based on Monte Carlo simulations show that the proposed method is more computationally efficient and more accurate than the existing GMM and k-means methods based on all the features. An experiment on a real-world dataset confirms this finding.
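
The abstract does not state the selection criterion or the exact stepwise procedure, so the following is only a minimal sketch of the general idea for k-means, not the authors' method. It assumes greedy forward selection, uses the mean silhouette score as a stand-in criterion, and uses a farthest-point rule as one possible "good initialization". The function names `kmeans`, `silhouette`, and `forward_select`, and the synthetic data, are all hypothetical. A GMM variant would replace the clustering step and the criterion.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialization (an assumed choice)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette(X, labels):
    """Mean silhouette score; stand-in criterion, not taken from the paper."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / (n_same - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

def forward_select(X, k, max_features=None):
    """Greedily add the feature that most improves the criterion; stop when none does."""
    n_features = X.shape[1]
    selected, best_score = [], -np.inf
    while len(selected) < (max_features or n_features):
        scored = []
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            labels = kmeans(X[:, cols], k)
            scored.append((silhouette(X[:, cols], labels), j))
        score, j = max(scored)
        if score <= best_score:
            break
        selected.append(j)
        best_score = score
    return selected, best_score

# Synthetic data: features 0 and 1 separate two clusters, features 2 to 4 are noise.
rng = np.random.default_rng(42)
n = 50
informative = np.vstack([rng.normal(-5, 1, (n, 2)), rng.normal(5, 1, (n, 2))])
noise = rng.normal(0, 1, (2 * n, 3))
X = np.hstack([informative, noise])

selected, best_score = forward_select(X, k=2)
print("selected features:", selected, "silhouette:", round(float(best_score), 3))
```

Clustering on the selected columns only, `kmeans(X[:, selected], k=2)`, then plays the role of the reduced-feature k-means described in the abstract. Other criteria could be substituted for the silhouette score without changing the stepwise structure.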
