Keywords: dimensionality reduction; k-nearest neighbor; manifold learning; t-SNE

Source: DOI: 10.3390/e25071065

Abstract:
In machine learning and data analysis, dimensionality reduction and high-dimensional data visualization can be accomplished by manifold learning using the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm. We significantly improve this manifold learning scheme by introducing a preprocessing strategy for the t-SNE algorithm. In our preprocessing, we first exploit Laplacian eigenmaps to reduce the dimensionality of the data, which aggregates each data cluster and reduces the Kullback-Leibler divergence (KLD) remarkably. Moreover, the k-nearest-neighbor (KNN) algorithm is also involved in our preprocessing to enhance the visualization performance and reduce the computation and space complexity. We compare the performance of our strategy with that of standard t-SNE on the MNIST dataset. The experimental results show that our strategy is better at separating different clusters while keeping data of the same kind much closer together. Moreover, the KLD can be reduced by about 30% at the cost of only a 1-2% increase in runtime.
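The pipeline described above (Laplacian eigenmaps over a KNN graph as a preprocessing step, followed by t-SNE) can be approximated with off-the-shelf components. Below is a minimal sketch using scikit-learn, where SpectralEmbedding implements Laplacian eigenmaps and its "nearest_neighbors" affinity builds the KNN graph; the intermediate dimension (30), the neighbor count (10), and the use of load_digits as a lightweight stand-in for full MNIST are illustrative assumptions, not values from the paper.

```python
import time
from sklearn.datasets import load_digits
from sklearn.manifold import SpectralEmbedding, TSNE

# MNIST-like digit data (the paper uses full MNIST; load_digits is a
# small stand-in so this sketch runs in seconds).
X, y = load_digits(return_X_y=True)

# Baseline: standard t-SNE directly on the raw high-dimensional data.
t0 = time.time()
baseline = TSNE(n_components=2, perplexity=30, random_state=0)
Y_base = baseline.fit_transform(X)
print(f"standard t-SNE: KLD={baseline.kl_divergence_:.3f}, "
      f"time={time.time() - t0:.1f}s")

# Preprocessing: Laplacian eigenmaps (SpectralEmbedding) over a
# k-nearest-neighbor graph reduces the data to an intermediate
# dimension before t-SNE runs. n_components=30 and n_neighbors=10
# are illustrative choices.
t0 = time.time()
le = SpectralEmbedding(n_components=30, affinity="nearest_neighbors",
                       n_neighbors=10, random_state=0)
X_low = le.fit_transform(X)

improved = TSNE(n_components=2, perplexity=30, random_state=0)
Y_imp = improved.fit_transform(X_low)
print(f"LE + t-SNE:     KLD={improved.kl_divergence_:.3f}, "
      f"time={time.time() - t0:.1f}s")
```

Comparing the two printed KLD values mirrors the paper's evaluation: a lower KLD after preprocessing indicates the low-dimensional embedding preserves the neighborhood structure more faithfully.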