Data clustering is an important field of machine learning with applications in many areas, such as business analysis, manufacturing, energy, healthcare, travel, and logistics. A variety of clustering applications have already been developed. Data clustering approaches based on the self-organizing map (SOM) generally use map (grid) dimensions ranging from 2 × 2 to 8 × 8 (4 to 64 neurons, or microclusters) without giving any explicit reason for the particular dimension chosen, and therefore do not obtain optimized results. These algorithms use a secondary approach to map the microclusters onto a lower dimension (the actual number of clusters), such as 2, 3, or 4, depending on the optimum number of clusters in the specific data set. In most works, the secondary approach is not an SOM but another algorithm, such as cut tree.
In this work, the proposed approach gives an idea of how to select the optimal higher dimension of the SOM for a given data set; the microclusters obtained at this dimension are then clustered again into the lower, actual dimension. Both the primary and the secondary stage use an SOM to cluster the data, and the weight matrix of the SOM is found to be very meaningful. The optimized two-dimensional configuration of the SOM is not the same for every data set, and this work also tries to discover this configuration.
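The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_som`, `bmu`, `two_stage_som`), the linearly decaying learning rate and neighborhood width, the 1 × k grid for the secondary SOM, and the default hyperparameters are all assumptions made for illustration. A primary SOM produces microcluster weight vectors, and a secondary SOM is then trained on those weight vectors to obtain the actual clusters.

```python
import math
import random


def bmu(weights, x):
    """Return the index of the best-matching unit (closest weight vector) for x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM with online updates; return its weight vectors."""
    rng = random.Random(seed)
    # Assumption: initialise each neuron from a randomly chosen sample.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # decaying neighborhood width
            b = bmu(weights, x)
            for j, w in enumerate(weights):
                d2 = (grid[j][0] - grid[b][0]) ** 2 + (grid[j][1] - grid[b][1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def two_stage_som(data, primary_shape, n_clusters, seed=0):
    """Cluster data with a primary SOM, then cluster its weights with a second SOM."""
    # Stage 1: the primary SOM yields one weight vector per microcluster.
    micro = train_som(data, primary_shape[0], primary_shape[1], seed=seed)
    # Stage 2 (assumed 1 x n_clusters grid): a secondary SOM trained on the weights.
    macro = train_som(micro, 1, n_clusters, seed=seed)
    micro_to_cluster = [bmu(macro, w) for w in micro]
    # Each sample inherits the cluster of its microcluster.
    return [micro_to_cluster[bmu(micro, x)] for x in data]
```

Under these assumptions, a search over primary grid shapes could call `two_stage_som(data, (r, c), k)` for each candidate `(r, c)` and keep the shape that scores best on a chosen quality measure; the selection criterion used in the paper is not stated in the abstract.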
The adjusted Rand index obtained on the Iris, Wine, Wisconsin Diagnostic Breast Cancer, New Thyroid, Seeds, A1, Imbalance, Dermatology, Ecoli, and Ionosphere data sets is, respectively, 0.7173, 0.9134, 0.7543, 0.8041, 0.7781, 0.8907, 0.8755, 0.7543, 0.5013, and 0.1728. These values outperform all other results available on the web, and they are obtained without any reduction of attributes in this work.
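For readers who want to reproduce this kind of score, the following is a small sketch of the standard adjusted Rand index (ARI), computed from the contingency counts of the true and predicted labelings. It assumes that the index reported above is this standard ARI; the function name `adjusted_rand_index` and its handling of degenerate inputs are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Compute the adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two samples counts as perfect agreement

    # Pairs of samples that share a cell, a true class, or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2.0
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)
```

The index does not depend on how the clusters are named, so `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0, while labelings that agree no better than chance score near 0.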
SOM is found to be superior to or on par with other clustering approaches, such as k-means, and could be used successfully to cluster all types of data sets. Ten benchmark data sets from diverse domains, such as medical, biological, and chemical, are tested in this work, including synthetic data sets.