Keywords: deep neural network; fuzzy support vector machine; imbalance classification; machine learning; oversampling

Source: DOI:10.3389/fbioe.2021.802712

Abstract:
Imbalanced classification problems are widespread in medical diagnosis, biomedicine, smart cities, and the Internet of Things. An imbalanced data distribution biases traditional classification methods towards the majority classes at the expense of the minority class, which makes these methods ineffective on imbalanced data. In this paper, we propose a novel imbalanced classification method based on deep learning and the fuzzy support vector machine, named DFSVM. DFSVM first uses a deep neural network to obtain an embedding representation of the data. The network is trained with a triplet loss to enhance similarities within classes and differences between classes. To alleviate the effects of the imbalanced data distribution, oversampling is performed in the embedding space. We use an oversampling method based on feature and center distance, which yields more diverse new samples and helps prevent overfitting. To strengthen the influence of the minority class, we use a fuzzy support vector machine (FSVM) based on cost-sensitive learning as the final classifier. The FSVM assigns a higher misclassification cost to minority class samples to improve classification quality. Experiments were performed on multiple biological datasets and real-world datasets. The experimental results show that DFSVM achieves promising classification performance.
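The three stages named in the abstract (triplet-loss embedding, oversampling in the embedding space, cost-sensitive classification) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not give the details of the feature-and-center-distance oversampler or of the FSVM cost assignment, so the sketch substitutes a plain SMOTE-style interpolation and inverse-frequency class costs. All function names (`triplet_loss`, `interpolate_oversample`, `class_costs`) and the default margin are assumptions made for illustration.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-form triplet loss on embedding vectors.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def interpolate_oversample(minority, n_new, rng):
    """SMOTE-style stand-in for the paper's oversampler (assumption).

    Each synthetic point lies on the segment between two randomly
    chosen minority-class embeddings.
    """
    new_points = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        new_points.append([x + t * (y - x) for x, y in zip(a, b)])
    return new_points


def class_costs(labels):
    """Inverse-frequency misclassification costs (assumption).

    cost(class) = n_samples / (n_classes * class_count), so rarer
    classes receive a higher cost.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
    print(interpolate_oversample(minority, 2, rng))
    print(class_costs([0] * 8 + [1] * 2))
```

In a full pipeline, the per-class costs would be passed as sample or class weights to an SVM solver, and the triplet loss would be minimized by a deep network rather than evaluated on fixed vectors.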