Keywords: Dimensionality reduction; Granular structure; IF set; Instance selection; Rough set

Source: DOI: 10.1038/s41598-024-62099-8

Abstract:
The dimension and size of data are growing rapidly with the extensive application of computer science and lab-based engineering in daily life. Vagueness, later uncertainty, redundancy, irrelevancy, and noise in such data raise concerns for building effective learning models. Fuzzy rough sets and their extensions have been applied to these issues through various data reduction approaches. However, constructing a model that copes with all of them simultaneously remains a challenging task, and no study to date has addressed all of these issues at once. This paper investigates a method based on the notions of intuitionistic fuzzy (IF) and rough sets that handles these obstacles simultaneously through a new data reduction technique. First, a novel IF similarity relation is introduced. Second, an IF rough set model is established on the basis of this similarity relation. Third, an IF granular structure is presented using the established similarity relation and the lower approximation. Next, mathematical theorems are used to validate the proposed notions. Then, the importance degree of the IF granules is employed to eliminate redundancy in size, that is, redundant instances. Further, significance-degree-preserving dimensionality reduction is discussed. Hence, simultaneous instance and feature selection can be performed on large volumes of high-dimensional data to eliminate redundancy and irrelevancy in both dimension and size, where vagueness and later uncertainty are handled with rough and IF sets respectively, whilst noise is tackled with the IF granular structure. Thereafter, comprehensive experiments are carried out on benchmark datasets to demonstrate the effectiveness of the simultaneous feature and data point selection methods. Finally, a framework aided by the proposed methodology is discussed to enhance regression performance for the IC50 of antiviral peptides.
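The abstract does not give the IF similarity relation or the importance degree of the granules, so the pipeline cannot be reproduced from it. As a rough illustration of the general idea only, the sketch below uses a plain (non-intuitionistic) fuzzy rough lower approximation for instance selection. The `similarity` function, the Kleene-Dienes implicator, the `threshold` parameter, and the toy data are all assumptions made for this example, not the paper's definitions.

```python
def similarity(a, b):
    """Assumed fuzzy similarity of two feature vectors scaled to [0, 1]."""
    return min(1.0 - abs(x - y) for x, y in zip(a, b))


def lower_approximation(X, y):
    """Degree to which each instance belongs to the lower approximation
    of its own class.

    With crisp class labels and the Kleene-Dienes implicator, this is the
    minimum of (1 - similarity) over all instances of a different class.
    """
    degrees = []
    for i, xi in enumerate(X):
        others = [
            1.0 - similarity(xi, xj)
            for j, xj in enumerate(X)
            if y[j] != y[i]
        ]
        # An instance with no differently labelled neighbour is fully certain.
        degrees.append(min(others) if others else 1.0)
    return degrees


def select_instances(X, y, threshold=0.5):
    """Keep the indices whose lower-approximation degree reaches the
    (hypothetical) threshold."""
    degrees = lower_approximation(X, y)
    return [i for i, d in enumerate(degrees) if d >= threshold]


# Toy data: two features in [0, 1], binary labels.
X = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [0.5, 0.5]]
y = [0, 0, 1, 0]

degrees = lower_approximation(X, y)
kept = select_instances(X, y, threshold=0.5)
print(degrees, kept)
```

Instances that lie close to a differently labelled instance receive a low degree and are dropped. The method described in the abstract replaces this single membership degree with an IF pair (membership and non-membership) and ranks granules by an importance degree, neither of which is modelled here.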