Keywords: Forensic-based investigation; Gene selection; Global optimization; High-dimensional genetic data; Slime mould algorithm

MeSH: Algorithms; Clinical Decision-Making; Genetic Techniques; Physarum polycephalum; Technology

Source: DOI:10.1038/s41598-024-59064-w

Abstract:
Advanced gene sequencing technology has given modern medicine large, high-dimensional genetic datasets, and processing these data is of great significance for clinical decision-making. Gene selection (GS) is an important data preprocessing technique that selects a subset of features to improve performance and reduce data dimensionality. This study proposes an improved wrapper GS method based on forensic-based investigation (FBI). The method introduces the search mechanism of the slime mould algorithm (SMA) into FBI to improve the original algorithm; the resulting algorithm is named SMA_FBI. GS is then performed by converting the continuous optimizer into a binary version through a transfer function. To verify the performance of SMA_FBI, experiments are first run on the 30-function CEC2017 test set, where it is compared with 10 original algorithms and 10 state-of-the-art algorithms. The experimental results show that SMA_FBI is better than the other algorithms in terms of finding the optimal solution, convergence speed, and robustness. In addition, BSMA_FBI (the binary version of SMA_FBI) is compared with 8 binary algorithms on 18 high-dimensional genetic datasets from the UCI repository. The results indicate that BSMA_FBI obtains high classification accuracy while selecting fewer features in GS applications. SMA_FBI is therefore considered an optimization tool with great potential for global optimization problems, and its binary version, BSMA_FBI, can be used for GS tasks.
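The abstract does not say which transfer function or fitness function the authors use, so the following is only a minimal sketch of the general idea behind binarizing a continuous optimizer for wrapper-based GS. It assumes an S-shaped (sigmoid) transfer function and a weighted fitness that trades off classification error against the fraction of genes kept; both are common choices in the feature selection literature, not details confirmed by this record. The names `sigmoid_transfer`, `binarize`, `fitness`, and the weight `alpha` are hypothetical, and the classifier error rate is passed in as a plain number rather than computed.

```python
import math
import random


def sigmoid_transfer(x):
    """S-shaped transfer function (assumed): maps a real value to [0, 1]."""
    # Two branches keep math.exp from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 gene mask.

    Each dimension is selected (1) with probability sigmoid_transfer(x).
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def fitness(mask, error_rate, alpha=0.99):
    """Assumed wrapper fitness (lower is better).

    Combines the classifier's error rate on the selected genes with the
    fraction of genes kept, weighted by the hypothetical parameter alpha.
    """
    selected_fraction = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_fraction


rng = random.Random(0)                      # fixed seed for repeatability
position = [-6.0, -0.3, 0.0, 0.8, 6.0]      # one continuous search agent
mask = binarize(position, rng)              # 1 = gene kept, 0 = gene dropped
print(mask)
print(fitness(mask, error_rate=0.10))       # error_rate is a placeholder value
```

In a full wrapper method, `error_rate` would come from training and evaluating a classifier on only the genes where `mask` is 1, and the optimizer would update `position` to minimize the fitness.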