kNN imputer

  • Article type: Journal Article
    Autism spectrum disorder (ASD) is a neurodevelopmental disorder. ASD cannot be fully cured, but early diagnosis followed by therapy and rehabilitation helps an autistic person live a quality life. Clinical diagnosis of ASD symptoms via questionnaires and screening tests such as the Autism Spectrum Quotient-10 (AQ-10) and the Quantitative Checklist for Autism in Toddlers (Q-CHAT) is expensive, hard to access, and time-consuming. Machine learning (ML) techniques can help predict ASD at the initial stage of diagnosis. The main aim of this work is to classify ASD and typically developed (TD) class data using ML classifiers. We used different ASD data sets covering all age groups (toddlers, children, adolescents, and adults) to classify ASD and TD cases. During preprocessing, we applied One-Hot encoding to convert categorical data into numerical data. We then used kNN Imputer to handle missing values and MinMaxScaler feature transformation to normalize the data. ASD and TD class data were classified using support vector machine (SVM), k-nearest-neighbor (KNN), random forest (RF), and artificial neural network classifiers. RF performed best, reaching 100% accuracy with different training and testing splits on all four data sets, and shows no over-fitting. We also compared our results with previously published work, including recent methods such as Deep Neural Network (DNN) and Convolutional Neural Network (CNN). Compared with complex architectures such as DNN and CNN, our low-complexity models give the best results; existing methods report accuracy up to 98% with log-loss up to 15%. Our proposed methodology demonstrates improved generalization for real-time ASD detection during clinical trials.
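    The preprocessing chain named in this abstract (One-Hot encoding, kNN Imputer, MinMaxScaler, then a random forest) can be sketched with scikit-learn as below. This is only an illustration under assumptions, not the authors' code: the data are synthetic, the column names (`a1`..`a10`, `age`, `sex`), the toy labelling rule, `n_neighbors=5`, and the choice to scale before imputing are all hypothetical. The abstract does not state the order of scaling and imputation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in for a screening data set (all columns are hypothetical).
rng = np.random.default_rng(0)
n = 200
scores = rng.integers(0, 2, size=(n, 10)).astype(float)  # ten yes/no answers
age = rng.integers(2, 60, size=n).astype(float)
sex = rng.choice(["f", "m"], size=n)
label = (scores.sum(axis=1) > 5).astype(int)  # toy rule, not a clinical one

# Knock out roughly 10% of the ages to simulate missing values.
age[rng.random(n) < 0.1] = np.nan

df = pd.DataFrame(scores, columns=[f"a{i + 1}" for i in range(10)])
df["age"] = age
df["sex"] = sex

# One-hot encode the categorical column; pass numeric columns through.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["sex"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([
    ("encode", encode),
    # MinMaxScaler ignores NaN when fitting and keeps it when transforming,
    # so scaling first puts all features on [0, 1] before distances are used.
    ("scale", MinMaxScaler()),
    ("impute", KNNImputer(n_neighbors=5)),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, label, test_size=0.3, stratify=label, random_state=0
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

    Wrapping every step in one `Pipeline` means the encoder, scaler, and imputer are fitted on the training split only, which avoids leaking test-set statistics into preprocessing.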

  • Article type: Journal Article
    Objective: Cervical cancer is among the leading causes of death for women in developing countries. Early identification and treatment under expert medical guidance are the most important steps for minimizing the aftereffects of cervical cancer. One of the best ways to detect this malignancy is by examining Pap smear images. For automated detection of cervical cancer, the available datasets often have missing values, which can significantly affect the performance of machine learning models. Methods: To address these challenges, this study proposes an automated system for predicting cervical cancer that handles missing values with KNN imputation and uses SMOTE up-sampled features to achieve high accuracy. The proposed system employs a stacked ensemble voting classifier that combines three machine learning models, along with KNN Imputer and SMOTE up-sampling. Results: The proposed model achieves 99.99% accuracy, 99.99% precision, 99.99% recall, and a 99.99% F1 score when using KNN-imputed SMOTE features. The study compares the proposed model with multiple other machine learning algorithms under four scenarios: with missing values removed, with KNN imputation, with SMOTE features, and with KNN-imputed SMOTE features. The study validates the efficacy of the proposed model against existing state-of-the-art approaches. Conclusions: This study investigates missing values and class imbalance in data collected for cervical cancer detection and might aid medical practitioners in timely detection and in providing cervical cancer patients with better care.
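    A rough sketch of the workflow this abstract describes (KNN imputation, SMOTE-style up-sampling of the minority class, then a voting ensemble of three models) is given below. Everything specific in it is an assumption: the data are synthetic, `smote_like` is a hypothetical hand-written interpolation helper rather than the imbalanced-learn `SMOTE` class, and the three base models and soft voting are illustrative choices, since the abstract does not name them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def smote_like(X, y, minority, k=5, seed=0):
    """Balance classes by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours because each point is its own nearest one.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment base -> pick
    synthetic = X_min[base] + gap * (X_min[pick] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic imbalanced data: 360 negatives, 40 positives, ~10% values missing.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (360, 6)), rng.normal(1.5, 1.0, (40, 6))])
y = np.array([0] * 360 + [1] * 40)
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the imputer on training data only, then apply it to both splits.
imputer = KNNImputer(n_neighbors=5).fit(X_train)
X_train_imp = imputer.transform(X_train)
X_test_imp = imputer.transform(X_test)

# Up-sample only the training split so the test split stays untouched.
X_bal, y_bal = smote_like(X_train_imp, y_train, minority=1)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across the three models
)
ensemble.fit(X_bal, y_bal)
f1 = f1_score(y_test, ensemble.predict(X_test_imp))
print(f"held-out F1: {f1:.3f}")
```

    Imputing before up-sampling matters here because the interpolation step needs complete feature vectors, and restricting up-sampling to the training split keeps synthetic points out of the evaluation data.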