Classification Algorithms

  • Article type: Journal Article
    Effectiveness in health care is a specific characteristic of each intervention and outcome evaluated. Especially with regard to surgical interventions, organization, structure and processes play a key role in determining this parameter. In addition, health care services by definition operate in a context of limited resources, so rationalizing service organization becomes the primary goal of health care management. This aspect becomes even more relevant for high-volume surgical services. Therefore, data analysis could play a significant role in supporting and optimizing the management of patients undergoing surgical procedures. To this end, this study used different classification algorithms to characterize the process of patients undergoing surgery for a femoral neck fracture. The models showed significant accuracy, with values of 81%, and parameters such as anaemia and gender proved to be determining risk factors for the patient's length of stay. The predictive power of the implemented model is assessed and discussed in view of its capability to support the management and optimisation of the hospitalisation process for femoral neck fracture, and is compared with that of different models in order to identify the most promising algorithms. Finally, the support of artificial intelligence algorithms lays the basis for building more accurate decision-support tools for healthcare practitioners.

  • Article type: Journal Article
    BACKGROUND: The study aimed to determine the most crucial parameters associated with CVD and employ a novel data ensemble refinement procedure to uncover the optimal pattern of these parameters that can result in a high prediction accuracy.
    RESULTS: Data were collected from 369 patients in total: 281 patients with CVD or at risk of developing it, compared to 88 otherwise healthy individuals. Within the group of 281 CVD or at-risk patients, 53 were diagnosed with coronary artery disease (CAD), 16 with end-stage renal disease, 47 were newly diagnosed with type 2 diabetes mellitus and 92 had chronic inflammatory disorders (21 rheumatoid arthritis, 41 psoriasis, 30 angiitis). The data were analyzed using an artificial intelligence-based algorithm with the primary objective of identifying the optimal pattern of parameters that define CVD. The study highlights the effectiveness of a six-parameter combination in discerning the likelihood of cardiovascular disease using the DERGA and Extra Trees algorithms. These parameters, ranked in order of importance, are platelet-derived microvesicles (PMVs), hypertension, age, smoking, dyslipidemia, and body mass index (BMI). Endothelial and erythrocyte MVs, along with diabetes, were the least important predictors. The highest prediction accuracy achieved is 98.64%. Notably, using PMVs alone yields 91.32% accuracy, while the optimal model employing all ten parameters yields a prediction accuracy of 0.9783 (97.83%).
    CONCLUSIONS: Our research showcases the efficacy of DERGA, an innovative data ensemble refinement greedy algorithm. DERGA accelerates the assessment of an individual's risk of developing CVD, allowing for early diagnosis, significantly reduces the number of required lab tests and optimizes resource utilization. Additionally, it assists in identifying the optimal parameters critical for assessing CVD susceptibility, thereby enhancing our understanding of the underlying mechanisms.
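
The abstract does not describe how DERGA works internally, so the following is not DERGA. It is only a generic greedy forward-selection sketch, meant to illustrate the general idea of searching for a small subset of parameters that keeps prediction accuracy high. The scoring function (leave-one-out 1-nearest-neighbour accuracy), the function names and the toy data are all assumptions made for illustration.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the columns listed in `features`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_dist = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # the held-out sample may not vote for itself
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_j = dist, j
        if y[best_j] == y[i]:
            correct += 1
    return correct / len(X)


def greedy_forward_selection(X, y, score, max_features):
    """Add one feature at a time, always the one that raises the score most;
    stop when no candidate improves on the current best score."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        scored = [(score(X, y, selected + [f]), f) for f in candidates]
        # Ties are broken in favour of the lowest feature index.
        top_score, top_feature = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical toy data: column 0 separates the classes, columns 1 and 2 do not.
X = [
    [0.0, 5.0, 1.0],
    [1.0, 1.0, 9.0],
    [0.5, 9.0, 4.0],
    [10.0, 5.1, 1.1],
    [11.0, 1.1, 9.1],
    [10.5, 9.1, 4.1],
]
y = [0, 0, 0, 1, 1, 1]

selected, best_score = greedy_forward_selection(X, y, loo_1nn_accuracy, max_features=3)
print(selected, best_score)
```

On this toy data the search stops after the first feature, because adding either uninformative column cannot improve on a score that is already perfect. A real study would use a proper classifier and held-out validation data rather than this tiny example.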

  • Article type: Journal Article
    The emergence and rapid spread of multidrug-resistant bacteria (MRB), caused by the excessive use of antibiotics and the development of biofilms, have been a growing threat to global public health. Nanoparticles as substitutes for antibiotics have been shown to possess substantial abilities for tackling MRB infections via new antimicrobial mechanisms. In particular, carbon dots (CDs) with unique (bio)physicochemical characteristics have been receiving considerable attention in combating MRB by damaging the bacterial wall, binding to DNA or enzymes, inducing hyperthermia locally, or forming reactive oxygen species.
    Herein, how the physicochemical features of various CDs affect their antimicrobial capacity is investigated with the assistance of machine learning (ML) tools.
    The synthetic conditions and intrinsic properties of CDs from 121 samples are initially gathered to form the raw dataset, with minimum inhibitory concentration (MIC) as the output. Four classification algorithms (KNN, SVM, RF, and XGBoost) are trained and validated with the input data. The ensemble learning methods turn out to perform best on our data. In addition, ε-poly(L-lysine) CDs (PL-CDs) were developed to validate the practical applicability of the well-trained ML models in a laboratory, with two ensemble models managing the prediction.
    Thus, our results demonstrate that ML-based high-throughput theoretical calculation could be used to predict and decode the relationship between CD properties and the antibacterial effect, accelerating the development of high-performance nanoparticles and their potential clinical translation.

  • Article type: Journal Article
    In recent times, various machine learning approaches have been widely employed for effective diagnosis and prediction of diseases such as cancer, thyroid disease, and COVID-19. Alzheimer's disease (AD) is likewise a progressive malady that destroys memory and cognitive function over time. Unfortunately, there are no dedicated AI-based solutions for diagnosing AD to go hand in hand with medical diagnosis, even though multiple factors contribute to the diagnosis, making AI a very viable supplementary diagnostic solution. This paper reports an endeavor to apply various machine learning algorithms, namely SGD, k-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest, AdaBoost, Neural Network, SVM, and Naïve Bayes, to a dataset of affected patients in order to diagnose Alzheimer's disease. Longitudinal collections of subjects from the OASIS dataset have been used for prediction. Moreover, feature selection and dimension reduction methods such as Information Gain, Information Gain Ratio, Gini index, Chi-Squared, and PCA are applied to rank different factors and identify the optimum number of factors from the dataset for disease diagnosis. Furthermore, the performance of each classifier is evaluated in terms of ROC-AUC, accuracy, F1 score, recall, and precision, and a comparative analysis between algorithms is included. Our study suggests that approximately 90% classification accuracy is observed with the four top-rated features: CDR, SES, nWBV, and EDUC.
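
Several of the evaluation measures named in this abstract (accuracy, precision, recall, F1 score) follow directly from the four confusion-matrix counts. The sketch below shows that arithmetic for a binary problem; the function name and the label vectors are hypothetical and are not taken from the paper. ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from hard binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = diagnosed, 0 = not diagnosed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so accuracy is 0.8 while precision, recall and F1 are each 0.75. The example shows why accuracy alone can look better than the class-specific measures.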

  • Article type: Journal Article
    It is important to determine the risk of admission to the intensive care unit (ICU) in patients with COVID-19 presenting at the emergency department. Using artificial neural networks, we propose a new Data Ensemble Refinement Greedy Algorithm (DERGA) based on 15 easily accessible hematological indices. A database of 1596 patients with COVID-19 was used; it was divided into 1257 training datasets (80% of the database) for training the algorithms and 339 testing datasets (20% of the database) to check the reliability of the algorithms. The optimal combination of hematological indicators that gives the best prediction consists of only four indicators: neutrophil-to-lymphocyte ratio (NLR), lactate dehydrogenase, ferritin, and albumin. The best prediction corresponds to a particularly high accuracy of 97.12%. In conclusion, our novel approach provides a robust model, based only on basic hematological parameters, for predicting the risk of ICU admission and optimizing COVID-19 patient management in clinical practice.

  • Article type: Journal Article
    Complement inhibition has shown promise in various disorders, including COVID-19. A prediction tool that includes complement genetic variants is vital. This study aims to identify crucial complement-related variants and determine an optimal pattern for accurate disease outcome prediction. Genetic data from 204 COVID-19 patients hospitalized between April 2020 and April 2021 at three referral centres were analysed using an artificial intelligence-based algorithm to predict disease outcome (ICU vs. non-ICU admission). A recently introduced alpha-index identified the 30 most predictive genetic variants. The DERGA algorithm, which employs multiple classification algorithms, determined the optimal pattern of these key variants, resulting in 97% accuracy for predicting disease outcome. Individual variation ranged from 40 to 161 variants per patient, with 977 variants detected in total. This study demonstrates the utility of the alpha-index in ranking a substantial number of genetic variants. This approach enables the use of well-established classification algorithms that effectively determine the relevance of genetic variants in predicting outcomes with high accuracy.

  • Article type: Journal Article
    This study aims to compare the performance of machine learning algorithms (MLMs) in predicting the movement directions of stock market indexes in developed countries, and to determine the best estimation algorithm. For this purpose, the movement directions of the NYSE 100 (the USA), NIKKEI 225 (Japan), FTSE 100 (the UK), CAC 40 (France), DAX 30 (Germany), FTSE MIB (Italy), and TSX (Canada) indexes were estimated using the decision tree, random forest, k-nearest neighbor, naive Bayes, logistic regression, support vector machine and artificial neural network algorithms. According to the results obtained, artificial neural networks were the best algorithm for the NYSE 100, FTSE 100, DAX 30 and FTSE MIB indices, while logistic regression was the best algorithm for the NIKKEI 225, CAC 40, and TSX indices. Artificial neural networks, which exhibited the highest average prediction performance, were determined to be the best prediction algorithm for the stock market indices of developed countries. It was also noted that the artificial neural network, logistic regression, and support vector machine algorithms were capable of predicting the directional movements of all indices with an accuracy rate of over 70%.
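
The abstract does not say how a movement direction is defined or how the accuracy rate is computed. As a rough illustration only, the sketch below assumes a binary label (1 if the next close is higher than the current one, 0 otherwise, so an unchanged close counts as "not up") and measures accuracy as the share of correctly predicted directions. The closing prices and the predictions are made up.

```python
def direction_labels(closes):
    """Label each step 1 if the next close is strictly higher, else 0.
    Treating an unchanged close as 0 is an assumption of this sketch."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


def hit_rate(actual, predicted):
    """Share of steps whose direction was predicted correctly."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical closing values of an index and hypothetical model output.
closes = [100.0, 101.5, 101.0, 102.2, 102.2, 103.0]
actual = direction_labels(closes)
predicted = [1, 0, 1, 1, 1]

rate = hit_rate(actual, predicted)
print(actual, rate)
```

With these numbers the labels are `[1, 0, 1, 0, 1]` and four of the five predictions match, giving a hit rate of 0.8.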

  • Article type: Journal Article
    We discuss the implementation challenges of gas sensing systems based on low-frequency noise measurements on chemoresistive sensors. Resistance fluctuations in various gas sensing materials, in a frequency range typically up to a few kHz, can enhance gas sensing when their intensity and the slope of their power spectral density are taken into account. The issues of low-frequency noise measurements in resistive gas sensors, specifically in two-dimensional materials exhibiting gas-sensing properties, are considered. We present measurement setups and noise-processing methods for gas detection. Chemoresistive sensors show a wide range of DC resistances, which require different flicker noise measurement approaches. Separate noise measurement setups are used for resistances up to a few hundred kΩ and for resistances with much higher values. Noise measurements in highly resistive materials (e.g., MoS2, WS2, and ZrS3) are prone to external interference but can be modulated using temperature or light irradiation for enhanced sensing. Therefore, such materials are of considerable interest for gas sensing.
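
The two quantities mentioned here, noise intensity and the slope of the power spectral density, can be read off a straight-line fit in log-log coordinates. The sketch below assumes the common flicker-noise model S(f) = A / f^γ, so that the fitted slope is -γ and the intercept is log10 of the PSD at 1 Hz. The model, the function name and the synthetic spectrum are assumptions for illustration and do not reproduce the authors' processing chain.

```python
import math


def loglog_fit(freqs, psd):
    """Least-squares line through (log10 f, log10 S).
    Returns (slope, intercept); for S(f) = A / f**gamma the slope is -gamma
    and the intercept is log10(A), i.e. log10 of the PSD at f = 1 Hz."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(s) for s in psd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, noise-free 1/f spectrum (hypothetical values, arbitrary units).
freqs = [1.0, 10.0, 100.0, 1000.0]
psd = [1e-10 / f for f in freqs]

slope, intercept = loglog_fit(freqs, psd)
print(slope, intercept)
```

For this ideal spectrum the fit returns a slope close to -1 and an intercept close to -10. With measured data the PSD would first have to be estimated from the recorded resistance fluctuations, and the fit restricted to the band where flicker noise dominates.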

  • Article type: Journal Article
    BACKGROUND: No-shows for medical appointments have significant adverse effects on healthcare systems and their clients. Using machine learning to predict no-shows allows managers to implement strategies such as overbooking and reminders targeting the patients most likely to miss appointments, optimizing the use of resources.
    METHODS: In this study, we propose a detailed analytical framework for predicting no-shows while addressing imbalanced datasets. The framework includes a novel use of z-fold cross-validation, performed twice during the modeling process, to improve model robustness and generalization. We also introduce Symbolic Regression (SR) as a classification algorithm and Instance Hardness Threshold (IHT) as a resampling technique, and compare their performance with that of other classification algorithms, such as K-Nearest Neighbors (KNN) and Support Vector Machine (SVM), and other resampling techniques, such as Random Under-Sampling (RUS), Synthetic Minority Oversampling Technique (SMOTE) and NearMiss-1. We validated the framework using two attendance datasets from Brazilian hospitals with no-show rates of 6.65% and 19.03%.
    RESULTS: From the academic perspective, our study is the first to propose using SR and IHT to predict patient no-shows. Our findings indicate that SR and IHT performed better than the other techniques. IHT in particular excelled when combined with all classification algorithms and led to low variability in the performance metrics. Our results also outperformed the sensitivity outcomes reported in the literature, with values above 0.94 for both datasets.
    CONCLUSIONS: This is the first study to use SR and IHT methods to predict patient no-shows and the first to propose performing z-fold cross-validation twice. Our study highlights the importance of not relying on only a few validation runs for imbalanced datasets, as this may lead to biased results and an inadequate analysis of the generalization and stability of the models obtained during the training stage.
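
The abstract does not spell out how the cross-validation and resampling steps are combined, so the following is not the authors' framework. It is a generic sketch of one common practice for imbalanced data: repeat a stratified k-fold split several times and apply random undersampling (one of the techniques the study compares against) to the training part of each split only, leaving the held-out fold at its original class ratio. All function names and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(y, k, rng):
    """Assign sample indices to k folds so each fold keeps the class mix."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal indices round-robin
    return folds


def random_undersample(indices, y, rng):
    """Randomly drop samples until every class is as small as the rarest one."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[y[i]].append(i)
    n_min = min(len(v) for v in by_class.values())
    kept = []
    for v in by_class.values():
        kept.extend(rng.sample(v, n_min))
    return sorted(kept)


def resampled_cv_splits(y, k, repeats, seed=0):
    """Yield (train_idx, test_idx) pairs; only train_idx is undersampled."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = stratified_folds(y, k, rng)
        for held_out in range(k):
            test_idx = sorted(folds[held_out])
            train_idx = [i for f, fold in enumerate(folds)
                         if f != held_out for i in fold]
            yield random_undersample(train_idx, y, rng), test_idx


# Hypothetical labels: 10 no-shows (1) among 100 appointments.
y = [1] * 10 + [0] * 90
splits = list(resampled_cv_splits(y, k=5, repeats=2, seed=0))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

With 5 folds and 2 repeats this produces 10 splits. Each training set is balanced at 8 no-shows and 8 attended appointments, while each test fold keeps the original 2-in-20 ratio, so the reported metrics reflect the real class imbalance.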

  • Article type: Journal Article
    It has been widely argued among researchers that the application of big data analytics promises to reduce human bias and provide a scientific and evidence-based approach to the judicial process. In this dataset, historical data consisting of appeal cases presented at the Supreme Court of Nigeria (SCN) were collected from an online repository (Primsol Law Pavillion). A total of 5585 appeal cases brought before the SCN were collected from the archive. The dataset consists of both criminal and civil appeal cases. Variables related to court case proceedings were identified from the related literature, verified by legal experts, and used as a basis for generating, from the unstructured data, an electronic structured version of the dataset stored as a spreadsheet file. From the collected data, thirteen input variables were identified, with one output (decision) variable. The distribution of the numerical variables is presented as a descriptive statistical summary in terms of the minimum, maximum, mode, mean and standard deviation. The developed dataset can help researchers build predictive systems by training their models on it. Various feature extraction techniques can also be applied to the dataset to remove irrelevant or redundant features and so improve the performance of the classifiers needed to predict the outcome of legal cases.
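
The descriptive summary named in this abstract (minimum, maximum, mode, mean, standard deviation) can be produced for any numerical column with Python's standard library. The sketch below is only an illustration: the variable name and its values are hypothetical, since the abstract does not list the dataset's columns, and the use of the sample standard deviation is an assumption.

```python
import statistics


def describe(values):
    """Return the five summary statistics named in the dataset description."""
    return {
        "min": min(values),
        "max": max(values),
        "mode": statistics.mode(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); an assumption here.
        "std": statistics.stdev(values),
    }


# Hypothetical numerical variable, e.g. number of justices on each panel.
justices_per_case = [5, 5, 7, 5, 3]

summary = describe(justices_per_case)
print(summary)
```

For these five values the minimum is 3, the maximum 7, the mode and mean both 5, and the sample standard deviation is the square root of 2. Swapping `statistics.stdev` for `statistics.pstdev` would give the population standard deviation instead.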