Classification Algorithms - 医云文献


Literature (54 articles)


1 Beehive Smart Detector Device for the Detection of Critical Conditions That Utilize Edge Device Computations and Deep Learning Inferences.

Impact index: 3.847
Published: 22 Aug 2024
Journal: Sensors (Basel) PMID: 39205138

DOI: 10.3390/s24165444
Article type: Journal Article

This paper presents a new edge-based detection process, implemented in an embedded IoT device called the Bee Smart Detection (BeeSD) node, to detect catastrophic apiary events. Such events include swarming, queen loss, and Colony Collapse Disorder (CCD) conditions. Two deep learning sub-processes are used for this purpose. The first uses a fuzzy multi-layered neural network of variable depth, called fuzzy-stranded-NN, to detect CCD conditions from temperature and humidity measurements inside the beehive. The second uses a deep learning CNN model to detect swarming and queen loss from sound recordings. The proposed processes have been implemented in autonomous BeeSD IoT devices that transmit their measurements and detection results to the cloud over Wi-Fi. The BeeSD devices have been tested for ease of use, autonomous operation, deep learning model inference accuracy, and inference execution speed. The author presents experimental results for the fuzzy-stranded-NN model for detecting critical conditions and for the deep learning CNN models for detecting swarming and queen loss. In these experiments, the stranded-NN achieved accuracy of up to 95%, while the ResNet-50 model achieved accuracy of up to 99% for detecting swarming or queen loss events. The ResNet-18 model is the fastest-inference replacement for the ResNet-50 model, achieving accuracy of up to 93%. Finally, a cross-comparison of the deep learning models with machine learning models shows that the deep learning models provide at least 3-5% better accuracy.

2 The classification algorithms to support the management of the patient with femur fracture.

Impact index: 4.612
Published: 16 Jul 2024
Journal: BMC Med Res Methodol PMID: 39014322

DOI: 10.1186/s12874-024-02276-5
Article type: Journal Article

Effectiveness in health care is a specific characteristic of each intervention and outcome evaluated. Especially with regard to surgical interventions, organization, structure and processes play a key role in determining this parameter. In addition, health care services by definition operate in a context of limited resources, so rationalization of service organization becomes the primary goal for health care management. This aspect becomes even more relevant for high-volume surgical services. Therefore, data analysis could play a significant role in supporting and optimizing the management of patients undergoing surgical procedures. To this end, this study used different classification algorithms to characterize the care process of patients undergoing surgery for a femoral neck fracture. The models showed significant accuracy, with values of 81%, and parameters such as anaemia and gender proved to be determining risk factors for the patient's length of stay. The predictive power of the implemented model is assessed and discussed in view of its capability to support the management and optimisation of the hospitalisation process for femoral neck fracture, and is compared with different models in order to identify the most promising algorithms. Finally, the support of artificial intelligence algorithms lays the basis for building more accurate decision-support tools for healthcare practitioners.

3 Machine Learning Tools to Assist the Synthesis of Antibacterial Carbon Dots.

Impact index: not available
Published: 2024
Journal: Int J Nanomedicine PMID: 38855729

DOI: 10.2147/IJN.S451680
Article type: Journal Article

The emergence and rapid spread of multidrug-resistant bacteria (MRB), caused by the excessive use of antibiotics and the development of biofilms, have been a growing threat to global public health. Nanoparticles as substitutes for antibiotics have been shown to possess substantial abilities for tackling MRB infections via new antimicrobial mechanisms. In particular, carbon dots (CDs) with unique (bio)physicochemical characteristics have been receiving considerable attention in combating MRB by damaging the bacterial wall, binding to DNA or enzymes, inducing hyperthermia locally, or forming reactive oxygen species.
Herein, how the physicochemical features of various CDs affect their antimicrobial capacity is investigated with the assistance of machine learning (ML) tools.
The synthetic conditions and intrinsic properties of CDs from 121 samples are initially gathered to form the raw dataset, with minimum inhibitory concentration (MIC) as the output. Four classification algorithms (KNN, SVM, RF, and XGBoost) are trained and validated with the input data. The ensemble learning methods turn out to perform best on this data. In addition, ε-poly(L-lysine) CDs (PL-CDs) were developed to validate the practical applicability of the well-trained ML models in a laboratory, with two ensemble models managing the prediction.
Thus, the results demonstrate that ML-based high-throughput theoretical calculation could be used to predict and decode the relationship between CD properties and the antibacterial effect, accelerating the development of high-performance nanoparticles and their potential clinical translation.

4 Performances of Machine Learning Models for Diagnosis of Alzheimer's Disease.

Impact index: not available
Published: 17 Oct 2022
Journal: Ann Data Sci PMID: 38625305

DOI: 10.1007/s40745-022-00452-2
Article type: Journal Article

In recent times, various machine learning approaches have been widely employed for effective diagnosis and prediction of diseases such as cancer, thyroid disease, and Covid-19. Alzheimer's disease (AD) is likewise a progressive malady that destroys memory and cognitive function over time. Unfortunately, there are no dedicated AI-based solutions for AD diagnosis to go hand in hand with medical diagnosis, even though multiple factors contribute to the diagnosis, making AI a very viable supplementary diagnostic solution. This paper reports an endeavor to apply various machine learning algorithms, namely SGD, k-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest, AdaBoost, Neural Network, SVM, and Naïve Bayes, to a dataset of affected subjects in order to diagnose Alzheimer's disease. Longitudinal collections of subjects from the OASIS dataset have been used for prediction. Moreover, feature selection and dimension reduction methods such as Information Gain, Information Gain Ratio, Gini index, Chi-Squared, and PCA are applied to rank the different factors and identify the optimum number of factors from the dataset for disease diagnosis. Furthermore, the performance of each classifier is evaluated in terms of ROC-AUC, accuracy, F1 score, recall, and precision, and a comparative analysis of the algorithms is included. The study suggests that approximately 90% classification accuracy is observed with the four top-rated features: CDR, SES, nWBV, and EDUC.
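The feature-ranking step mentioned in this abstract can be illustrated with information gain. The following is a minimal sketch, not the authors' code: the function names and the toy records are hypothetical, and the features are assumed to be categorical (a continuous measure such as nWBV would need to be binned first).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one perfectly informative feature, one uninformative feature.
labels = ["AD", "AD", "control", "control"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]

# Rank the features from highest to lowest information gain.
ranking = sorted(
    {
        "informative": information_gain(informative, labels),
        "uninformative": information_gain(uninformative, labels),
    }.items(),
    key=lambda item: item[1],
    reverse=True,
)
```

Keeping only the top few entries of such a ranking is one simple way to arrive at a reduced feature set like the four features named above; how the paper combined the different ranking methods is not stated in the abstract.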

5 Genetic justification of COVID-19 patient outcomes using DERGA, a novel data ensemble refinement greedy algorithm.

Impact index: 5.295
Published: Feb 2024
Journal: J Cell Mol Med PMID: 38339761

DOI: 10.1111/jcmm.18105
Article type: Journal Article

Complement inhibition has shown promise in various disorders, including COVID-19. A prediction tool that includes complement genetic variants is vital. This study aims to identify crucial complement-related variants and determine an optimal pattern for accurate disease outcome prediction. Genetic data from 204 COVID-19 patients hospitalized between April 2020 and April 2021 at three referral centres were analysed using an artificial intelligence-based algorithm to predict disease outcome (ICU vs. non-ICU admission). A recently introduced alpha-index identified the 30 most predictive genetic variants. The DERGA algorithm, which employs multiple classification algorithms, determined the optimal pattern of these key variants, resulting in 97% accuracy for predicting disease outcome. Individual variation ranged from 40 to 161 variants per patient, with 977 variants detected in total. This study demonstrates the utility of the alpha-index in ranking a substantial number of genetic variants. This approach enables the use of well-established classification algorithms that effectively determine the relevance of genetic variants in predicting outcomes with high accuracy.

6 How effective is machine learning in stock market predictions?

Impact index: 3.776
Published: 30 Jan 2024
Journal: Heliyon PMID: 38293519

DOI: 10.1016/j.heliyon.2024.e24123
Article type: Journal Article

This study aims to compare the performance of machine learning algorithms (MLMs) in predicting the movement directions of stock market indexes in developed countries, and to determine the best estimation algorithm. For this purpose, the movement directions of the NYSE 100 (the USA), NIKKEI 225 (Japan), FTSE 100 (the UK), CAC 40 (France), DAX 30 (Germany), FTSE MIB (Italy), and TSX (Canada) indexes were estimated with the decision tree, random forest, k-nearest neighbor, naive Bayes, logistic regression, support vector machine, and artificial neural network algorithms. According to the results, artificial neural networks were the best algorithm for the NYSE 100, FTSE 100, DAX 30 and FTSE MIB indices, while logistic regression was the best algorithm for the NIKKEI 225, CAC 40, and TSX indices. Artificial neural networks, which exhibited the highest average prediction performance, were determined to be the best prediction algorithm for the stock market indices of developed countries. It was also noted that the artificial neural network, logistic regression, and support vector machine algorithms were capable of predicting the directional movements of all indices with an accuracy of over 70%.

7 Flicker Noise in Resistive Gas Sensors-Measurement Setups and Applications for Enhanced Gas Sensing.

Impact index: 3.847
Published: 9 Jan 2024
Journal: Sensors (Basel) PMID: 38257498

DOI: 10.3390/s24020405
Article type: Journal Article

We discuss the implementation challenges of gas sensing systems based on low-frequency noise measurements on chemoresistive sensors. Resistance fluctuations in various gas sensing materials, in a frequency range typically up to a few kHz, can enhance gas sensing when their intensity and the slope of their power spectral density are taken into account. The issues of low-frequency noise measurements in resistive gas sensors, specifically in two-dimensional materials exhibiting gas-sensing properties, are considered. We present measurement setups and noise-processing methods for gas detection. Chemoresistive sensors exhibit a wide range of DC resistances, which call for different flicker noise measurement approaches. Separate noise measurement setups are used for resistances up to a few hundred kΩ and for resistances with much higher values. Noise measurements in highly resistive materials (e.g., MoS2, WS2, and ZrS3) are prone to external interference but can be modulated using temperature or light irradiation for enhanced sensing. Therefore, such materials are of considerable interest for gas sensing.
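The "slope of power spectral density" used as a sensing feature can be estimated with a straight-line fit on log-log axes. The sketch below assumes an already computed spectrum; the function name and the sample values are hypothetical and do not come from the paper.

```python
import math


def psd_slope(frequencies, psd_values):
    """Least-squares slope of log10(PSD) versus log10(frequency)."""
    xs = [math.log10(f) for f in frequencies]
    ys = [math.log10(s) for s in psd_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical spectrum following S(f) = 1e-12 / f, i.e. ideal flicker (1/f) noise.
freqs = [1.0, 10.0, 100.0, 1000.0]
psd = [1e-12 / f for f in freqs]
slope = psd_slope(freqs, psd)  # close to -1 for ideal 1/f noise
```

A measured slope that departs from -1, or that shifts when the ambient gas changes, is the kind of feature that can be passed to a classifier alongside the noise intensity.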

8 Decision analysis framework for predicting no-shows to appointments using machine learning algorithms.

Impact index: 2.908
Published: 5 Jan 2024
Journal: BMC Health Serv Res PMID: 38183029

DOI: 10.1186/s12913-023-10418-6
Article type: Journal Article

BACKGROUND: No-shows to medical appointments have significant adverse effects on healthcare systems and their clients. Using machine learning to predict no-shows allows managers to implement strategies such as overbooking and reminders targeting the patients most likely to miss appointments, optimizing the use of resources.
METHODS: In this study, we propose a detailed analytical framework for predicting no-shows while addressing imbalanced datasets. The framework includes a novel use of z-fold cross-validation, performed twice during the modeling process, to improve model robustness and generalization. We also introduce Symbolic Regression (SR) as a classification algorithm and Instance Hardness Threshold (IHT) as a resampling technique, and compare their performance with that of other classification algorithms, such as K-Nearest Neighbors (KNN) and Support Vector Machine (SVM), and other resampling techniques, such as Random Under-Sampling (RUS), Synthetic Minority Oversampling Technique (SMOTE) and NearMiss-1. We validated the framework using two attendance datasets from Brazilian hospitals with no-show rates of 6.65% and 19.03%.
RESULTS: From the academic perspective, our study is the first to propose using SR and IHT to predict patient no-shows. Our findings indicate that SR and IHT outperformed the other techniques, particularly IHT, which excelled when combined with all classification algorithms and led to low variability in the performance metrics. Our results also exceeded the sensitivity values reported in the literature, with values above 0.94 for both datasets.
CONCLUSIONS: This is the first study to use SR and IHT methods to predict patient no-shows and the first to propose performing z-fold cross-validation twice. Our study highlights the importance of not relying on a small number of validation runs for imbalanced datasets, as this may lead to biased results and inadequate analysis of the generalization and stability of the models obtained during the training stage.
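To make the resampling and evaluation vocabulary concrete, the sketch below shows Random Under-Sampling (one of the baseline techniques named in METHODS) and the sensitivity metric reported in RESULTS. It is an illustration only: the function names and the synthetic appointment data are hypothetical, and it does not reproduce the paper's IHT or SR implementations.

```python
import random


def random_undersample(records, labels, seed=0):
    """Randomly drop records until every class has as many records as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for record in rng.sample(group, target):
            balanced.append((record, label))
    rng.shuffle(balanced)
    return balanced


def sensitivity(y_true, y_pred, positive=1):
    """True positives divided by all actual positives (recall of the positive class)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos


# Hypothetical data: 100 appointments, 7 of them no-shows (label 1).
labels = [1] * 7 + [0] * 93
records = list(range(100))
balanced = random_undersample(records, labels)
```

Resampling of this kind should be applied to the training folds only; the validation folds need to keep the original class ratio so that sensitivity is measured on realistic data.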

9 A dataset for predicting Supreme Court judgments in Nigeria.

Impact index: not available
Published: Oct 2023
Journal: Data Brief PMID: 37588617

DOI: 10.1016/j.dib.2023.109483
Article type: Journal Article

It has been widely argued among researchers that the application of big data analytics promises to reduce human bias and provide a scientific, evidence-based approach to the judicial process. For this dataset, historical data consisting of appeal cases presented at the Supreme Court of Nigeria (SCN) were collected from an online repository (Primsol Law Pavillion). A total of 5585 appeal cases brought before the SCN were collected from the archive. The dataset consists of both criminal and civil appeal cases. Variables related to court case proceedings were identified from the related literature, verified by legal experts, and used as the basis for generating, from the unstructured data, a structured electronic version of the dataset stored as a spreadsheet file. From the collected data, thirteen input variables were identified, with one output/decision variable. The distribution of the numerical variables is presented as a descriptive statistical summary in terms of the minimum, maximum, mode, mean and standard deviation. The dataset can help researchers build predictive systems by training their models on it. Various feature extraction techniques can also be applied to the dataset to remove irrelevant or redundant features and so improve the performance of the classifiers needed to predict the outcome of legal cases.

10 Accelerating the Screening of Small Peptide Ligands by Combining Peptide-Protein Docking and Machine Learning.

Impact index: 6.208
Published: 29 Jul 2023
Journal: Int J Mol Sci PMID: 37569520

DOI: 10.3390/ijms241512144
Article type: Journal Article

This research introduces a novel pipeline that couples machine learning (ML) and molecular docking to accelerate small peptide ligand screening through the prediction of peptide-protein docking. Eight ML algorithms were analyzed for their potential. Notably, Light Gradient Boosting Machine (LightGBM), despite having an F1-score and accuracy comparable to its counterparts, showed superior computational efficiency. LightGBM was used to classify the peptide-protein docking performance of the entire tetrapeptide library of 160,000 peptide ligands against four viral envelope proteins. The library was classified into two groups, 'better performers' and 'worse performers'. By training the LightGBM algorithm on just 1% of the tetrapeptide library, we successfully classified the remaining 99% with an accuracy of 0.81-0.85 and an F1-score of 0.58-0.67. Three different molecular docking programs were used to show that the process is not software dependent. With an adjustable probability threshold (from 0.5 to 0.95), the process could be accelerated at least 10-fold and still reach 90-95% concurrence with the method without ML. This study validates the efficiency of machine learning coupled with molecular docking in rapidly identifying top peptides without relying on high-performance computing power, making it an effective tool for screening potential bioactive compounds.
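The adjustable probability threshold can be pictured as a filter on the classifier's predicted probabilities: only peptides scoring above the threshold are sent on to full docking. The sketch below is one plausible reading of "acceleration" and "concurrence", not the paper's definition; the function names, the ten-peptide library, and the probabilities are all hypothetical.

```python
def screen_by_threshold(probabilities, threshold):
    """Return indices of peptides whose predicted 'better performer' probability meets the threshold."""
    return [i for i, p in enumerate(probabilities) if p >= threshold]


def speedup_and_concurrence(probabilities, true_top, threshold):
    """Speed-up = library size / number kept for docking; concurrence = fraction of true top hits kept.

    Assumes at least one peptide passes the threshold and true_top is non-empty.
    """
    kept = set(screen_by_threshold(probabilities, threshold))
    speedup = len(probabilities) / len(kept)
    concurrence = len(kept & set(true_top)) / len(true_top)
    return speedup, concurrence


# Hypothetical classifier outputs for a ten-peptide library.
probs = [0.97, 0.10, 0.55, 0.92, 0.30, 0.05, 0.60, 0.20, 0.15, 0.40]
true_top = [0, 3]  # indices that full docking would rank as top performers
speedup, concurrence = speedup_and_concurrence(probs, true_top, threshold=0.9)
```

Raising the threshold keeps fewer peptides, which increases the speed-up but risks discarding true top performers; lowering it does the opposite. That trade-off is what the 0.5 to 0.95 range in the abstract explores.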

