Support vector machine (SVM)

  • Article type: Journal Article
    Bladder Cancer (BC) is a common disease that comes with a high risk of morbidity, death, and expense. Primary risk factors for BC include exposure to carcinogens in the workplace or the environment, particularly tobacco. There are several difficulties, such as the requirement for a qualified expert in BC classification. The Parrot Optimizer (PO) is an optimization method inspired by key behaviors observed in trained Pyrrhura molinae parrots, but the PO algorithm becomes stuck in sub-regions, has lower accuracy, and has a high error rate. Therefore, an improved variant of the PO (IPO) algorithm was developed using a combination of two strategies: (1) Mirror Reflection Learning (MRL) and (2) Bernoulli Maps (BMs). Both strategies improve optimization performance by avoiding local optima and striking a compromise between convergence speed and solution diversity. The performance of the proposed IPO is evaluated against eight competitor algorithms in terms of statistical convergence and other metrics, according to Friedman's test and the Bonferroni-Dunn test, on the IEEE Congress on Evolutionary Computation 2022 (CEC 2022) test suite functions and nine BC datasets from official repositories. The IPO algorithm ranked first in best fitness and outperformed the other eight MH algorithms on the CEC 2022 functions. The proposed IPO algorithm was then integrated with the Support Vector Machine (SVM) classifier, in an approach termed IPO-SVM, for bladder cancer classification. Nine BC datasets were used to confirm the effectiveness of the proposed IPO algorithm. The experiments show that the IPO-SVM approach outperforms eight recently proposed MH algorithms. Using the nine BC datasets, IPO-SVM achieved an Accuracy (ACC) of 84.11%, Sensitivity (SE) of 98.10%, Precision (PPV) of 95.59%, Specificity (SP) of 95.98%, and F-score (F1) of 94.15%. This demonstrates how the proposed IPO approach can help to classify BCs effectively. The open-source codes are available at https://www.mathworks.com/matlabcentral/fileexchange/169846-an-efficient-improved-parrot-optimizer.
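The abstract does not give the exact Bernoulli map used in IPO, so the following is only a minimal sketch of how a chaotic map is commonly used to spread an initial population over the search space. The piecewise form of the map, the parameter `lam`, the seed `x0`, and the function names are all assumptions made for illustration and are not taken from the authors' MATLAB code.

```python
def bernoulli_map(x0, n, lam=0.4):
    """Return n successive iterates of a piecewise Bernoulli map on (0, 1).

    Assumed form (one common variant in the metaheuristics literature):
        x <- x / (1 - lam)            if x <= 1 - lam
        x <- (x - (1 - lam)) / lam    otherwise
    """
    if not 0.0 < x0 < 1.0 or not 0.0 < lam < 1.0:
        raise ValueError("x0 and lam must lie strictly between 0 and 1")
    seq = []
    x = x0
    for _ in range(n):
        if x <= 1.0 - lam:
            x = x / (1.0 - lam)
        else:
            x = (x - (1.0 - lam)) / lam
        seq.append(x)
    return seq


def init_population(pop_size, lower, upper, x0=0.37, lam=0.4):
    """Scale chaotic values in [0, 1] to per-dimension bounds (hypothetical helper)."""
    dim = len(lower)
    chaos = bernoulli_map(x0, pop_size * dim, lam)
    return [
        [lower[d] + chaos[i * dim + d] * (upper[d] - lower[d]) for d in range(dim)]
        for i in range(pop_size)
    ]


if __name__ == "__main__":
    for candidate in init_population(5, [-10.0, 0.0], [10.0, 1.0]):
        print(candidate)
```

The motivation usually given for chaotic initialization is that the sequence covers the interval more evenly than a small pseudo-random sample, which matches the abstract's stated goal of preserving solution diversity.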

  • Article type: Journal Article
    Cracks are a common problem in concrete surfaces. With the continuous optimization of machine vision-based inspection systems, effective crack detection and recognition is the core of the entire system. In this study, support vector machine (SVM) was used to distinguish cracks from other regions. To complete the recognition system of the SVM, a framework consisting of an image processing and recognition model was proposed. An algorithm combining the Prewitt operator with the Otsu threshold was proposed for image segmentation. The binary image processed by the new algorithm combined with mathematical morphology can result in a more complete crack zone and fewer interference regions. After the initial parameter extraction, most of the impurity areas were screened by preliminary discrimination, removing 99% of the impurities. This processing step ensured the balance and effectiveness of the samples. To establish an automatic identification model based on SVM with a radial basis function, compactness, occupancy rate, and length-width ratio were selected as input parameters after comparing these three features with all six features of the crack. The recognition accuracy of this system reaches 97.14%, demonstrating that the proposed method is effective and satisfies practical requirements.
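The Otsu step mentioned in the abstract can be illustrated with a small, dependency-free sketch. It covers only threshold selection by maximizing the between-class variance of a grayscale histogram. The Prewitt edge operator, the morphological post-processing, and the SVM stage are not shown, and the function name and the flat pixel-list input are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray levels in [0, levels).
    Pixels with value <= t form one class and pixels > t form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0        # number of pixels in the lower class
    sum0 = 0.0    # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


if __name__ == "__main__":
    image = [10] * 50 + [200] * 50          # toy bimodal "image"
    t = otsu_threshold(image)
    binary = [1 if p > t else 0 for p in image]
    print(t, sum(binary))
```

In a pipeline like the one described, the threshold would presumably be applied to the edge-enhanced image rather than to raw intensities, but the abstract does not specify the exact order of operations.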

  • Article type: Journal Article
    Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.
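The resampling idea behind PU-bagging with added negatives can be sketched as follows. This is not the authors' NAPU-bagging implementation: a nearest-centroid rule stands in for the SVM base classifier so that the example stays dependency-free, and the function names, bag size, and scoring rule are all assumptions. Each bag treats a random subset of unlabeled points as provisional negatives, combines them with the known negatives, and scores only the unlabeled points left out of that bag.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sqdist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(pos, neg, unl, n_bags=50, bag_size=None, seed=0):
    """Average out-of-bag 'looks positive' votes for each unlabeled point.

    pos, neg, unl: lists of feature vectors (positive, known negative, unlabeled).
    A nearest-centroid rule is used here as a stand-in for an SVM.
    """
    rng = random.Random(seed)
    bag_size = bag_size or len(pos)
    indices = list(range(len(unl)))
    votes = [0.0] * len(unl)
    counts = [0] * len(unl)
    c_pos = centroid(pos)
    for _ in range(n_bags):
        bag = set(rng.sample(indices, bag_size))
        # Known negatives plus sampled unlabeled points act as the negative class.
        c_neg = centroid(neg + [unl[i] for i in bag])
        for i in indices:
            if i in bag:
                continue  # score only out-of-bag points
            votes[i] += 1.0 if sqdist(unl[i], c_pos) < sqdist(unl[i], c_neg) else 0.0
            counts[i] += 1
    return [v / c if c else 0.0 for v, c in zip(votes, counts)]


if __name__ == "__main__":
    pos = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
    neg = [(-1.0, -1.0), (-1.1, -0.9)]
    unl = [(1.0, 1.0), (-1.0, -1.0), (0.95, 1.05), (-0.9, -1.1)]
    print(pu_bagging_scores(pos, neg, unl, bag_size=2))
```

Averaging votes over many bags is what lets this family of methods keep recall high while the known negatives help hold down false positives, which is the trade-off the abstract highlights.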

  • Article type: Journal Article
    OBJECTIVE: We aim to compare the performance of three different radiomics models (logistic regression (LR), random forest (RF), and support vector machine (SVM)) and clinical nomograms (Briganti, MSKCC, Yale, and Roach) for predicting lymph node involvement (LNI) in prostate cancer (PCa) patients.
    METHODS: The retrospective study includes 95 patients who underwent mp-MRI and radical prostatectomy for PCa with pelvic lymphadenectomy. Imaging data (intensity in T2, DWI, ADC, and PIRADS), clinical data (age and pre-MRI PSA), histological data (Gleason score, TNM staging, histological type, capsule invasion, seminal vesicle invasion, and neurovascular bundle involvement), and clinical nomograms (Yale, Roach, MSKCC, and Briganti) were collected for each patient. Manual segmentation of the index lesions was performed for each patient using an open-source program (3D SLICER). Radiomic features were extracted for each segmentation using the Pyradiomics library for each sequence (T2, DWI, and ADC). The features were then selected and used to train and test three different radiomics models (LR, RF, and SVM) independently using ChatGPT software (v 4o). The coefficient value of each feature was calculated (significant value for coefficient ≥ ±0.5). The predictive performance of the radiomics models and clinical nomograms was assessed using accuracy and area under the curve (AUC) (significant value for p ≤ 0.05). Thus, the diagnostic accuracy of the radiomics and clinical models was compared.
    RESULTS: This study identified 343 features per patient (330 radiomics features and 13 clinical features). The most significant features were T2_nodulofirstordervariance and T2_nodulofirstorderkurtosis. The highest predictive performance was achieved by the RF model with DWI (accuracy 86%, AUC 0.89) and ADC (accuracy 89%, AUC 0.67). Clinical nomograms demonstrated satisfactory but lower predictive performance compared to the RF model in the DWI sequences.
    CONCLUSIONS: Among the prediction models developed using integrated data (radiomics and semantics), RF shows slightly higher diagnostic accuracy in terms of AUC compared to clinical nomograms in PCa lymph node involvement prediction.
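Since the comparison above rests on AUC, a small sketch of how the area under the ROC curve can be computed from scores and binary labels may help. It uses the rank-based (Mann-Whitney) interpretation: the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as half. The function name and the toy data are illustrative assumptions and are unrelated to the study's patient data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    scores: predicted scores or probabilities; labels: 1 for positive, 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four positive-negative pairs are ordered correctly.
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Accuracy and AUC can disagree, as in the ADC result quoted above, because accuracy depends on a single decision threshold while AUC summarizes ranking quality over all thresholds.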

  • Article type: Journal Article
    Floods are among the natural hazards that have seen a rapid increase in frequency in recent decades. The damage caused by floods, including human and financial losses, poses a serious threat to human life. This study evaluates two machine learning (ML) techniques for flood susceptibility mapping (FSM) in the Gamasyab watershed in Iran. We utilized random forest (RF), support vector machine (SVM), ensemble models, and a geographic information system (GIS) to predict FSM. The application of these models involved 10 effective factors in flooding, as well as 82 flood locations integrated into the GIS. The SVM and RF models were trained and tested, followed by the implementation of resampling techniques (RT) using bootstrap and subsampling methods in three repetitions. The results highlighted the importance of elevation, slope, and precipitation as primary factors influencing flood occurrence. Additionally, the ensemble model outperformed both the RF and SVM models, achieving an area under the curve (AUC) of 0.9, a correlation coefficient (COR) of 0.79, a true skill statistic (TSS) of 0.83, and a standard deviation (SD) of 0.71 in the test phase. The tested models were adapted to available input data to map the FSM across the study watershed. These findings underscore the potential of integrating an ensemble model with GIS as an effective tool for flood susceptibility mapping.
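Of the metrics reported above, the true skill statistic (TSS) is the least familiar outside ecology and hazard mapping. It is defined as sensitivity plus specificity minus one, so it ranges from -1 to 1 and equals 0 for a classifier no better than chance. The sketch below computes it from confusion-matrix counts. The function name and the counts are illustrative assumptions, not values from the study.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)   # flooded sites correctly flagged
    specificity = tn / (tn + fp)   # non-flooded sites correctly cleared
    return sensitivity + specificity - 1.0


if __name__ == "__main__":
    # Hypothetical counts: 40 of 50 flood sites and 45 of 50 non-flood sites correct.
    print(true_skill_statistic(tp=40, fn=10, tn=45, fp=5))
```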

  • Article type: Journal Article
    Stroke constitutes a significant public health concern due to its impact on mortality and morbidity. This study investigates the utility of machine learning algorithms in predicting stroke and identifying key risk factors using data from the Suita study, comprising 7389 participants and 53 variables. Initially, unsupervised k-prototype clustering categorized participants into risk clusters, while five supervised models including Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosted Machine (LightGBM) were employed to predict stroke outcomes. Stroke incidence disparities among identified risk clusters using the unsupervised k-prototype clustering method are substantial, according to the findings. Supervised learning, particularly RF, was a preferable option because of the higher levels of performance metrics. The Shapley Additive Explanations (SHAP) method identified age, systolic blood pressure, hypertension, estimated glomerular filtration rate, metabolic syndrome, and blood glucose level as key predictors of stroke, aligning with findings from the unsupervised clustering approach in high-risk groups. Additionally, previously unidentified risk factors such as elbow joint thickness, fructosamine, hemoglobin, and calcium level demonstrate potential for stroke prediction. In conclusion, machine learning facilitated accurate stroke risk predictions and highlighted potential biomarkers, offering a data-driven framework for risk assessment and biomarker discovery.
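k-prototype clustering handles mixed numeric and categorical variables by combining two dissimilarities. The following sketch shows the standard mixed measure (squared Euclidean distance on numeric features plus a weight `gamma` times the number of categorical mismatches) and the resulting nearest-prototype assignment. The variable names, the value of `gamma`, and the toy records are assumptions for illustration. The study's actual preprocessing and parameter choices are not given in the abstract.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=0.5):
    """k-prototypes style distance between two mixed-type records.

    num_*: numeric feature tuples; cat_*: categorical feature tuples.
    gamma weights categorical mismatches against numeric distance.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def assign_cluster(record, prototypes, gamma=0.5):
    """Index of the closest prototype for a (numeric, categorical) record."""
    num, cat = record
    distances = [mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
                 for p_num, p_cat in prototypes]
    return distances.index(min(distances))


if __name__ == "__main__":
    # Hypothetical standardized (age, systolic BP) plus (hypertension, smoker) flags.
    prototypes = [((-0.5, -0.4), ("no", "no")), ((0.8, 1.1), ("yes", "no"))]
    participant = ((0.9, 1.0), ("yes", "yes"))
    print(assign_cluster(participant, prototypes))
```

Numeric features are normally standardized first, since otherwise the squared-distance term dominates the categorical term regardless of `gamma`.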

  • Article type: Journal Article
    Machine learning and remote sensing techniques are widely accepted as valuable, cost-effective tools in lithological discrimination and mineralogical investigations. The current study represents an attempt to use machine learning classification, along with several remote sensing techniques applied to Landsat-8/9 satellite data, to discriminate the various outcropping lithological rock units at the Duwi Shear Belt (DSB) area in the Central Eastern Desert of Egypt. Multi-class machine learning classification, multiple conventional remote sensing mapping techniques, spectral separability analysis based on the Jeffries-Matusita (J-M) distance measure, fieldwork, and petrographic investigations were integrated to enhance the lithological discrimination of the exposed rock units at the DSB area. The well-recognized machine learning classifier Support Vector Machine (SVM) was adopted in this study, with training data determined carefully based on the enhanced lithological discrimination attained from various remote sensing techniques, namely False Color Composites (FCC), Principal Component Analysis (PCA), and Minimum Noise Fraction (MNF), along with the fieldwork data and the previously published geologic maps. High overall accuracy of the SVM classification was obtained; however, inspection of the individual rock unit classes' accuracies revealed lower accuracy for certain types of rock units, which were also associated with lower separability scores. Among the least separable rock units were the metagabbro rocks, which showed high spectral similarity with the volcaniclastic metasediments, and the metaultramafics of the ophiolitic mélange, whose spectral behavior correlated highly with that of the Hammamat volcanosedimentary rocks. A target-oriented Color Ratio Composites (CRC) technique was implemented to better discriminate these hardly separable rock units. A final integrated geological map was obtained comprising the various discriminated Neoproterozoic basement rock units of the DSB area. The successfully mapped litho-units include the Meatiq Group (amphibolites, gneissic granitoids, and mylonitized granitoids), ophiolitic mélange (metaultramafics, metagabbro-amphibolites, and volcaniclastic metasediments), Dokhan volcanics, Hammamat sediments, and granites. An adequate description of these rock units was also given in light of the intensive fieldwork and petrographic investigations conducted.
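The Jeffries-Matusita separability measure mentioned in the abstract can be illustrated for the simplest case of a single spectral band with Gaussian class statistics. The sketch uses the common convention JM = 2(1 - exp(-B)), where B is the Bhattacharyya distance, so values run from 0 (identical classes) to 2 (fully separable). The study works with multiband Landsat data, which requires the multivariate form with covariance matrices. The single-band simplification, the function names, and the example reflectance statistics are assumptions for illustration.

```python
import math


def bhattacharyya_1d(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate Gaussian classes."""
    term_means = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    term_vars = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term_means + term_vars


def jm_distance_1d(mean1, var1, mean2, var2):
    """Jeffries-Matusita distance on the 0..2 scale (single band)."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_1d(mean1, var1, mean2, var2)))


if __name__ == "__main__":
    # Hypothetical band statistics (mean, variance) for two rock units.
    print(jm_distance_1d(0.21, 0.0004, 0.23, 0.0005))   # overlapping classes
    print(jm_distance_1d(0.10, 0.0004, 0.40, 0.0005))   # well-separated classes
```

Low values for a class pair flag units that a classifier is likely to confuse, which is consistent with the lower per-class accuracies the abstract reports for the least separable units.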

  • Article type: Journal Article
    Plant diseases significantly impact crop productivity and quality, posing a serious threat to global agriculture. The process of identifying and categorizing these diseases is often time-consuming and prone to errors. This research addresses this issue by employing a convolutional neural network and support vector machine (CNN-SVM) hybrid model to classify diseases in four economically important crops: strawberries, peaches, cherries, and soybeans. The objective is to categorize 10 classes, six diseased and four healthy, for these crops using the deep learning-based CNN-SVM model. Several pre-trained models, including VGG16, VGG19, DenseNet, Inception, MobileNetV2, MobileNet, Xception, and ShuffleNet, were also trained, achieving accuracies ranging from 53.82% to 98.8%. The proposed model, however, achieved an average accuracy of 99.09%. While the proposed model's accuracy is comparable to that of the VGG16 pre-trained model, its significantly lower number of trainable parameters makes it more efficient and distinctive. This research demonstrates the potential of the CNN-SVM model in enhancing the accuracy and efficiency of plant disease classification. The CNN-SVM model was selected over VGG16 and other models due to its superior performance metrics. The proposed model achieved a 99% F1-score, a 99.98% Area Under the Curve (AUC), and a 99% precision value, demonstrating its efficacy. Additionally, class activation maps were generated using the Gradient Weighted Class Activation Mapping (Grad-CAM) technique to provide a visual explanation of the detected diseases. A heatmap was created to highlight the regions requiring classification, further validating the model's accuracy and interpretability.

  • Article type: Journal Article
    Groundwater in northwestern parts of Bangladesh, mainly in the Chapainawabganj District, has been contaminated by arsenic. This research documents the geographical distribution of arsenic concentrations utilizing machine learning techniques. The study aims to enhance the accuracy of model predictions by precisely identifying occurrences of groundwater arsenic, enabling effective mitigation actions and yielding more beneficial results. The reductive dissolution of arsenic-rich iron oxides/hydroxides is identified as the primary mechanism responsible for the release of arsenic from sediment into groundwater. The study reveals that in the research region, alongside elevated arsenic concentrations, significant levels of sodium (Na), iron (Fe), manganese (Mn), and calcium (Ca) were present. Statistical analysis was employed for feature selection, identifying pH, electrical conductivity (EC), sulfate (SO4), nitrate (NO3), Fe, Mn, Na, K, Ca, Mg, bicarbonate (HCO3), phosphate (PO4), and As as features closely associated with arsenic mobilization. Subsequently, various machine learning models, including Naïve Bayes, Random Forest, Support Vector Machine, Decision Tree, and logistic regression, were employed. The models utilized normalized arsenic concentrations categorized as high concentration (HC) or low concentration (LC), along with physicochemical properties as features, to predict arsenic occurrences. Among all machine learning models, the logistic regression and support vector machine models demonstrated high performance based on accuracy and confusion matrix analysis. In this study, a spatial distribution prediction map was generated to identify arsenic-prone areas. The prediction map also shows that Baroghoria Union and the Rajarampur region under Chapainawabganj municipality are high-risk areas, while Maharajpur Union and Baliadanga Union are comparatively low-risk areas of the research area. This map will help researchers and legislators implement mitigation strategies. Logistic regression (LR) and support vector machine (SVM) models will be utilized to monitor arsenic concentration values continuously.
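The accuracy and confusion-matrix evaluation described above can be sketched for the two-class HC/LC setting as follows. The label strings mirror the abstract, but the function names and the example predictions are assumptions for illustration and do not reproduce any of the study's results.

```python
def confusion_counts(y_true, y_pred, positive="HC"):
    """Return (tp, fn, tn, fp) treating `positive` as the class of interest."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def summarize(y_true, y_pred, positive="HC"):
    """Accuracy, sensitivity, and specificity from the confusion matrix."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    observed = ["HC", "HC", "LC", "LC", "HC"]    # hypothetical well labels
    predicted = ["HC", "LC", "LC", "HC", "HC"]   # hypothetical model output
    print(confusion_counts(observed, predicted))
    print(summarize(observed, predicted))
```

For a contamination screen, sensitivity for the HC class is arguably the more important figure, since a missed high-concentration well carries a greater cost than a false alarm.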

  • Article type: Journal Article
    The study utilized Fourier transform infrared (FTIR) spectroscopy coupled with chemometrics to investigate protein composition and structural changes in the blood serum of patients with polycythemia vera (PV). Principal component analysis (PCA) revealed distinct biochemical properties, highlighting elevated absorbance of phospholipids, amides, and lipids in PV patients compared to healthy controls. Ratios of amide I/amide II and amide I/amide III indicated alterations in protein structures. Support vector machine analysis and receiver operating characteristic curves identified amide I as a crucial predictor of PV, achieving 100% accuracy, sensitivity, and specificity, while amide III showed a lower predictive value (70%). PCA analysis demonstrated effective differentiation between PV patients and controls, with key wavenumbers including amide II, amide I, and CH lipid vibrations. These findings underscore the potential of FTIR spectroscopy for diagnosing and monitoring PV.
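The amide band ratios used above can be computed from a spectrum in a few lines. The sketch below takes the peak absorbance inside each band window and forms the amide I/amide II and amide I/amide III ratios. The window limits are approximate textbook ranges and, like the function names and the toy spectrum, are assumptions for illustration. The abstract does not state which wavenumbers or baseline corrections the authors used.

```python
# Approximate band windows in cm^-1 (assumed, not taken from the study).
AMIDE_I = (1600.0, 1700.0)
AMIDE_II = (1480.0, 1580.0)
AMIDE_III = (1200.0, 1350.0)


def band_peak(wavenumbers, absorbance, window):
    """Maximum absorbance among points whose wavenumber lies inside `window`."""
    lo, hi = window
    values = [a for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi]
    if not values:
        raise ValueError("no spectral points inside the requested window")
    return max(values)


def amide_ratios(wavenumbers, absorbance):
    """Return (amide I / amide II, amide I / amide III) peak-height ratios."""
    a1 = band_peak(wavenumbers, absorbance, AMIDE_I)
    a2 = band_peak(wavenumbers, absorbance, AMIDE_II)
    a3 = band_peak(wavenumbers, absorbance, AMIDE_III)
    return a1 / a2, a1 / a3


if __name__ == "__main__":
    # Toy three-point spectrum: one point per band.
    wn = [1250.0, 1540.0, 1650.0]
    ab = [0.2, 0.5, 1.0]
    print(amide_ratios(wn, ab))
```

Ratios of this kind could then serve as input features for a classifier such as the SVM mentioned above, although integrated band areas are a common alternative to peak heights.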