CatBoost

  • Article type: Comparative Study
    Domestic violence against women is prevalent in Liberia, with nearly half of women reporting physical violence. However, research on the biosocial factors contributing to this issue remains limited. This study aims to predict women's vulnerability to domestic violence using a machine learning approach, leveraging data from the Liberian Demographic and Health Survey (LDHS) conducted in 2019-2020. We employed seven machine learning algorithms: ANN, KNN, RF, DT, XGBoost, LightGBM, and CatBoost. Our analysis revealed that the LightGBM and RF models achieved the highest accuracy in predicting women's vulnerability to domestic violence in Liberia, at 81% and 82%, respectively. One of the key features identified across multiple algorithms was the number of people who had experienced emotional violence. These findings offer important insights into the underlying characteristics and risk factors associated with domestic violence against women in Liberia. Machine learning techniques can help predict and understand this complex issue, ultimately contributing to more effective prevention and intervention strategies.

  • Article type: Journal Article
    Aspirin resistance (AR) is a pressing problem in current ischemic stroke care. Although the role of genetic variations is widely considered, the data remain controversial. Our aim was to investigate the contribution of genetic features to laboratory AR, measured through platelet aggregation with arachidonic acid (AA) and adenosine diphosphate (ADP), in ischemic stroke patients. A total of 461 patients were enrolled. Platelet aggregation was measured via light transmission aggregometry. Eighteen single-nucleotide polymorphisms (SNPs) in the ITGB3, GPIBA, TBXA2R, ITGA2, PLA2G7, HMOX1, PTGS1, PTGS2, ADRA2A, ABCB1 and PEAR1 genes and the intergenic 9p21.3 region were determined using low-density biochips. We found an association of rs1330344 in the PTGS1 gene with AR and AA-induced platelet aggregation. The rs4311994 variant in the ADRA2A gene also affected AA-induced aggregation, while rs4523 in the TBXA2R gene and rs12041331 in the PEAR1 gene influenced ADP-induced aggregation. Furthermore, rs1062535 in the ITGA2 gene was found to affect NIHSS dynamics during 10 days of treatment. The best machine learning (ML) model for AR based on clinical and genetic factors was characterized by AUC = 0.665 and F1-score = 0.628. In conclusion, the association study showed that PTGS1, ADRA2A, TBXA2R and PEAR1 polymorphisms may affect laboratory AR. However, the ML model demonstrated the predominant influence of clinical features.
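The abstract above summarizes its best model with two numbers, AUC and F1-score. As a reminder of what those metrics measure, here is a minimal pure-Python sketch. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 is the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): three resistant (1) and three non-resistant (0) cases.
labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
hard_preds = [1 if s >= 0.5 else 0 for s in model_scores]  # assumed threshold

auc_value = roc_auc(labels, model_scores)
f1_value = f1_score(labels, hard_preds)
```

AUC depends only on how the scores rank the cases, whereas F1 depends on the chosen threshold, which is why the two values can diverge for the same model.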

  • Article type: Journal Article
    BACKGROUND: Artificial intelligence and digital health care advanced substantially during the prolonged COVID-19 global pandemic, improving medical diagnosis and treatment. In this study, we discuss the development of prediction models for the self-diagnosis of polycystic ovary syndrome (PCOS) using machine learning techniques.
    OBJECTIVE: We aim to develop self-diagnostic prediction models for PCOS for potential patients and clinical providers. For potential patients, the prediction is based only on noninvasive measures such as anthropometric measures, symptoms, age, and other lifestyle factors, so that the proposed prediction tool can be conveniently used without any laboratory or ultrasound test results. For clinical providers who can access patients' medical test results, prediction models using all predictor variables can be adopted to help diagnose patients with PCOS. We compare both prediction models using various error metrics. Throughout this paper, we call the former the patient model and the latter the provider model.
    METHODS: In this retrospective study, a publicly available data set of health information, including PCOS status, for 541 women collected from 10 different hospitals in Kerala, India, was acquired and used for analysis. We adopted the CatBoost method for classification, K-fold cross-validation for estimating model performance, and SHAP (Shapley Additive Explanations) values to explain the importance of each variable. In our subgroup study, we used k-means clustering and principal component analysis to split the data set into 2 distinct BMI subgroups and compared the prediction results and the feature importance between the 2 subgroups.
    RESULTS: We achieved 81% to 82.5% prediction accuracy of PCOS status without any invasive measures in the patient models, and 87.5% to 90.1% prediction accuracy using both noninvasive and invasive predictor variables in the provider models. Among noninvasive measures, acanthosis nigricans, acne, hirsutism, irregular menstrual cycle, length of menstrual cycle, weight gain, fast food consumption, and age were the more important variables in the models. Among medical test results, the numbers of follicles in the right and left ovaries and anti-Müllerian hormone ranked highly in feature importance. We also reported more detailed results in a subgroup study.
    CONCLUSIONS: The proposed prediction models are ultimately expected to serve as a convenient digital platform through which users can obtain a pre-diagnosis or self-diagnosis and counseling on their risk of PCOS, with or without medical test results. It will enable women to access the platform conveniently at home, without delay, before they seek further medical care. Clinical providers can also use the proposed prediction tool to help diagnose PCOS in women.
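The METHODS paragraph above mentions K-fold cross-validation for estimating model performance. The sketch below shows only the mechanics of that procedure in pure Python. It is not the authors' pipeline: the number of folds, the shuffling seed, and the helper names (`k_fold_indices`, `cross_val_accuracy`, `majority_baseline`) are assumptions for illustration, and a trivial majority-class model stands in for the CatBoost classifier so that the block runs without third-party packages.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# A model is any callable: (X_train, y_train, X_test) -> predicted labels.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[int]]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs that together cover indices 0..n-1."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_val_accuracy(X: List[Row], y: List[int], fit_predict: FitPredict,
                       k: int = 5, seed: int = 0) -> List[float]:
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(1 for p, i in zip(preds, test_idx) if p == y[i])
        accuracies.append(correct / len(test_idx))
    return accuracies


def majority_baseline(X_train: List[Row], y_train: List[int],
                      X_test: List[Row]) -> List[int]:
    """Placeholder model: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy data (hypothetical): 10 samples, 8 positive and 2 negative.
X_demo = [[float(i)] for i in range(10)]
y_demo = [1] * 8 + [0] * 2
fold_accuracies = cross_val_accuracy(X_demo, y_demo, majority_baseline, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Reporting the spread of `fold_accuracies` alongside the mean is one way an accuracy range, rather than a single number, can arise from this kind of evaluation.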

  • Article type: Journal Article
    Chlorophyll-a (Chl-a) is one of the most important indicators of the trophic status of inland waters, and its continued monitoring is essential. The Sentinel-2 MSI satellite, now in operation, provides high-spatial-resolution images for remote water quality monitoring. In this study, we tested the performance of three well-known machine learning (ML) models (random forest [RF], support vector machine [SVM], and Gaussian process [GP]) and two novel ML models (extreme gradient boosting [XGB] and CatBoost [CB]) for estimating a wide range of Chl-a concentrations (10.1-798.7 μg/L) using Sentinel-2 MSI data and in situ water quality measurements in the Tri An Reservoir (TAR), Vietnam. GP was the most reliable model for predicting Chl-a from water quality parameters (R² = 0.85, root-mean-square error [RMSE] = 56.65 μg/L, Akaike's information criterion [AIC] = 575.10, and Bayesian information criterion [BIC] = 595.24). With water surface reflectance as the model input, CB was the superior model for Chl-a retrieval (R² = 0.84, RMSE = 46.28 μg/L, AIC = 229.18, and BIC = 238.50). Our results indicated that GP and CB are the two best models for predicting Chl-a in TAR. Overall, Sentinel-2 MSI data coupled with ML algorithms provide a reliable, inexpensive, and accurate means of monitoring Chl-a in inland waters. PRACTITIONER POINTS: Machine learning algorithms were applied to both remote sensing data and in situ water quality measurements. The performance of five well-known machine learning models was tested. The Gaussian process was the most reliable model for predicting Chl-a from water quality parameters. CatBoost was the best model for Chl-a retrieval from water surface reflectance.
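The abstract above ranks its models by R², RMSE, AIC, and BIC. The sketch below shows one common way to compute these four quantities from observed and predicted values. The abstract does not say which AIC/BIC formulation was used, so the least-squares (Gaussian error) form here is an assumption, as are the function name `regression_metrics`, the `n_params` argument, and the toy numbers.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float],
                       n_params: int) -> Dict[str, float]:
    """R^2, RMSE, and least-squares AIC/BIC for a fitted regression model."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / n
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if rss == 0 or tss == 0:
        raise ValueError("metrics are undefined for a perfect fit or a constant target")
    log_mse = math.log(rss / n)
    return {
        "r2": 1 - rss / tss,
        "rmse": math.sqrt(rss / n),
        # Assumed forms: AIC = n ln(RSS/n) + 2k, BIC = n ln(RSS/n) + k ln(n)
        "aic": n * log_mse + 2 * n_params,
        "bic": n * log_mse + n_params * math.log(n),
    }


# Toy values (hypothetical), not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
metrics = regression_metrics(observed, predicted, n_params=2)
```

Higher R² and lower RMSE indicate a closer fit, while AIC and BIC add a penalty that grows with the number of fitted parameters, so lower values are preferred when comparing models fitted to the same data.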