boosted trees

  • Article type: Journal Article
    Diagnosing urothelial cancer (UCa) via invasive cystoscopy is painful, particularly in men, and can cause infection and bleeding. Because the UCa risk is higher for male patients, non-invasive urinary UCa biomarkers are highly desirable for stratifying men for invasive cystoscopy. We previously identified multiple DNA methylation sites in urine samples that detect UCa with high sensitivity and specificity in men. Here, we identified the most relevant markers by employing multiple statistical approaches and machine learning methods (random forest, boosted trees, LASSO) on a dataset of 251 male UCa patients and 111 controls. Three CpG sites, located in ALOX5, TRPS1, and an intergenic region on chromosome 16, were concordantly selected by all approaches, and their combination in a single decision matrix for clinical use was tested based on the respective thresholds of the individual CpGs. The combination of ALOX5 and TRPS1 yielded the best overall sensitivity (61%) at a pre-set specificity of 95%. This combination exceeded both the diagnostic performance of the most sensitive bioinformatic approach and that of the best single CpG. In summary, we showed that overlap analysis of multiple statistical approaches identifies the most reliable biomarkers for UCa in a male cohort. The results may assist in stratifying men for cystoscopy.
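    The idea of reporting sensitivity at a pre-set specificity can be illustrated with a minimal sketch. It assumes each marker gets its own cutoff from the control group and that a sample counts as positive when either marker exceeds its cutoff (an OR rule). The abstract does not describe the actual decision matrix, so the rule, the function names, and all values below are hypothetical.

```python
def cutoff_at_specificity(control_values, specificity_pct=95):
    """Smallest observed cutoff that keeps at least specificity_pct% of controls negative."""
    ordered = sorted(control_values)
    # Integer ceiling of specificity_pct% of the number of controls.
    needed = (specificity_pct * len(ordered) + 99) // 100
    return ordered[needed - 1]


def positive_rate(samples, cutoffs):
    """Fraction of samples in which at least one marker exceeds its cutoff (OR rule)."""
    hits = sum(any(v > c for v, c in zip(sample, cutoffs)) for sample in samples)
    return hits / len(samples)


# Hypothetical methylation values (marker_a, marker_b); not data from the study.
controls = [(i, 2 * i) for i in range(1, 21)]
cases = [(5, 50), (12, 10), (19, 45), (25, 20), (30, 60),
         (8, 39), (40, 12), (22, 70), (3, 5), (50, 8)]

# One cutoff per marker, each chosen on the controls alone.
cutoffs = tuple(cutoff_at_specificity([s[k] for s in controls]) for k in range(2))
sens_a = positive_rate([(a,) for a, _ in cases], cutoffs[:1])   # marker_a only
sens_combined = positive_rate(cases, cutoffs)                   # either marker
spec_combined = 1 - positive_rate(controls, cutoffs)            # recheck on controls
print(cutoffs, sens_a, sens_combined, spec_combined)
```

    Under an OR rule the combined specificity can fall below the per-marker target, which is why the sketch recomputes it on the controls rather than assuming 95%.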

  • Article type: Journal Article
    BACKGROUND: Previous predictive models using logistic regression for stillbirth do not leverage the advanced and nuanced techniques involved in sophisticated machine learning methods, such as modeling nonlinear relationships between outcomes.
    OBJECTIVE: This study aimed to create and refine machine learning models for predicting stillbirth using data available before viability (22-24 weeks) and throughout pregnancy, as well as demographic, medical, and prenatal visit data, including ultrasound and fetal genetics.
    METHODS: This is a secondary analysis of the Stillbirth Collaborative Research Network, which included data from pregnancies resulting in stillborn and live-born infants delivered at 59 hospitals in 5 diverse regions across the United States from 2006 to 2009. The primary aim was the creation of a model for predicting stillbirth using data available before viability. Secondary aims included refining models with variables available throughout pregnancy and determining variable importance.
    RESULTS: Among 3000 live births and 982 stillbirths, 101 variables of interest were identified. Of the models incorporating data available before viability, the random forests model had 85.1% accuracy (area under the curve) and high sensitivity (88.6%), specificity (85.3%), positive predictive value (85.3%), and negative predictive value (84.8%). A random forests model using data collected throughout pregnancy resulted in an accuracy of 85.0%; this model had 92.2% sensitivity, 77.9% specificity, 84.7% positive predictive value, and 88.3% negative predictive value. Important variables in the previability model included previous stillbirth, minority race, gestational age at the earliest prenatal visit and ultrasound, and second-trimester serum screening.
    CONCLUSIONS: Applying advanced machine learning techniques to a comprehensive database of stillbirths and live births with unique and clinically relevant variables resulted in an algorithm that could accurately identify 85% of pregnancies that would result in stillbirth before they reached viability. Once validated in representative databases reflective of the US birthing population and then prospectively, these models may provide effective risk stratification and clinical decision-making support to better identify and monitor those at risk of stillbirth.
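    The metrics reported above all derive from a 2×2 confusion matrix. The sketch below shows how they relate to one another. The counts are hypothetical, not taken from the study, and the function names are illustrative. The second function applies Bayes' rule to show how positive predictive value shifts with outcome prevalence, which may matter when a model built on a sample enriched for cases is applied to a general population.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' rule at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 100 cases and 100 controls (50% prevalence in the sample).
m = diagnostic_metrics(tp=80, fn=20, fp=10, tn=90)
print(m)

# Same sensitivity and specificity, evaluated at an assumed 1% prevalence.
print(ppv_at_prevalence(m["sensitivity"], m["specificity"], 0.01))
```

    With the same sensitivity and specificity, the positive predictive value at 1% prevalence is far lower than in the balanced sample, so the two figures should not be read interchangeably.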

  • Article type: Journal Article
    BACKGROUND: Anthrapyrazoles are a new class of antitumor agents and successors to anthracyclines possessing a broad range of antitumor activity in various model tumors.
    OBJECTIVE: The present study introduces novel QSAR models for the prediction of antitumor activity of anthrapyrazole analogues.
    METHODS: The predictive performance of four machine learning algorithms, namely artificial neural networks, boosted trees, multivariate adaptive regression splines, and random forest, was studied in terms of variation of the observed and predicted data, internal validation, predictability, precision, and accuracy.
    RESULTS: The ANN and boosted trees algorithms met the validation criteria, which suggests that these procedures may be able to forecast the anticancer effects of the anthrapyrazoles studied. Evaluation of the validation metrics calculated for each approach indicated the artificial neural network (ANN) procedure as the algorithm of choice, especially with regard to the obtained predictability and the lowest mean absolute error. The designed multilayer perceptron (MLP) 15-7-1 network displayed a high correlation between the predicted and experimental pIC50 values for the training, test, and validation sets. A sensitivity analysis indicated the most important structural features for the studied activity.
    CONCLUSIONS: The ANN strategy combines topographical and topological information and can be used for the design and development of novel anthrapyrazole analogues as anticancer molecules.

  • Article type: Journal Article
    Emotion regulation is a core construct of mental health, and deficits in emotion regulation abilities lead to psychological disorders. Reappraisal and suppression are two widely studied emotion regulation strategies but, possibly due to methodological limitations in previous studies, a consistent picture of the neural correlates of individual differences in their habitual use remains elusive. To address these issues, the present study applied a combination of unsupervised and supervised machine learning algorithms to the structural MRI scans of 128 individuals. First, unsupervised machine learning was used to separate the brain into naturally grouping grey matter circuits. Then, supervised machine learning was applied to predict individual differences in the use of different emotion regulation strategies. Two predictive models, including structural brain features and psychological ones, were tested. Results showed that a temporo-parahippocampal-orbitofrontal network successfully predicted individual differences in the use of reappraisal. In contrast, insular and fronto-temporo-cerebellar networks successfully predicted suppression. In both predictive models, anxiety, the opposite strategy, and specific emotional intelligence factors played a role in predicting the use of reappraisal and suppression. This work provides new insights into the decoding of individual differences from structural features and other psychologically relevant variables while extending previous observations on the neural bases of emotion regulation strategies.

  • Article type: Journal Article
    BACKGROUND: Using Food Chain Information data to objectively identify high-risk animals entering abattoirs can represent an important step forward towards improving on-farm animal welfare. We aimed to develop and evaluate the performance of classification models, using Gradient Boosting Machine algorithms that utilise accurate longitudinal on-farm data on pig health and welfare to predict condemnations, pluck lesions and low cold carcass weight at slaughter.
    RESULTS: The accuracy of the models was assessed using the area under the receiver operating characteristic (ROC) curve (AUC). The AUC for the prediction models for pneumonia, dorsocaudal pleurisy, cranial pleurisy, pericarditis, partial and total condemnations, and low cold carcass weight ranged from 0.54 for pneumonia to 0.67 for low cold carcass weight. For dorsocaudal pleurisy, ear lesions assessed on pigs aged 12 weeks and antimicrobial treatments (AMT) were the most important prediction variables. Similarly, the most important variable for the prediction of cranial pleurisy was the number of AMT. In the case of pericarditis, ear lesions assessed at both week 12 and week 14 were the most important variables and accounted for 33% of the Bernoulli loss reduction. For predicting partial and total condemnations, the presence of hernias in week 18 and lameness in week 12 accounted for 27% and 14% of the Bernoulli loss reduction, respectively. Finally, AMT (37%) and ear lesions assessed in week 12 (15%) were the most important variables for predicting pigs with low cold carcass weight.
    CONCLUSIONS: The findings from our study show that on-farm assessments of animal-based welfare outcomes and information on antimicrobial treatments have modest predictive power in relation to the different meat inspection outcomes assessed. New research following the same group of pigs longitudinally from a larger number of farms supplying different slaughterhouses is required to confirm that on-farm assessments can add value to Food Chain Information reports.
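    To make the notion of variables "accounting for" a share of the Bernoulli loss reduction concrete, here is a minimal, self-contained sketch of gradient boosting with one-split trees (stumps). It is not the study's software or configuration: the toy data, function names, learning rate, and the simple per-round attribution of loss reduction to the split variable are all assumptions made for illustration.

```python
import math


def fit_stump(X, r):
    """Least-squares regression stump (one split) fitted to residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # keeps both sides non-empty
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def bernoulli_loss(y, f):
    """Negative log-likelihood for labels y in {0, 1} and log-odds scores f."""
    return sum(math.log(1 + math.exp(-(2 * yi - 1) * s)) for yi, s in zip(y, f))


def boost(X, y, rounds=50, rate=0.3):
    """Return the fitted stumps and each variable's share of the total loss reduction."""
    f = [0.0] * len(X)
    stumps, reduction = [], [0.0] * len(X[0])
    for _ in range(rounds):
        p = [1 / (1 + math.exp(-s)) for s in f]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of the Bernoulli loss
        j, t, lm, rm = fit_stump(X, r)
        before = bernoulli_loss(y, f)
        f = [s + rate * (lm if row[j] <= t else rm) for s, row in zip(f, X)]
        reduction[j] += before - bernoulli_loss(y, f)  # credit the split variable
        stumps.append((j, t, rate * lm, rate * rm))
    total = sum(reduction)
    return stumps, [g / total for g in reduction]


def predict_proba(stumps, row):
    """Predicted probability of class 1 for a single row."""
    score = sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return 1 / (1 + math.exp(-score))


# Toy data: column 0 separates the classes, column 1 is uninformative.
X = [[1, 5], [2, 3], [3, 8], [4, 1], [6, 7], [7, 2], [8, 6], [9, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
stumps, influence = boost(X, y)
print(influence)
print(predict_proba(stumps, [2, 3]), predict_proba(stumps, [8, 6]))
```

    In this toy setting all of the loss reduction is credited to the informative column. Production implementations typically grow deeper trees and may compute variable influence differently, so the percentages in the abstract should not be read as the output of this exact procedure.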

  • Article type: Journal Article
    Air pollution in large cities causes numerous diseases and even millions of deaths annually, according to the World Health Organization. Pollen exposure is related to allergic diseases, which makes its prediction a valuable tool for assessing the level of risk from aeroallergens. However, airborne pollen concentrations are difficult to predict due to the inherent complexity of the relationships among biotic and environmental variables. In this work, a stochastic approach based on supervised machine learning algorithms was used to forecast the daily Olea pollen concentrations in the Community of Madrid, central Spain, from 1993 to 2018. First, individual Light Gradient Boosting Machine (LightGBM) and artificial neural network (ANN) models were applied to predict the day of the year (DOY) on which the peak of the pollen season occurs, resulting in estimated average peak dates of 149.1 ± 9.3 and 150.1 ± 10.8 DOY for LightGBM and ANN, respectively, close to the observed value (148.8 ± 9.8). Second, the daily pollen concentrations during the entire pollen season were calculated using an ensemble of a two-step GAM followed by LightGBM and ANN. The prediction of daily pollen concentrations showed a coefficient of determination (r²) above 0.75 (goodness of the model following cross-validation). The predictors included in the ensemble models were meteorological variables, phenological metrics, specific site characteristics, and preceding pollen concentrations. The models are state-of-the-art in machine learning and have shown potential for use in understanding and predicting pollen risk levels during the main olive pollen season.

  • Article type: Journal Article
    Many coral reef organisms live in symbiotic relationships with photosynthetic microalgae. This symbiosis extends the energy resources available to reef organisms, thereby potentially influencing biodiversity. In octocorals, about one-half of the taxa contain photosynthetic symbionts while the rest do not, and thus octocorals are an ideal model to assess the relationships between biodiversity, spatial and environmental factors, and photosynthetic symbionts. Data collected from 1106 sites on the Great Barrier Reef, Australia, between 12° and 24° S showed that taxa with photosynthetic symbionts (phototrophs) had higher abundances, wider ranges, and a wider spread of locations than taxa without symbionts (heterotrophs). In phototrophic assemblages, spatial turnover comprised both exchange and loss of taxa, and their richness was high across a broad range of environmental conditions. In contrast, heterotrophs were uncommon, had short ranges, and were located where energy supply was highest and disturbance lowest. Turnover between heterotrophic assemblages comprised taxonomic loss rather than exchange of taxa. The biodiversity patterns and differences between phototrophic and heterotrophic octocorals are similar to those recorded in more spatially limited studies of phototrophic sponges and hard corals, and heterotrophic sponges. This study therefore suggests that the association, or not, with photosynthetic symbionts, and spatial and environmental factors related to energy supply and disturbance are principal drivers of biodiversity, community composition, and ranges of coral reef benthos.

  • Article type: Journal Article
    In pig production, efficiency benefits from uniform growth within pens, which results in single deliveries per pen with possibly all animals in the targeted weight range. Abnormalities such as pneumonia or aberrant growth reduce production efficiency because they reduce uniformity and may cause multiple deliveries per batch and pigs delivered with a low meat yield or outside the targeted weight range. Early identification of pigs prone to develop these abnormalities, for example at the onset of the growing-finishing phase, would help to prevent heterogeneous pens through management interventions. Data about previous production cycles at the farm, combined with data from the piglet's own history, may help in identifying these abnormalities. The aim of this study, therefore, was to predict deviant pigs at slaughter at the onset of the growing-finishing phase, that is, 3 months in advance, with a machine-learning technique called boosted trees. The dataset used was extracted from the farm management system of a research center. It contained over 70,000 records of individual pigs born between 2004 and 2016, including information on, for example, offspring, litter size, transfer dates between production stages, their respective locations within the barns, and individual live weights at several production stages. Results obtained on an independent test set showed that, at a 90% specificity rate, the sensitivity was 16% for low meat percentage, 20% for pneumonia, and 36% for low lifetime growth rate. For low lifetime growth rate, this meant an almost threefold increase in positive predictive value compared to the current situation. From these results, it was concluded that routine performance information available at the onset of the growing-finishing phase, combined with data about previous production cycles, formed a moderate base to identify pigs prone to develop pneumonia (AUC > 0.60) and a good base to identify pigs prone to develop growth aberrations (AUC > 0.70) during the growing-finishing phase. This information, however, was not a sufficient base to identify pigs prone to develop low meat percentage (AUC < 0.60). The demonstrated ability to identify growth aberrations and pneumonia can be considered a good first step towards the development of an early warning system for pigs in the growing-finishing phase.

  • Article type: Journal Article
    Foodborne pathogens such as Listeria spp. are able to survive and multiply in poultry farming environments, which provides a route of contamination for poultry processing environments and final poultry products. An understanding of the effect of meteorological variables on the prevalence of Listeria spp. in the farming environment is lacking. Soil and feces samples were collected from 11 pastured poultry farms from 2014 to 2017. Random forest (RF) and gradient boosting machine (GBM) predictive models were generated to describe and predict Listeria spp. prevalence in feces and soil samples based on meteorological factors at the farming location. This study attempted to demonstrate the use of GBM models in a food safety context and to compare their use to RF models. Both feces models performed very well, with area under the curve (AUC) values of 0.905 and 0.855 for the RF and GBM models, respectively. The soil GBM model outperformed the RF model, with AUCs of 0.873 and 0.700, respectively. The developed models can be used to predict the prevalence of Listeria spp. in pastured poultry farm environments and should be of great use to poultry farmers, producers, and risk managers.
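    The AUC values used to compare the models have a simple rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The labels and scores are hypothetical and the function name is illustrative.

```python
def auc(labels, scores):
    """Rank-based AUC: P(positive score > negative score), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions from two models on the same four samples.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]   # one positive is ranked below a negative
model_b = [0.9, 0.8, 0.5, 0.1]   # both positives are ranked above both negatives
print(auc(labels, model_a), auc(labels, model_b))
```

    The pairwise loop is quadratic in the number of samples, which is fine for illustration. Sorting-based implementations are the usual choice for larger datasets.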