k-means

  • Article type: Journal Article
    As patients become more complex and their data are stored in fragmented health information systems, automated and time-efficient ways of gathering important information from the patients' medical history are needed for effective clinical decision making. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user feedback that allows clinicians to ask natural questions to retrieve data from patient notes.
    We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, using K-Means to reduce computation time for real-time interaction. The Rocchio algorithm was then employed to incorporate user feedback and improve retrieval performance.
    In an iterative feedback loop experiment, the mean average precision (MAP) at the final iteration was 0.93/0.94, compared to an initial MAP of 0.66/0.52, for generic queries and 1.0/1.0, compared to 0.79/0.83, for COVID-19 specific queries, confirming that the contextual model handles the ambiguity in natural language queries and that feedback helps to improve retrieval performance. The user-in-loop experiment also outperformed the automated pseudo relevance feedback method. Moreover, the null hypothesis, which assumes identical precision between initial retrieval and relevance feedback, was rejected with high statistical significance (p ≪ 0.05). Compared to the Word2Vec, TF-IDF and bioBERT models, clinicalBERT works best when considering the balance between response precision and user feedback.
    Our model works well for generic as well as COVID-19 specific queries. However, some generic queries are not answered as well as others because clustering reduces query performance and vague relations between queries and sentences are considered non-relevant. We also tested our model on queries with the same meaning but different expressions and demonstrated that these query variations yielded similar performance after incorporation of user feedback.
    In conclusion, we developed an NLP-based query-bot that handles synonyms and natural language ambiguity in order to retrieve relevant information from the patient chart. User feedback is critical to improving model performance.
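
    The abstract above combines two ideas: restricting the search to the K-Means cluster nearest the query, and moving the query vector with Rocchio relevance feedback. A minimal sketch of both follows. It is not the authors' implementation: the 2-D vectors stand in for clinicalBERT sentence embeddings, the sentence ids are made up, and the weights `alpha`, `beta`, `gamma` are commonly quoted textbook defaults, not values reported in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors, dim):
    """Component-wise mean; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: pull the query toward relevant vectors, away from non-relevant ones."""
    dim = len(query)
    rel = mean_vector(relevant, dim)
    non = mean_vector(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]

def search(query, clusters, top_k=3):
    """Rank only the sentences in the cluster whose centroid is closest to the query.

    clusters: list of (centroid, [(sentence_id, embedding), ...]) pairs,
    assumed to have been produced beforehand by K-Means.
    """
    _, members = max(clusters, key=lambda c: cosine(query, c[0]))
    ranked = sorted(members, key=lambda m: cosine(query, m[1]), reverse=True)
    return [sentence_id for sentence_id, _ in ranked[:top_k]]

# Hypothetical toy data: two pre-computed clusters of 2-D "embeddings".
clusters = [
    ([1.0, 0.1], [("s1", [1.0, 0.0]), ("s2", [0.9, 0.3])]),
    ([0.1, 1.0], [("s3", [0.0, 1.0]), ("s4", [0.2, 0.9])]),
]
query = [0.8, 0.6]
first_results = search(query, clusters)

# Suppose the user marks s2 as relevant and s1 as not relevant.
updated_query = rocchio(query, relevant=[[0.9, 0.3]], non_relevant=[[1.0, 0.0]])
second_results = search(updated_query, clusters)
```

    Searching a single cluster is what saves time, and it is also the trade-off the abstract mentions: a relevant sentence that K-Means placed in another cluster is never scored.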

  • Article type: Journal Article
    Energy use analysis of coal-fired power plant units is significant for energy conservation and consumption reduction. One of the most serious problems attributed to Chinese coal-fired power plants is coal waste. Several units in one plant may experience a practical rated output situation at the same time, which may increase the coal consumption of the power plant. Here, we propose a new hybrid methodology for plant-level load optimization to minimize coal consumption for coal-fired power plants. The proposed methodology includes two parts. One part determines the reference value of the controllable operating parameters of net coal consumption under typical load conditions, based on an improved K-means algorithm and the Hadoop platform. The other part utilizes a support vector machine to determine the sensitivity coefficients of various operating parameters for the net coal consumption under different load conditions. Additionally, the fuzzy rough set attribute reduction method was employed to obtain a minimal set of parameters and reduce the complexity of the dataset. This work is based on continuously measured information system data from a 600 MW coal-fired power plant in China. The results show that the proposed strategy achieves high energy conservation performance. Taking the 600 MW load optimization value as an example, the optimized power supply coal consumption is 307.95 g/(kW·h) compared to the actual operating value of 313.45 g/(kW·h). It is important for coal-fired power plants to reduce their coal consumption.
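
    To make the clustering step concrete, here is a minimal sketch of plain K-means (Lloyd's algorithm) that groups operating records by load condition and reads off each cluster centre as a candidate reference value. It is the textbook algorithm, not the improved K-means or the Hadoop pipeline described in the abstract. The (load in MW, coal consumption in g/(kW·h)) pairs are made-up illustration values, and seeding the centroids with the first `k` points is an assumption chosen only to keep the run deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm. Returns (centroids, cluster index per point)."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assignment

# Hypothetical records: (unit load in MW, net coal consumption in g/(kW·h)).
records = [(300, 330), (600, 310), (310, 328), (590, 312), (305, 329), (605, 311)]
centroids, labels = kmeans(records, k=2)
```

    In this toy run the load dominates the distance because the features are not scaled; real operating data would normally be standardised first.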

  • Article type: Journal Article
    In this paper, we use an integrated approach to carry out a comprehensive evaluation of water quality in the Beni Haroun (BH) dam, the largest surface water resource in Algeria. Several techniques have been employed under the same framework, including the Canadian Council of Ministers of the Environment Water Quality Index (CCME-WQI), principal component analysis and factor analysis (PCA/FA), K-means clustering, and ordinary least squares (OLS) analysis. A data set of 22 physicochemical parameters was collected over a period of 11 years from three sampling stations: Ain Smara (ST1) and Menia (ST2), both located upstream on "Wadi Rhumel," and the BH dam station (ST3), located at the dam site. The PCA/FA enabled the identification of seven key factors that significantly influence BH dam water quality. The average values of the CCME indices at the BH dam were 17, 40, 42, and 32 for drinking, irrigation, industry, and aquatic life purposes, respectively, which indicate poor water quality according to the CCME categorization scheme. In addition, the K-means algorithm proved to be a very useful machine learning tool for detecting that the major source of BH dam pollution is "Wadi Rhumel." Finally, OLS analysis, along with the Mann-Kendall test, highlighted the positive trend of the BH dam's water quality.
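
    For readers unfamiliar with the index quoted above, the following sketch shows how a CCME-WQI score is computed from scope (F1), frequency (F2) and amplitude (F3), and how a score maps to a category. It is a simplified illustration: every objective is assumed to be an upper limit (the full index also handles minimum objectives such as dissolved oxygen), and the variable names and measurements are invented, not taken from the study.

```python
import math

def ccme_wqi(samples, objectives):
    """CCME Water Quality Index, assuming every objective is a maximum allowed value.

    samples:    dict mapping variable name -> list of measured values
    objectives: dict mapping variable name -> upper limit for that variable
    """
    total_tests = failed_tests = failed_variables = 0
    excursion_sum = 0.0
    for name, values in samples.items():
        limit = objectives[name]
        failures = [v for v in values if v > limit]
        total_tests += len(values)
        failed_tests += len(failures)
        failed_variables += 1 if failures else 0
        # Excursion: how far a failed test exceeds its objective.
        excursion_sum += sum(v / limit - 1 for v in failures)
    f1 = 100 * failed_variables / len(samples)   # scope
    f2 = 100 * failed_tests / total_tests        # frequency
    nse = excursion_sum / total_tests            # normalised sum of excursions
    f3 = nse / (0.01 * nse + 0.01)               # amplitude
    return 100 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732

def ccme_category(score):
    """Map a score to the CCME category bands."""
    if score >= 95:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 65:
        return "Fair"
    if score >= 45:
        return "Marginal"
    return "Poor"

# Hypothetical measurements for two variables, both with an upper limit of 5.
limits = {"variable_a": 5, "variable_b": 5}
clean_score = ccme_wqi({"variable_a": [1, 2], "variable_b": [3, 4]}, limits)
mixed_score = ccme_wqi({"variable_a": [10, 2], "variable_b": [3, 4]}, limits)
```

    Under these bands, all four averages reported in the abstract (17, 40, 42 and 32) fall below 45, which is why each use is classed as poor.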

  • Article type: Journal Article
    Speaking and presenting in public are critical skills for academic and professional development. These skills are demanded across society, and their development and evaluation are a challenge faced by higher education institutions. It is difficult to evaluate them objectively, to generate valuable information for professors, and to give appropriate feedback to students. In this paper, in order to understand and detect patterns in oral student presentations, we collected data from 222 first-year Computer Engineering (CE) students at three different times, over two different years (2017 and 2018). For each presentation, using a purpose-built system and Microsoft Kinect, we detected 12 features related to body posture and oral speaking. These features were used as input for the clustering and statistical analysis, which identified three different clusters in the presentations of both years, with stronger patterns in the presentations of 2017. A Wilcoxon rank-sum test allowed us to evaluate the evolution of the presentation attributes over each year and pointed to a convergence, in the sense that fewer features differed statistically between presentations given at the same point in the course. The results can further help to give students automatic feedback on their postures and speech throughout the presentations and may serve as baseline information for future comparisons with presentations from students in other undergraduate courses.

  • Article type: Journal Article
    Multiple methods have been developed to infer behavioral states from animal movement data, but rarely has their accuracy been assessed from independent evidence, especially for location data sampled with high temporal resolution. Here we evaluate the performance of behavioral segmentation methods using acoustic recordings that monitor prey capture attempts.
    We recorded GPS locations and ultrasonic audio during the foraging trips of 11 Mexican fish-eating bats, Myotis vivesi, using miniature bio-loggers. We then applied five different segmentation algorithms (k-means clustering, expectation-maximization and binary clustering, first-passage time, hidden Markov models, and correlated velocity change point analysis) to infer two behavioral states, foraging and commuting, from the GPS data. To evaluate the inference, we independently identified characteristic patterns of biosonar calls ("feeding buzzes") that occur during foraging in the audio recordings. We then compared segmentation methods on how well they correctly identified the two behaviors and if their estimates of foraging movement parameters matched those for locations with buzzes.
    While the five methods differed in the median percentage of buzzes occurring during predicted foraging events, or true positive rate (44-75%), a two-state hidden Markov model had the highest median balanced accuracy (67%). Hidden Markov models and first-passage time predicted foraging flight speeds and turn angles similar to those measured at locations with feeding buzzes and did not differ in the number or duration of predicted foraging events.
    The hidden Markov model method performed best at identifying fish-eating bat foraging segments; however, first-passage time was not significantly different and gave similar parameter estimates. This is the first attempt to evaluate segmentation methodologies in echolocating bats and provides an evaluation framework that can be used on other species.
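
    The two scores quoted above can be illustrated with a short sketch. It computes the true positive rate and balanced accuracy (the mean of the true positive and true negative rates) from per-location labels. Scoring per GPS location is an assumption made here for simplicity; the abstract does not spell out the exact unit of evaluation, and the label sequences below are invented.

```python
def true_positive_rate(predicted, observed):
    """Share of observed-positive locations that were also predicted positive."""
    positives = [p for p, o in zip(predicted, observed) if o]
    return sum(positives) / len(positives)

def true_negative_rate(predicted, observed):
    """Share of observed-negative locations that were also predicted negative."""
    negatives = [p for p, o in zip(predicted, observed) if not o]
    return sum(not p for p in negatives) / len(negatives)

def balanced_accuracy(predicted, observed):
    """Mean of the true positive rate and the true negative rate."""
    return (true_positive_rate(predicted, observed)
            + true_negative_rate(predicted, observed)) / 2

# Hypothetical labels for six GPS locations.
# predicted: the segmentation method says "foraging"
# observed:  a feeding buzz was recorded at that location
predicted = [True, True, False, False, True, False]
observed = [True, False, False, False, True, True]

tpr = true_positive_rate(predicted, observed)
score = balanced_accuracy(predicted, observed)
```

    Balanced accuracy is useful here because commuting locations usually far outnumber foraging ones, so a method that labels everything as commuting would score well on plain accuracy but only 50% on this measure.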