k-means

  • Article type: Journal Article
    With the increasingly widespread application of large-scale energy storage battery systems, the demand for battery safety is rising. Detecting battery anomalies early and reducing the occurrence of thermal runaway (TR) accidents has become particularly important. Existing research on battery TR warning algorithms falls mainly into two categories: model-driven and data-driven methods. However, common model-driven methods are often highly complex, with poor versatility and low early-warning capability, while common data-driven methods are mostly based on neural networks, which require substantial training costs and offer better early-warning capability but higher false-alarm probabilities. To address these limitations, this paper proposes a combined data-driven and model-based algorithm for accurate battery TR warnings. Specifically, the K-Means algorithm serves as the data-driven module, capturing outliers in battery data, and the Bernardi equation serves as the model-driven module, used to evaluate battery temperature. The outputs of the model-driven and data-driven modules are then weighted and combined to comprehensively assess whether the battery is abnormal. The proposed algorithm combines the advantages of model-driven and data-driven approaches, achieving a 25 min advance warning for thermal runaway with a significantly reduced probability of false alarms.
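The combination described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the data-driven score is the distance of a sample to its nearest K-means centroid, that the model-driven score is the residual between a measured temperature and one predicted by a lumped thermal model fed by the simplified Bernardi heat term, and that the two scores are fused by a fixed weight. All function names, `scale`, `tolerance_k`, `mass_cp`, `h_area`, and `w_model` are hypothetical, and the sign convention of the Bernardi term varies between sources.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k of the samples
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def data_score(sample, centroids, scale):
    """Distance to the nearest centroid, in units of a hypothetical `scale`."""
    return min(math.dist(sample, c) for c in centroids) / scale


def bernardi_heat(current, ocv, voltage, temp_k, docv_dt):
    """Simplified Bernardi heat generation in watts (assumed sign convention)."""
    return current * (ocv - voltage) - current * temp_k * docv_dt


def predict_temp(temp_k, heat_w, ambient_k, dt_s, mass_cp, h_area):
    """One explicit Euler step of an assumed lumped thermal model."""
    return temp_k + dt_s * (heat_w - h_area * (temp_k - ambient_k)) / mass_cp


def model_score(measured_k, predicted_k, tolerance_k):
    """Temperature residual in units of a hypothetical tolerance."""
    return abs(measured_k - predicted_k) / tolerance_k


def fused_score(d_score, m_score, w_model=0.5):
    """Weighted combination; a value above 1.0 could be treated as abnormal."""
    return w_model * m_score + (1.0 - w_model) * d_score
```

In practice the features passed to `kmeans` (for example voltage and temperature) would need to be normalized so that no single unit dominates the distance, and the weight and thresholds would have to be tuned on real data.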

  • Article type: Journal Article
    Chronic exposure to the hypobaric hypoxic environment of the plateau could influence human cognitive behaviours, which are supported by dynamic brain connectivity states. How functional connectivity (FC) of the brain network changes with altitude is still unclear. In this article, we used EEG data from the Go/NoGo paradigm recorded in Weinan (347 m) and Nyingchi (2950 m). A combination of dynamic FC (dFC) and K-means clustering was employed to extract dynamic FC states, which were later distinguished by graph metrics. In addition, temporal properties of the networks, such as fractional windows (FW), transition numbers (TN) and mean dwell time (MDT), were calculated. We extracted two different states from the dFC matrices, of which State 1 was verified to have higher functional integration and segregation. The dFC states switched dynamically during the Go/NoGo tasks, and the FW of State 1 showed a rise in the high-altitude participants. In the regional analysis, we found higher state deviation in the fronto-parietal cortices and enhanced FC strength in the occipital lobe. These results demonstrate that long-term exposure to the high-altitude environment could lead brain networks to reorganize into networks with higher inter- and intra-network information transfer efficiency, which could be attributed to a mechanism compensating for the brain function compromised by the plateau environment. This study provides a new perspective on how the plateau affects cognitive impairment.
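The temporal properties named in this abstract can be computed from the sequence of state labels that K-means assigns to successive dFC windows. The sketch below assumes the definitions commonly used in dFC studies, which the abstract itself does not spell out: FW is the share of windows spent in each state, TN is the number of switches between states, and MDT is the average length of an uninterrupted run in a state. The function name is hypothetical.

```python
from itertools import groupby


def temporal_properties(states):
    """Return (FW, TN, MDT) for a sequence of per-window state labels."""
    if not states:
        raise ValueError("states must contain at least one window")
    states = list(states)
    # Collapse the sequence into (state, run_length) pairs.
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(states)]
    labels = sorted(set(states))
    # Fractional windows: share of all windows assigned to each state.
    fw = {s: states.count(s) / len(states) for s in labels}
    # Transition number: every boundary between two runs is one switch.
    tn = len(runs) - 1
    # Mean dwell time: average run length per state, in windows.
    mdt = {}
    for s in labels:
        lengths = [length for state, length in runs if state == s]
        mdt[s] = sum(lengths) / len(lengths)
    return fw, tn, mdt
```

For the label sequence `[1, 1, 1, 2, 2, 1, 2, 2, 2, 2]`, State 1 occupies 4 of 10 windows, there are 3 switches, and the mean dwell times are 2 windows for State 1 and 3 windows for State 2.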

  • Article type: Journal Article
    BACKGROUND: Angiogenesis plays an important role in colon cancer (CC) progression.
    OBJECTIVE: To investigate the tumor microenvironment (TME) and intratumor microbes of angiogenesis subtypes (AGSs) and explore potential targets for antiangiogenic therapy in CC.
    METHODS: The data were obtained from The Cancer Genome Atlas database and the Gene Expression Omnibus database. K-means clustering was used to construct the AGSs. A prognostic model was constructed based on the differential genes between the two subtypes. Single-cell analysis was used to analyze the expression level of SLC2A3 in different cells in CC, which was validated by immunofluorescence. Its biological functions were further explored in HUVECs.
    RESULTS: CC samples were grouped into two AGSs (AGS-A and AGS-B), and patients in the AGS-B group had a poor prognosis. Further analysis revealed that the AGS-B group had high infiltration of TME immune cells but also exhibited high immune escape. The intratumor microbes also differed between the two subtypes. A convenient 6-gene angiogenesis-related signature (ARS) was established to identify AGSs and predict prognosis in CC patients. SLC2A3 was selected as the representative gene of the ARS; it was more highly expressed in endothelial cells and promoted the migration of HUVECs.
    CONCLUSIONS: Our study identified two AGSs with distinct prognoses, TME, and intratumor microbial compositions, which could provide potential explanations for the impact on the prognosis of CC. A reliable ARS model was further constructed, which could guide personalized treatment. SLC2A3 might be a potential target for antiangiogenic therapy.

  • Article type: Journal Article
    BACKGROUND: Prior typing methods fail to provide predictive insights into surgical complexities for extrahepatic choledochal cyst (ECC). This study aims to establish a new classification system for ECC through clustering of imaging results. Additionally, it seeks to compare the differences among the identified ECC types and assess the levels of surgical difficulty.
    METHODS: The imaging data of 124 patients were automatically grouped through a K-means clustering analysis. According to the characteristics of the new grouping, corrections and interventions were carried out to establish a new classification. Demographic data, clinical presentations, surgical parameters, complications, reoperation, and prognostic indicators were analyzed according to different types. Factors contributing to prolonged surgical time were also evaluated.
    RESULTS: A new classification system of ECC was established: Type A (upper segment), Type B (middle segment), Type C (lower segment), and Type D (entire bile duct). The incidences of comorbidities (calculus or infection) were significantly different (P = 0.000, P = 0.002). Additionally, variations in the incidence of postoperative biliary stricture were statistically significant (P = 0.046). The operative time was significantly different between groups (P = 0.001). Age, BMI > 30, classification, and the presence of combined stones were significantly associated with prolonged operative time (P = 0.002, P = 0.000, P = 0.011, P = 0.011).
    CONCLUSIONS: Our use of machine learning-driven cluster analysis has enabled the creation of a novel extrahepatic biliary dilatation typology. This classification, in conjunction with factors like age, combined stone occurrence, and obesity, significantly influences the complexity of laparoscopic choledochal cyst surgery, offering valuable insights for improved surgical treatment.

  • Article type: Journal Article
    The water reuse facilities of industrial parks face the challenge of managing a growing variety of wastewater sources as their inlet water. Typically, the grouping of these sources is designed by engineers with extensive expertise. This paper presents an innovative application of unsupervised learning methods to classify inlet water in Chinese water reuse stations, aiming to reduce reliance on engineer experience. The concept of 'water quality distance' was incorporated into three unsupervised learning clustering algorithms (K-means, DBSCAN, and AGNES), which were validated through six case studies. Of the six cases, three were employed to illustrate the feasibility of the unsupervised learning clustering algorithms. The results indicated that the clustering algorithms exhibited greater stability and better performance than both manual clustering and ChatGPT-based clustering. The remaining three cases were used to showcase the reliability of the three clustering algorithms. The findings revealed that the AGNES algorithm demonstrated the greatest potential for application. The average purity values of K-means, DBSCAN, and AGNES over the six cases were 0.947, 0.852, and 0.955, respectively.
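The purity values reported here compare a clustering against reference groupings. The abstract does not give the exact formula, so the sketch below assumes the standard definition: each cluster is credited with the size of its most common reference class, and the credits are summed and divided by the number of samples. The function name and label values are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, reference_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(cluster_labels) != len(reference_labels) or not cluster_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    # Count how often each reference class occurs inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, reference in zip(cluster_labels, reference_labels):
        by_cluster[cluster][reference] += 1
    # Credit each cluster with the size of its largest reference class.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_labels)
```

Purity rises trivially as the number of clusters grows (one cluster per sample gives 1.0), so it is most informative when the algorithms being compared produce a similar number of clusters.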

  • Article type: Journal Article
    Informed by the biopsychosocial framework, our study uses the Chinese Longitudinal Healthy Longevity Survey (CLHLS) dataset to examine cognitive function trajectories among the oldest-old (80+). Employing K-means clustering, we identified two latent groups: High Stability (HS) and Low Stability (LS). The HS group maintained satisfactory cognitive function, while the LS group exhibited consistently low function. Lasso regression revealed predictive factors, including socioeconomic status, biological conditions, mental health, lifestyle, psychological, and behavioral factors. This data-driven approach sheds light on cognitive aging patterns and informs policies for healthy aging. Our study pioneers non-parametric machine learning methods in this context.

  • Article type: Journal Article
    With the transformation and upgrading of industries, the environmental problems caused by industrial residual contaminated sites are becoming increasingly prominent. Based on an actual investigation case, this study analyzed the soil pollution status of a remaining site of the copper and zinc rolling industry and found that the pollutants exceeding the screening values included Cu, Ni, Zn, Pb, total petroleum hydrocarbons (TPH) and 6 polycyclic aromatic hydrocarbon (PAH) monomers. Based on traditional analysis methods such as the correlation coefficient and spatial distribution, combined with machine learning methods such as SOM + K-means, it is inferred that the heavy metals Zn and Pb may be mainly related to the production history of zinc rolling, while Cu and Ni may mainly originate from the production history of copper rolling. PAHs are mainly due to the incomplete combustion of fossil fuels in the melting equipment. TPH pollution is speculated to be related to oil leakage during the industrial use period and the later period of vehicle parking. The results showed that traditional analysis methods can quickly identify the correlation between site pollutants, while SOM + K-means machine learning methods can further extract complex hidden relationships in the data and achieve in-depth mining of site monitoring data.

  • Article type: Journal Article
    Gene expression data is typically high dimensional with a limited number of samples and contains many features that are unrelated to the disease of interest. Existing unsupervised feature selection algorithms primarily focus on the significance of features in maintaining the data structure while not taking into account the redundancy among features. Determining the appropriate number of significant features is another challenge.
    In this paper, we propose a clustering-guided unsupervised feature selection (CGUFS) algorithm for gene expression data that addresses these problems. Our proposed algorithm introduces three improvements over existing algorithms. Because existing clustering algorithms require the number of clusters to be specified manually, we propose an adaptive k-value strategy that assigns appropriate pseudo-labels to each sample by iteratively updating a change function. Because existing algorithms fail to consider the redundancy among features, we propose a feature grouping strategy to group highly redundant features. Because existing algorithms cannot filter the redundant features, we propose an adaptive filtering strategy that determines the feature combinations to be retained by calculating the potentially effective features and potentially redundant features of each feature group.
    Experimental results show that the average accuracy (ACC) and Matthews correlation coefficient (MCC) of the C4.5 classifier on the optimal features selected by the CGUFS algorithm reach 74.37% and 63.84%, respectively, significantly superior to the existing algorithms.
    Similarly, the average ACC and MCC of the AdaBoost classifier on the optimal features selected by the CGUFS algorithm are significantly superior to those of the existing algorithms. In addition, statistical experiment results show significant differences between the CGUFS algorithm and the existing algorithms.
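The two evaluation metrics named above, ACC and MCC, can be sketched for the binary case as follows. This is an illustration of the standard definitions, not the paper's evaluation code; the datasets in the paper may be multi-class, which this sketch does not cover. Returning 0.0 when the MCC denominator is zero is a common convention and is an assumption here. The function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary labels."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

MCC ranges from -1 to 1 and, unlike ACC, stays near 0 for a classifier that simply predicts the majority class, which is why the two are often reported together on small, imbalanced gene expression datasets.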

  • Article type: Journal Article
    OBJECTIVE: Previous studies that examined the effectiveness of unsupervised machine learning methods versus traditional methods in assessing dietary patterns and their association with incident hypertension showed contradictory results. Consequently, our aim is to explore the correlation between the incidence of hypertension and overall dietary patterns extracted using unsupervised machine learning techniques.
    METHODS: Data were obtained from Japanese male participants enrolled in a prospective cohort study between August 2008 and August 2010. A final dataset of 447 male participants was used for analysis. Dimension reduction using uniform manifold approximation and projection (UMAP) followed by K-means clustering was used to derive dietary patterns. In addition, multivariable logistic regression was used to evaluate the association between dietary patterns and the incidence of hypertension.
    RESULTS: We identified four dietary patterns: 'Low-protein/fiber High-sugar,' 'Dairy/vegetable-based,' 'Meat-based,' and 'Seafood and Alcohol.' Compared with 'Seafood and Alcohol' as a reference, the protective dietary patterns for hypertension were 'Dairy/vegetable-based' (OR 0.39, 95% CI 0.19-0.80, P = 0.013) and 'Meat-based' (OR 0.37, 95% CI 0.16-0.86, P = 0.022) after adjusting for potential confounding factors, including age, body mass index, smoking, education, physical activity, dyslipidemia, and diabetes. An age-matched sensitivity analysis confirmed this finding.
    CONCLUSIONS: This study finds that relative to the 'Seafood and Alcohol' pattern, the 'Dairy/vegetable-based' and 'Meat-based' dietary patterns are associated with a lower risk of hypertension among men.

  • Article type: Journal Article
    With the increased use of online English courses, the quality of a course directly determines its efficacy. Recently, various industries have increasingly employed Internet of Things (IoT) technology, which adapts well to different scenarios. To better supervise the specific content of English courses, we discuss how to apply multi-source mobile IoT information technology to the practical evaluation system of English courses to boost the performance of English learning evaluation. By analyzing the problems of existing English course evaluation and the characteristics of multi-source mobile IoT information technology, this article designs a practical English course evaluation system based on multi-source data collection, processing, and analysis. The system can collect real-time student voice, behavior, and other data through mobile devices. It then analyzes the data using cloud computing and data mining technology and provides real-time learning progress and feedback. We demonstrate that the accuracy of the evaluation system can reach 80.23%, which can effectively improve the efficiency of English learning evaluation, provide a new method for English teaching evaluation, and further improve and optimize English teaching content to meet the needs of the actual teaching environment.