k-means

  • Article type: Journal Article
    Mudstones and shales serve as natural barrier rocks in various geoenergy applications. Although many studies have investigated their mechanical properties, characterizing these parameters at the microscale remains challenging because of their fine-grained nature and their susceptibility to microstructural damage introduced during sample preparation. This study aims to investigate the micromechanical properties of the clay matrix composite in mudstones by combining high-speed nanoindentation mapping with machine learning data analysis. The nanoindentation approach effectively captured the heterogeneity in high-resolution mechanical property maps. Using machine learning-based k-means clustering, the mechanical characteristics of the clay matrix and brittle minerals, as well as measurements on grain boundaries and structural discontinuities (e.g., cracks), were successfully distinguished. The classification results were validated through correlation with broad ion beam-scanning electron microscopy images. The average reduced elastic modulus (E_r) and hardness (H) of the clay matrix were determined to be 16.2 ± 6.2 GPa and 0.5 ± 0.5 GPa, respectively, and were consistent across different test settings and indenter tips. Furthermore, the sensitivity of indentation measurements to various factors was investigated, revealing limited sensitivity to indentation depth and tip geometry (when comparing cube-corner and Berkovich tips over a small range of indentation depths), but decreased stability at lower loading rates. Box counting and bootstrapping methods were applied to assess the representativeness of the parameters determined for the clay matrix. A relatively small dataset (60 indents) is sufficient to achieve representativeness, whereas the main challenge is to cover a representative mapping area for clay matrix characterization. Overall, this study demonstrates the feasibility of high-speed nanoindentation mapping combined with data analysis for micromechanical characterization of the clay matrix in mudstones, paving the way for efficient analysis of similar fine-grained sedimentary rocks.
    The online version contains supplementary material available at 10.1007/s40948-024-00864-9.
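    A minimal sketch of the clustering and bootstrapping steps described above, assuming each indent is reduced to a pair (E_r, H) in GPa. The z-score scaling, k = 2, the rule that the cluster with the lowest mean E_r is the clay matrix, and the toy values are illustrative assumptions rather than the authors' settings; the box-counting analysis is not reproduced.

```python
import random
import statistics


def standardize(rows):
    """Z-score each column so that E_r and H get comparable weight in the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with squared Euclidean distance and random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels, centroids


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot))
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical (E_r, H) pairs in GPa: four soft, clay-like indents and three stiff ones.
indents = [(14.0, 0.4), (16.5, 0.6), (18.0, 0.5), (15.2, 0.3),
           (72.0, 8.1), (68.5, 7.4), (75.3, 9.0)]
labels, _ = kmeans(standardize(indents), k=2)


def mean_modulus(label):
    return statistics.fmean(e for (e, _h), lab in zip(indents, labels) if lab == label)


clay_label = min(set(labels), key=mean_modulus)  # softest cluster taken as clay matrix
clay_moduli = [e for (e, _h), lab in zip(indents, labels) if lab == clay_label]
print(clay_moduli, bootstrap_mean_ci(clay_moduli))
```

    A narrowing bootstrap interval as more indents are added is one simple way to judge when a clay-matrix sample becomes representative.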

  • Article type: Journal Article
    Every work environment contains different types of risks and interactions between them. Therefore, the method used for a risk assessment is very important. When determining which risk assessment method (RAM) to use, many factors must be considered, such as the types of risks in the work environment, the interactions among these risks, and their proximity to employees. Although many RAMs are available, no single RAM suits all workplaces, and choosing a method is the central problem. There is no internationally accepted scale or trend on this subject. In this study, 26 sectors, 10 different RAMs, and 10 criteria were identified. A hybrid approach was designed to determine the most suitable RAMs for each sector using two machine learning (ML) algorithms: k-means clustering and support vector machine (SVM) classification. First, the dataset was divided into subsets with the k-means algorithm. Then, the SVM algorithm was run on each subset, each with different characteristics. Finally, the results of all subsets were combined to obtain the result for the entire dataset. Thus, instead of a single threshold value determined for one large cluster and imposed on all of its members, a flexible structure was created by determining a separate threshold value for each sub-cluster according to its characteristics. In this way, the method provides machine support for selecting the most suitable RAMs for each sector and relieves staff of the administrative and software problems of the selection phase. In the first comparison, the proposed hybrid method achieved 96.63%, versus 90.63% for k-means and 94.68% for SVM. In the second comparison, with five other ML algorithms, the results were: artificial neural networks (ANN) 87.44%, naive Bayes (NB) 91.29%, decision trees (DT) 89.25%, random forest (RF) 81.23%, and k-nearest neighbours (KNN) 85.43%.
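    The cluster-then-classify idea can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: for brevity the SVM is replaced by a single-feature threshold rule (a decision stump) fitted per cluster, which isolates the point the abstract stresses, namely a separate threshold for each sub-cluster. How subset results are combined is also an assumption here: each case is routed to its nearest k-means centroid and scored by that cluster's rule.

```python
import random
import statistics


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means (squared Euclidean), used only to split the data into subsets."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels, centroids


def fit_stump(xs, ys):
    """Best (feature, threshold, sign): predict 1 when sign * (x[f] - t) >= 0."""
    best = (-1.0, 0, float("inf"), 1)
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}) + [float("-inf"), float("inf")]:
            for sign in (1, -1):
                acc = statistics.fmean(
                    float((sign * (x[f] - t) >= 0) == bool(y)) for x, y in zip(xs, ys))
                if acc > best[0]:
                    best = (acc, f, t, sign)
    return best[1:]


def fit_hybrid(xs, ys, k):
    """Cluster first, then fit one threshold rule per cluster."""
    labels, centroids = kmeans(xs, k)
    rules = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        rules.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
                     if idx else (0, float("inf"), 1))
    return centroids, rules


def predict_hybrid(model, x):
    """Route x to its nearest centroid and apply that cluster's own threshold."""
    centroids, rules = model
    j = min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
    f, t, sign = rules[j]
    return int(sign * (x[f] - t) >= 0)


# Toy data: in the left group the label flips at feature 1 = 6, in the right group at 9.
xs = [(0, 4), (0, 6), (1, 3), (1, 7), (10, 6), (10, 9), (11, 5), (11, 10)]
ys = [0, 1, 0, 1, 0, 1, 0, 1]
model = fit_hybrid(xs, ys, k=2)
print([predict_hybrid(model, x) for x in xs])
```

    On this toy data no single global threshold classifies all eight points, while the two per-cluster thresholds do, which is the flexibility the abstract attributes to the hybrid design.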

  • Article type: Journal Article
    BACKGROUND: Characterizing the condition of patients with knee osteoarthritis is complex because of the multiple associations among clinical, functional, and structural parameters. Significant variability exists within this population, especially among candidates for total knee arthroplasty, and orthopedic surgeons are increasingly interested in knee kinematics as a route to more personalized approaches that achieve better outcomes and satisfaction. The primary objective of this study was to identify distinct kinematic phenotypes in total knee arthroplasty candidates and to compare different methods for identifying these phenotypes.
    METHODS: Three-dimensional kinematic data obtained from a Knee Kinesiography exam during treadmill walking in the clinic were used. Various aspects of the clustering process, including data preparation, transformation, and representation methods, were evaluated and compared to achieve optimal clustering.
    RESULTS: The optimal approach was a K-means clustering algorithm using Euclidean distance, combined with principal component analysis applied to standardized data. Two distinct kinematic phenotypes were identified among 80 total knee arthroplasty candidates. Patients in the two phenotypes differed significantly in both knee kinematics and clinical outcomes: 63.3% of frontal-plane features and 81.8% of transverse-plane features varied notably across 77.33% of the gait cycle, and the groups also differed on the Pain Catastrophizing Scale, highlighting the impact of these kinematic variations on patient pain and function.
    CONCLUSIONS: The results of this study provide valuable insights for clinicians developing personalized treatment approaches based on patients' phenotype affiliation, ultimately helping to improve total knee arthroplasty outcomes.
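    A sketch of the pipeline reported as optimal (standardization, then principal component analysis, then K-means with Euclidean distance), using only the standard library. The number of retained components, k = 2, the power-iteration PCA, and the toy feature matrix (three hypothetical patients per group, four kinematic features each) are assumptions for illustration.

```python
import random
import statistics


def standardize(rows):
    """Z-score each feature so no single kinematic variable dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def covariance(z):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]


def principal_axes(cov, n_comp, iters=500):
    """Leading eigenvectors of a covariance matrix via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    axes = []
    for _comp in range(n_comp):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-uniform start vector
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate so that the next pass converges to the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels


def pca_kmeans(rows, n_comp=2, k=2):
    z = standardize(rows)
    axes = principal_axes(covariance(z), n_comp)
    scores = [[sum(a * b for a, b in zip(r, ax)) for ax in axes] for r in z]
    return kmeans(scores, k)


# Hypothetical patients x features (e.g. knee angles at fixed gait-cycle points).
patients = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.1, 3.9],
    [5.0, 1.0, 6.0, 0.0], [5.2, 0.9, 6.1, 0.2], [4.9, 1.1, 5.8, -0.1],
]
labels = pca_kmeans(patients)
print(labels)
```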

  • Article type: Journal Article
    With the increasingly widespread application of large-scale energy storage battery systems, the demand for battery safety is rising, and detecting battery anomalies early to reduce thermal runaway (TR) accidents has become particularly important. Existing battery TR warning algorithms fall mainly into two categories: model-driven and data-driven. Common model-driven methods tend to be highly complex, with poor versatility and limited early-warning capability, whereas common data-driven methods are mostly based on neural networks, require substantial training costs, and offer better early warning but higher false-alarm probabilities. To address these limitations, this paper proposes a combined data-driven and model-based algorithm for accurate battery TR warnings. Specifically, the K-means algorithm serves as the data-driven module, capturing outliers in battery data, and the Bernardi equation serves as the model-driven module used to evaluate battery temperature. The weighted outputs of the model-driven and data-driven modules are then combined to assess whether the battery is abnormal. The proposed algorithm combines the advantages of both approaches, achieving a 25 min advance warning of thermal runaway with a significantly reduced probability of false alarms.
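    One way the data-driven module and the weighted fusion could look, as a hedged sketch: K-means is fitted to normal operating samples, a sample is flagged when its distance to the nearest centroid exceeds the mean plus three standard deviations of the training distances, and that flag is combined with the model-driven verdict by a weighted vote. The features, threshold rule, weights, and cutoff are hypothetical; `model_flag` stands in for the Bernardi-equation module, which is not reproduced here.

```python
import math
import random
import statistics


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels, centroids


def fit_detector(normal_samples, k=2, n_sigma=3.0):
    """Cluster normal data; outlier radius = mean + n_sigma * std of nearest-centroid distances."""
    _, centroids = kmeans(normal_samples, k)
    dists = [math.sqrt(min(sq_dist(p, c) for c in centroids)) for p in normal_samples]
    return centroids, statistics.fmean(dists) + n_sigma * statistics.pstdev(dists)


def is_outlier(detector, sample):
    """Data-driven flag: the sample lies outside the normal radius of every cluster."""
    centroids, radius = detector
    return math.sqrt(min(sq_dist(sample, c) for c in centroids)) > radius


def fused_alarm(data_flag, model_flag, w_data=0.4, w_model=0.6, cutoff=0.7):
    """Weighted vote; with these defaults both modules must agree to raise an alarm."""
    return w_data * data_flag + w_model * model_flag >= cutoff


# Hypothetical normal samples: (voltage deviation in mV, temperature rise in degC/min).
normal = [(0.0, 0.1), (0.5, 0.12), (-0.4, 0.08), (0.2, 0.11),
          (10.0, 0.5), (10.3, 0.52), (9.8, 0.49), (10.1, 0.48)]
detector = fit_detector(normal)
# model_flag=True stands in for the (not reproduced) Bernardi-equation module.
print(fused_alarm(is_outlier(detector, (30.0, 3.0)), model_flag=True))
```

    Because the two toy features have different units, a real pipeline would likely scale them first. Requiring agreement between modules is one assumed way to trade a little sensitivity for fewer false alarms.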

  • Article type: Journal Article
    Chronic exposure to the hypobaric hypoxic environment of the plateau could influence human cognitive behaviours, which are supported by dynamic brain connectivity states. How the functional connectivity (FC) of brain networks changes with altitude is still unclear. In this article, we used EEG data recorded during a Go/NoGo paradigm in Weinan (347 m) and Nyingchi (2950 m). A combination of dynamic FC (dFC) and K-means clustering was employed to extract dynamic FC states, which were then distinguished by graph metrics. In addition, temporal properties of the networks, namely fractional windows (FW), transition numbers (TN), and mean dwell time (MDT), were calculated. We extracted two distinct states from the dFC matrices, of which State 1 was verified to have higher functional integration and segregation. The dFC states switched dynamically during the Go/NoGo tasks, and the FW of State 1 increased in the high-altitude participants. In the regional analysis, we also found higher state deviation in the fronto-parietal cortices and enhanced FC strength in the occipital lobe. These results demonstrated that long-term exposure to the high-altitude environment could lead brain networks to reorganize into networks with higher inter- and intra-network information transfer efficiency, which could be attributed to a compensatory mechanism for brain function compromised by the plateau environment. This study provides a new perspective on how the plateau environment relates to cognitive impairment.
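    The temporal properties named above can be computed from the per-window state sequence that K-means assigns. The sketch below assumes sliding-window Pearson correlation for dFC and measures dwell time in windows; the window length, step, and example inputs are illustrative, and the K-means step that turns window vectors into state labels is omitted.

```python
import statistics
from itertools import groupby


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined for constant input)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def sliding_window_fc(channels, win, step):
    """One vector of pairwise channel correlations (upper triangle) per window."""
    n = len(channels[0])
    out = []
    for start in range(0, n - win + 1, step):
        seg = [ch[start:start + win] for ch in channels]
        out.append([pearson(seg[i], seg[j])
                    for i in range(len(seg)) for j in range(i + 1, len(seg))])
    return out


def temporal_metrics(states, n_states):
    """Fractional windows, transition number and mean dwell time (in windows)."""
    fw = [states.count(s) / len(states) for s in range(n_states)]
    tn = sum(1 for a, b in zip(states, states[1:]) if a != b)
    runs = [(s, len(list(g))) for s, g in groupby(states)]  # consecutive same-state runs
    mdt = []
    for s in range(n_states):
        lengths = [n for state, n in runs if state == s]
        mdt.append(statistics.fmean(lengths) if lengths else 0.0)
    return fw, tn, mdt


# Toy usage: three channels, and a hypothetical state sequence from clustering.
fc = sliding_window_fc([[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [6, 5, 4, 3, 2, 1]],
                       win=4, step=2)
print(fc, temporal_metrics([0, 0, 1, 1, 1, 0], n_states=2))
```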

  • Article type: Journal Article
    Objective. The single-isocenter-multiple-target technique for stereotactic radiosurgery (SRS) can reduce treatment duration but risks compromised dose coverage due to potential rotational errors. Clustering targets into two groups can reduce isocenter-target distances, mitigating the impact of rotational uncertainty. However, a comprehensive evaluation of clustering algorithms for SRS is absent. This study addresses this gap by introducing the SRS Target Clustering Framework (Framework), a comprehensive tool that uses commonly used clustering algorithms to generate efficient cluster configurations.
    Approach. The Framework incorporates four distinct optimization objectives based on two key metrics: the isocenter-target distance and the ratio of this distance to the target radius. Agglomerative and weighted agglomerative clustering are employed for the minimax and weighted minimax objectives, respectively. K-means and weighted k-means are used for the sum-of-squares and weighted sum-of-squares objectives. We applied the Framework to 126 SRS plans, comparing the results with ground-truth solutions obtained through a brute-force algorithm.
    Main results. For the minimax objective, the average maximum isocenter-target distance from agglomerative clustering (4.8 cm) was slightly higher than the ground truth (4.6 cm). Similarly, weighted agglomerative clustering achieved an average maximum ratio of 15.1, compared with the ground truth of 14.6. Notably, both k-means and weighted k-means clustering showed close agreement (within a precision of 0.1) with the ground truth for the average root-mean-square target-isocenter distance and ratio (3.6 cm and 11.1, respectively).
    Significance. These results demonstrate the Framework's effectiveness in generating clusters for SRS targets. The proposed approach has the potential to become a valuable tool in SRS treatment planning. Furthermore, this study is the first to investigate clustering algorithms for both minimizing maximum and sum-of-squares uncertainty in SRS.
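    The weighted sum-of-squares objective maps directly onto weighted k-means: with weights w_i = 1/r_i^2, the sum of w_i * |p_i - c|^2 equals the sum of squared distance-to-radius ratios, and the isocenter c minimizing it is the weighted centroid of the group. A minimal two-isocenter sketch, checked against exhaustive search in the spirit of the study's brute-force ground truth; the farthest-pair seeding and the target coordinates and radii are hypothetical.

```python
import itertools
import math


def weighted_centroid(points, weights):
    total = sum(weights)
    return tuple(sum(w * p[d] for p, w in zip(points, weights)) / total for d in range(3))


def cost(points, weights, labels):
    """Sum of w_i * |p_i - iso|^2, with each group's isocenter at its weighted centroid."""
    total = 0.0
    for g in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        iso = weighted_centroid([points[i] for i in idx], [weights[i] for i in idx])
        total += sum(weights[i] * math.dist(points[i], iso) ** 2 for i in idx)
    return total


def weighted_kmeans(points, weights, iters=100):
    """Two-group weighted k-means seeded with the farthest-apart pair of targets."""
    a, b = max(itertools.combinations(range(len(points)), 2),
               key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    centers = [points[a], points[b]]
    labels = None
    for _ in range(iters):
        # A positive weight does not change which isocenter is nearest to a target.
        new_labels = [min((0, 1), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in (0, 1):
            idx = [i for i, lab in enumerate(labels) if lab == j]
            if idx:
                centers[j] = weighted_centroid([points[i] for i in idx],
                                               [weights[i] for i in idx])
    return labels


def brute_force(points, weights):
    """Exhaustive ground truth over every split into two non-empty groups."""
    best = None
    for bits in itertools.product((0, 1), repeat=len(points) - 1):
        labels = [0, *bits]  # fixing the first target avoids mirror-image duplicates
        if 1 in labels:
            c = cost(points, weights, labels)
            if best is None or c < best[0]:
                best = (c, labels)
    return best


# Hypothetical target centres (cm) and radii (cm).
centres = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0), (11, 1, 0)]
radii = [1.0, 0.5, 1.0, 1.0, 2.0]
weights = [1 / r ** 2 for r in radii]  # w_i = 1/r_i^2 turns distances into ratios
labels = weighted_kmeans(centres, weights)
rms_ratio = math.sqrt(cost(centres, weights, labels) / len(centres))
print(labels, round(rms_ratio, 3))
```

    Setting every weight to 1 turns the same code into plain k-means for the unweighted distance objective. Exhaustive search grows as 2^(n-1), which is why it serves only as a reference on modest target counts.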

  • Article type: Journal Article
    BACKGROUND: Angiogenesis plays an important role in colon cancer (CC) progression.
    OBJECTIVE: To investigate the tumor microenvironment (TME) and intratumor microbes of angiogenesis subtypes (AGSs) and to explore potential targets for antiangiogenic therapy in CC.
    METHODS: Data were obtained from The Cancer Genome Atlas and Gene Expression Omnibus databases. K-means clustering was used to construct the AGSs. A prognostic model was constructed based on the differentially expressed genes between the two subtypes. Single-cell analysis was used to analyze SLC2A3 expression in different cell types in CC, and the result was validated by immunofluorescence. The biological functions of SLC2A3 were further explored in HUVECs.
    RESULTS: CC samples were grouped into two AGSs (AGS-A and AGS-B), and patients in the AGS-B group had a poor prognosis. Further analysis revealed that the AGS-B group had high infiltration of TME immune cells but also exhibited high immune escape. Intratumor microbes also differed between the two subtypes. A convenient 6-gene angiogenesis-related signature (ARS) was established to identify AGSs and predict prognosis in CC patients. SLC2A3 was selected as the representative gene of the ARS; it was more highly expressed in endothelial cells and promoted the migration of HUVECs.
    CONCLUSIONS: Our study identified two AGSs with distinct prognoses, TMEs, and intratumor microbial compositions, which could help explain differences in CC prognosis. A reliable ARS model was further constructed, which could guide personalized treatment. SLC2A3 might be a potential target for antiangiogenic therapy.
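    Gene signatures of this kind are often scored as a weighted sum of expression values and split at the median into high- and low-risk groups. The abstract does not state how the 6-gene ARS is computed, so the sketch below is an assumption throughout: apart from SLC2A3, the gene names (GENE_2 to GENE_6) and all coefficients are placeholders.

```python
import statistics

# Hypothetical weights: the abstract names SLC2A3 as the signature's representative
# gene but gives neither the other five genes nor any coefficients.
COEFFICIENTS = {
    "SLC2A3": 0.40, "GENE_2": 0.25, "GENE_3": 0.20,
    "GENE_4": -0.15, "GENE_5": 0.10, "GENE_6": -0.30,
}


def risk_score(expression, coefs=COEFFICIENTS):
    """Weighted sum of (e.g. normalized) expression values over the signature genes."""
    return sum(w * expression[g] for g, w in coefs.items())


def stratify(cohort, coefs=COEFFICIENTS):
    """Split patients at the median risk score into 'high' and 'low' risk groups."""
    scores = {pid: risk_score(expr, coefs) for pid, expr in cohort.items()}
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression profiles for four patients.
cohort = {
    "P1": {g: 1.0 for g in COEFFICIENTS},
    "P2": {g: 0.0 for g in COEFFICIENTS},
    "P3": {**{g: 0.0 for g in COEFFICIENTS}, "SLC2A3": 5.0},
    "P4": {**{g: 0.0 for g in COEFFICIENTS}, "GENE_6": 5.0},
}
print(stratify(cohort))
```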

  • Article type: Journal Article
    This study examines the influence of National Institutes of Health (NIH) funding on the publication choices of dermatologists, particularly in terms of journal tiers and pay-to-publish (P2P) versus free-to-publish (F2P) models. Journals were grouped into three tiers by k-means clustering on SCImago Journal Rank, h-index, and Impact Factor, and 54,530 dermatology publications from 2021 to 2023 were analyzed. Authors were classified as Top NIH Funded or Non-Top NIH Funded according to Blue Ridge Institute for Medical Research rankings. The study finds significant differences in publication patterns: Top NIH Funded researchers used P2P and F2P models in a balanced way in Tier I journals but preferred F2P models in Tier II and III journals, whereas Non-Top NIH Funded authors opted for P2P models more frequently across all tiers. These data suggest that NIH funding gives researchers greater flexibility to publish in higher-tier journals despite publication fees, while they prioritize F2P models in lower-tier journals. This pattern indicates that funding status plays a critical role in strategic publication decisions, potentially affecting research visibility and subsequent funding. The study's focus on dermatology limits its broader applicability, warranting further research into additional factors such as geographic location, author gender, and research design.
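    The tiering step could look like the sketch below: standardize the three metrics, run k-means with k = 3, then name the clusters Tier I to III by their mean combined z-score. The seeding (lowest, median, and highest combined score), the ranking rule, and the journal names and metric values are hypothetical.

```python
import statistics


def standardize(rows):
    """Z-score each metric so SJR, h-index and IF are on comparable scales."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, init, iters=100):
    """Lloyd's k-means (squared Euclidean) starting from the given centroids."""
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels


def journal_tiers(metrics):
    """metrics: {journal: (SJR, h_index, impact_factor)} -> {journal: 'I' | 'II' | 'III'}."""
    names = list(metrics)
    z = standardize([metrics[n] for n in names])
    composite = [sum(row) for row in z]
    order = sorted(range(len(names)), key=composite.__getitem__)
    # Deterministic seeds: lowest, median and highest combined score.
    seeds = [z[order[0]], z[order[len(order) // 2]], z[order[-1]]]
    labels = kmeans(z, seeds)
    # Rank clusters by mean combined score; the highest becomes Tier I.
    ranked = sorted(set(labels), key=lambda c: -statistics.fmean(
        s for s, lab in zip(composite, labels) if lab == c))
    tier_of = dict(zip(ranked, ["I", "II", "III"]))
    return {n: tier_of[lab] for n, lab in zip(names, labels)}


# Hypothetical journals: (SJR, h-index, Impact Factor).
journals = {
    "J1": (5.0, 300, 15.0), "J2": (4.5, 280, 13.0), "J3": (5.5, 320, 16.0),
    "J4": (1.5, 120, 5.0), "J5": (1.2, 100, 4.5), "J6": (1.8, 130, 5.5),
    "J7": (0.4, 30, 1.5), "J8": (0.3, 25, 1.2), "J9": (0.5, 40, 2.0),
}
print(journal_tiers(journals))
```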

  • Article type: Journal Article
    With the rapid growth of online property rental and sale platforms, fake real estate listings have become a significant concern. These deceptive listings waste buyers' and sellers' time and effort and pose potential risks, so effective methods to distinguish genuine from fake listings are crucial. Accurately identifying fake listings is a critical challenge, and clustering analysis can significantly improve this process. Although clustering has been widely used to detect fraud in various fields, its application in real estate has been limited, focusing primarily on auctions and property appraisals. This study aims to fill this gap by using clustering to classify properties into fake and genuine listings based on datasets curated by industry experts. A K-means model was developed to group properties into clusters that clearly distinguish fake from genuine listings. To ensure the quality of the training data, the raw dataset was pre-processed. Several techniques were used to determine the optimal value of each K-means parameter. The number of clusters was determined using the Silhouette coefficient, the Calinski-Harabasz index, and the Davies-Bouldin index. Two clusters proved optimal, and the Canberra distance performed best when compared with the overlap similarity and Jaccard distance measures. The clustering results were assessed using two machine learning algorithms: Random Forest and Decision Tree. The results show that the optimized K-means significantly improves the accuracy of the Random Forest classification model, boosting it by 96%. Furthermore, this research demonstrates that clustering helps create a balanced dataset containing fake and genuine clusters. This balanced dataset holds promise for future investigations, particularly for deep learning models that require balanced data to perform optimally. This study presents a practical and effective way to identify fake real estate listings through clustering analysis, ultimately contributing to a more trustworthy and secure real estate market.
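    The Canberra distance sums |x_i - y_i| / (|x_i| + |y_i|) over features, and the Silhouette coefficient compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b) as (b - a) / max(a, b). A small sketch that scores candidate labelings with a Canberra-based silhouette and keeps the best number of clusters; the data and candidate labelings are hypothetical, and the Calinski-Harabasz and Davies-Bouldin checks are left out.

```python
import statistics


def canberra(a, b):
    """Canberra distance; 0/0 terms contribute 0 (the convention SciPy documents)."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total


def silhouette(points, labels, dist=canberra):
    """Mean silhouette over all points; needs at least two clusters."""
    n = len(points)
    scores = []
    for i in range(n):
        same = [dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = statistics.fmean(same)
        b = min(
            statistics.fmean(dist(points[i], points[j]) for j in range(n) if labels[j] == other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(scores)


def best_k(points, candidates):
    """candidates: {k: labels}. Return the k whose labeling has the highest silhouette."""
    return max(candidates, key=lambda k: silhouette(points, candidates[k]))


# Hypothetical listing features and candidate clusterings for k = 2 and k = 3.
points = [[1, 1], [1.2, 1.1], [10, 10], [11, 9]]
print(best_k(points, {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}))
```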

  • Article type: Journal Article
    BACKGROUND: Prior typing methods fail to provide predictive insight into the surgical complexity of extrahepatic choledochal cysts (ECC). This study aims to establish a new classification system for ECC by clustering imaging results, to compare the differences among the identified ECC types, and to assess their levels of surgical difficulty.
    METHODS: The imaging data of 124 patients were automatically grouped by K-means clustering analysis. Based on the characteristics of the resulting groups, corrections and interventions were made to establish a new classification. Demographic data, clinical presentations, surgical parameters, complications, reoperation, and prognostic indicators were analyzed by type. Factors contributing to prolonged operative time were also evaluated.
    RESULTS: A new classification system for ECC was established: Type A (upper segment), Type B (middle segment), Type C (lower segment), and Type D (entire bile duct). The incidences of comorbidities (calculus or infection) differed significantly (P < 0.001, P = 0.002). The incidence of postoperative biliary stricture also differed significantly (P = 0.046), as did operative time between groups (P = 0.001). Age, BMI > 30, classification, and the presence of combined stones were significantly associated with prolonged operative time (P = 0.002, P < 0.001, P = 0.011, and P = 0.011, respectively).
    CONCLUSIONS: Using machine learning-driven cluster analysis, we created a novel typology of extrahepatic biliary dilatation. This classification, together with factors such as age, combined stone occurrence, and obesity, significantly influences the complexity of laparoscopic choledochal cyst surgery, offering valuable insights for improved surgical treatment.