k-means clustering

  • Article type: Journal Article
    Water ecological restoration zoning links upward to the goals of water ecosystem restoration and guides downward the spatial layout of restoration projects. It is therefore key to achieving systematic restoration of water resource elements. Current water ecological restoration zoning faces several problems, including inconsistent hierarchical systems, incomplete indicators, and vague boundaries. Taking Hechi, Guangxi, an ecologically fragile karst region, as a case study, we developed a multidimensional zoning framework based on "watershed natural unit-dominant ecological function-ecological stress risk". The first-level zoning used river systems and landform types as indicators and took sub-watershed units as its boundaries. The second-level zoning adopted a "top-down" approach: building on the natural geography of the watershed, it clarified the goals of water ecological restoration and evaluated three indicators (water conservation, biodiversity, and landscape cultural services). We used K-means clustering to identify the dominant ecological functions of spatial units, again taking sub-watershed units as the second-level boundaries. The third-level zoning serves as the implementation unit for specific ecological restoration projects. It used three indicators (soil erosion, flooding risk, and human disturbance) to characterize the external stress risks facing water ecosystems. We delineated 11 first-level water ecological zones, four second-level zones, and three third-level zones. We then integrated the results of all three zoning levels, which capture the spatial differentiation of watershed natural geography, dominant ecological functions, and ecological stress risks. Combining these results with sub-watershed and township administrative units to fix the zoning boundaries, we classified water ecological restoration zoning into five categories comprising 32 sub-ecological zones. Corresponding ecological restoration strategies were then proposed for each zone and category.
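    The K-means step in the second-level zoning can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each spatial unit has already been scored on the three indicators (water conservation, biodiversity, landscape cultural services). It then z-scores the indicators, clusters the units with Lloyd's algorithm, and names each cluster after the indicator with the highest centroid value. The deterministic farthest-point initialization, the function names, and the scores are all illustrative assumptions.

```python
import math

# Hypothetical indicator order; the study's actual scoring scheme is not given.
INDICATORS = ["water_conservation", "biodiversity", "landscape_cultural"]


def standardize(rows):
    """Z-score each column so no single indicator dominates the distance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # "or 1.0" guards against division by zero for a constant column.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def dominant_functions(scores, k):
    """Cluster units and label each one with its cluster's strongest indicator."""
    labels, centroids = kmeans(standardize(scores), k)
    names = [INDICATORS[max(range(len(INDICATORS)), key=lambda j: cen[j])] for cen in centroids]
    return [names[lab] for lab in labels]


# Invented indicator scores for six spatial units.
units = [
    [0.9, 0.2, 0.1], [0.8, 0.3, 0.2],
    [0.1, 0.9, 0.2], [0.2, 0.8, 0.1],
    [0.1, 0.2, 0.9], [0.2, 0.1, 0.8],
]
print(dominant_functions(units, k=3))
```

    K-means is sensitive to both the initialization and the choice of k. On real data one would normally compare several k values and check how stable the clusters are.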

  • Article type: Journal Article
    Over the last several years, the COVID-19 pandemic has spread across the globe. People have grown used to a new normal that involves working from home, chatting online, and maintaining personal hygiene to stop the spread of COVID-19. As a result, many public spaces try to make sure that their visitors wear proper face masks and keep a safe distance from one another. Monitoring staff cannot ensure that everyone is wearing a mask. Automated solutions are therefore a far better option for face mask identification and monitoring, helping to control public behavior and reduce the spread of COVID-19. The motivation for developing this technology was the need to identify individuals whose faces are uncovered. Most previously published studies have focused on various methodologies. This study built new methods based on K-medoids, K-means, and Fuzzy K-Means (FKM) clustering for image pre-processing, to obtain better-quality face images and reduce noisy data. In addition, it investigates several machine learning models for face mask detection: convolutional neural networks (CNNs) with pre-trained models (DenseNet201, VGG-16, and VGG-19) and a Support Vector Machine (SVM). The proposed method combining K-medoids with the pre-trained DenseNet201 model achieved the best result, 97.7% accuracy in face mask identification. These results indicate that image segmentation may improve identification accuracy. More importantly, a face mask identification tool is more useful when it can also identify masks from a side-on view.
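    The abstract does not say how the clustering methods were applied to the face images. One common pattern, assumed here purely for illustration, is intensity-based segmentation: cluster the pixel gray levels and keep the pixels closest to the brighter cluster center. The sketch below uses a simple alternating K-medoids (Voronoi-iteration) variant rather than full PAM. Unlike K-means, each cluster center is an actual data value, which makes the method less sensitive to outlier pixels. The toy image and the function names are hypothetical.

```python
def kmedoids_1d(values, k=2, max_iter=50):
    """Alternating K-medoids on scalar values (e.g. grayscale intensities)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    uniq = sorted(set(values))
    # Deterministic seeding: spread the initial medoids across the value range.
    medoids = [uniq[round(i * (len(uniq) - 1) / (k - 1))] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest medoid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - medoids[i]))
            clusters[nearest].append(v)
        # Update step: the new medoid is the member with the smallest total
        # distance to the rest of its cluster.
        new_medoids = [
            min(members, key=lambda m: sum(abs(m - x) for x in members)) if members else medoids[i]
            for i, members in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def segment(image, k=2):
    """Return a binary mask: 1 where a pixel is nearest the brightest medoid."""
    flat = [px for row in image for px in row]
    medoids = kmedoids_1d(flat, k)
    bright = max(range(k), key=lambda i: medoids[i])
    return [
        [1 if min(range(k), key=lambda i: abs(px - medoids[i])) == bright else 0 for px in row]
        for row in image
    ]


# Invented 3x4 grayscale patch: dark left half, bright right half.
img = [
    [10, 12, 200, 205],
    [11, 13, 198, 210],
    [9, 14, 202, 199],
]
print(segment(img))  # -> [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```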

  • Article type: Journal Article
    Consumers often lack information about the quality of apples available in supermarkets. General appearance factors, such as color, mechanical damage, or microbial attack, influence consumers' decisions to purchase or reject apples. Recently, devices known as electronic noses have offered an easy-to-use, non-destructive way to assess ripening stage from the volatile organic compounds (VOCs) emitted by the fruit. In this study, 'Golden Delicious' apples stored and monitored at ambient temperature were analyzed in 2022 and 2023. Data were collected from four metal oxide semiconductor (MOS) sensors (MQ3, MQ135, MQ136, and MQ138). Three ripening stages (less ripe, ripe, and overripe) were identified with principal component analysis (PCA) and K-means clustering, using various datasets built from sensor measurements in four experiments. A K-nearest neighbors (KNN) model was then applied, and apples in specific datasets were classified with accuracy above 75%. For the dataset combining measurements from all experiments, accuracy reached 100%. This held both on specific test sets and on an evaluation set from new, completely independent experiments. Additionally, correlation and PCA analyses showed that selecting two or three sensors can give equally good results. Overall, the e-nose results highlight the importance of analyzing data from several experiments performed over a longer period after harvest. The investigated VOC parameters (ethylene, esters, alcohols, and aldehydes) show both similarities and differences between less and more mature apples analyzed in autumn or spring. These differences can improve determination of the ripening stage, with higher prediction success for apples investigated in spring.
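    The classification step can be sketched as a plain K-nearest-neighbors vote. This minimal sketch assumes the ripening-stage labels have already been produced by the PCA and K-means step, and that each sample is a vector of the four MOS sensor readings. The readings below are invented for illustration. The abstract does not state the study's preprocessing, its value of k, or its distance metric.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples nearest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose predicted stage matches the given label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Invented readings for sensors (MQ3, MQ135, MQ136, MQ138); the labels are
# assumed to come from the earlier PCA + K-means step.
train_X = [
    [0.20, 0.30, 0.10, 0.20], [0.25, 0.35, 0.15, 0.20], [0.20, 0.25, 0.10, 0.25],
    [0.50, 0.60, 0.40, 0.50], [0.55, 0.60, 0.45, 0.50], [0.50, 0.65, 0.40, 0.55],
    [0.90, 0.90, 0.80, 0.90], [0.85, 0.95, 0.80, 0.85], [0.90, 0.85, 0.75, 0.90],
]
train_y = ["less_ripe"] * 3 + ["ripe"] * 3 + ["overripe"] * 3
test_X = [[0.22, 0.30, 0.12, 0.20], [0.52, 0.62, 0.42, 0.50], [0.88, 0.90, 0.78, 0.88]]
test_y = ["less_ripe", "ripe", "overripe"]
print(accuracy(train_X, train_y, test_X, test_y))
```

    An odd k reduces the chance of tied votes but does not rule them out with three classes. `Counter.most_common` breaks any remaining tie in favor of the label seen first among the nearest neighbors.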

  • Article type: Journal Article
    Ethylene oxide (EtO) residues have been found in food frequently, so the health effects of EtO have become a concern. Between 2022 and 2023, 489 products were inspected in Taiwan using purposive sampling, and nine non-compliant imported products were found; border control measures were subsequently strengthened. To ensure the safety of all imported foods, the current study used K-means clustering to help identify food at risk of EtO residues. Data on finished products and raw materials with EtO residues were collected from international public opinion bulletins for analysis. These data were matched against the Taiwan Food Cloud, and 90 high-risk food items with EtO residues and 1388 manufacturers were screened. The Taiwan Food and Drug Administration set up border controls and grouped the manufacturers using K-means clustering, an unsupervised learning algorithm. For this study, 37 manufacturers flagged for priority inspection and 52 high-risk finished products and raw materials with residual EtO were selected for inspection. Although no EtO was detected, the study drew the following conclusions: 1. using international food safety alerts to strengthen border control can effectively ensure domestic food safety; 2. K-means clustering can validate the results of risk-based purposive sampling, helping to ensure food safety and reduce costs.

  • Article type: Journal Article
    Increasing human demand on the marine environment and its associated biodiversity threatens the sustainable delivery of ecosystem goods and services, particularly in shallow shelf-sea habitats. As a result, more attention is being paid to quantifying the geographical range and distribution of seabed habitats and keystone species vulnerable to human pressures. In this study, we develop a workflow to predict different levels of sandeel density in the Hempton's Turbot Bank Special Area of Conservation (SAC). The workflow is based on unsupervised K-Means classification units and Generalized Linear Models built from multi-frequency backscatter analyses (95 and 300 kHz), bathymetry, and bathymetric derivatives (slope). For Hyperoplus lanceolatus densities, we compare the performance of single-frequency versus multi-frequency models. We note relatively high agreement between the K-Means clustering outputs (from the 95 kHz and multi-frequency models) and ground-truthed sandeel densities. Moreover, root mean squared error (RMSE) values in this instance show that the single-frequency models outperform the multi-frequency model in predictive ability. This is mostly linked to the species' strong affinity for sedimentary environments, whose variability is better captured by the lower-frequency system. Overall, these results provide important information about species-habitat relationships. They also pinpoint bedform features where sandeels are likely to be found and whose variability is potentially linked to the bathymetry domain. The workflow developed in this study also provides a proof of concept to support the design of a robust species-specific monitoring plan in marine protected areas. Most importantly, we highlight how decisions made during sampling, data handling, and analysis can affect the final outputs and interpretation of species distribution models and benthic habitat maps.
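    The model comparison rests on root mean squared error, RMSE = sqrt(mean((observed - predicted)^2)), where lower is better. The sketch below computes RMSE for two hypothetical sets of predictions against invented ground-truthed densities. It does not reproduce the study's models or data; the names and values are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical ground-truthed densities and model predictions (invented values).
observed = [3.0, 5.0, 2.0, 8.0]
single_freq = [3.0, 4.0, 2.0, 7.0]
multi_freq = [5.0, 3.0, 4.0, 6.0]

scores = {
    "single-frequency": rmse(observed, single_freq),
    "multi-frequency": rmse(observed, multi_freq),
}
best = min(scores, key=scores.get)  # the model with the lowest RMSE is preferred
print(scores, best)
```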

  • Article type: Journal Article
    OBJECTIVE: This study aimed to determine intratumoral habitat regions from multi-sequence magnetic resonance imaging (MRI) and to assess the value of those regions for predicting patient response to neoadjuvant chemotherapy (NAC) in nasopharyngeal carcinoma (NPC).
    METHODS: A total of 297 patients with NPC were enrolled. Multi-sequence MRI data were used to outline three-dimensional volumes of interest (VOIs) covering the whole tumor. The original imaging data were divided into two groups, resampled to isotropic resolutions of 1 × 1 × 1 mm³ (group_1mm) and 3 × 3 × 3 mm³ (group_3mm), respectively. To capture local information, nineteen radiomics features were computed for each voxel of the three sequences in group_3mm within the tumor region. K-means clustering was then applied to partition each whole-tumor region into two clusters. After radiomics feature extraction and dimensionality reduction, habitat models were built using a multi-layer perceptron (MLP) algorithm.
    RESULTS: Only T stage was included in the clinical model. The habitat3mm model, which included 10 radiomics features, achieved AUCs of 0.752 and 0.724 in the training and validation cohorts, respectively. Because the habitat3mm model performed slightly better, a nomogram was developed by combining it with T stage. The nomogram achieved AUCs of 0.749 and 0.738 in the training and validation cohorts, respectively. Decision curve analysis provided further evidence of the nomogram's clinical utility.
    CONCLUSIONS: A nomogram based on intratumoral habitat predicts the efficacy of NAC in NPC patients, offering the potential to improve both the treatment plan and patient outcomes.
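    The AUC values reported above are areas under the ROC curve. Equivalently, AUC is the probability that a randomly chosen responder receives a higher model score than a randomly chosen non-responder, with ties counted as one half. The sketch below computes AUC directly from that pairwise definition. Its O(n·m) cost is acceptable for cohort-sized data. The labels and scores are invented and are not the study's.

```python
def auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition; labels are 1 or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive case ranked above negative case
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Invented example: 1 = responded to NAC, 0 = did not; scores are model outputs.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # -> 0.75
```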

  • Article type: Journal Article
    BACKGROUND: Insulin resistance is linked to an increased risk of frailty. However, the overall relationship between frailty and the triglyceride glucose-body mass index (TyG-BMI), which also reflects body weight, remains unclear. This study investigated that relationship.
    METHODS: Data from 9135 participants in the China Health and Retirement Longitudinal Study (2011-2020) were analysed. The following were calculated: baseline TyG-BMI, the change in TyG-BMI and cumulative TyG-BMI between baseline and 2015, and the frailty index (FI) over nine years. Participants were grouped into categories of TyG-BMI change using K-means clustering. FI trajectories were assessed with a group-based trajectory model. Logistic and Cox regression models were used to analyse the associations of TyG-BMI with FI trajectory and with frailty incidence. Nonlinear relationships were explored using restricted cubic splines, and a linear mixed-effects model was used to evaluate the speed of FI progression. Weighted quantile regression was used to identify the primary contributing factors.
    RESULTS: Four classes of TyG-BMI change and two FI trajectories were identified. Three groups had greater odds of a rapid FI trajectory:
      - individuals in the third (OR = 1.25, 95% CI: 1.10-1.42) and fourth (OR = 1.83, 95% CI: 1.61-2.09) quartiles of baseline TyG-BMI;
      - those with consistently second-highest (OR = 1.49, 95% CI: 1.32-1.70) and highest (OR = 2.17, 95% CI: 1.84-2.56) TyG-BMI changes;
      - those in the third (OR = 1.20, 95% CI: 1.05-1.36) and fourth (OR = 1.94, 95% CI: 1.70-2.22) quartiles of cumulative TyG-BMI.
    A higher risk of frailty was noted in the following groups:
      - those in the fourth quartile of baseline TyG-BMI (HR = 1.42, 95% CI: 1.28-1.58);
      - those with consistently second-highest (HR = 1.23, 95% CI: 1.12-1.34) and highest (HR = 1.58, 95% CI: 1.42-1.77) TyG-BMI changes;
      - those in the third (HR = 1.10, 95% CI: 1.00-1.21) and fourth (HR = 1.46, 95% CI: 1.33-1.60) quartiles of cumulative TyG-BMI.
    Accelerated FI progression was shown by participants with persistently second-lowest, second-highest, and highest TyG-BMI changes (β = 0.15, 0.38, and 0.76, respectively). It was also shown by those in the third and fourth quartiles of cumulative TyG-BMI (β = 0.25 and 0.56, respectively). A U-shaped association was observed between TyG-BMI levels and both a rapid FI trajectory and higher frailty risk, with BMI being the primary contributing factor.
    CONCLUSIONS: A higher TyG-BMI is associated with a more rapid FI trajectory and a greater risk of frailty. However, excessively low TyG-BMI levels also appear to contribute to the development of frailty. Maintaining a healthy TyG-BMI, especially a healthy BMI, may help prevent or delay the onset of frailty.
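    The abstract does not define TyG-BMI. The sketch below assumes the commonly used definitions: TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2) and TyG-BMI = TyG × BMI, with BMI in kg/m². The study's exact computation and unit handling may differ, and the input values are invented.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def tyg(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index (assumed definition); both inputs in mg/dL."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


def tyg_bmi(triglycerides_mg_dl, glucose_mg_dl, weight_kg, height_m):
    """TyG-BMI as the product of the TyG index and BMI (assumed definition)."""
    return tyg(triglycerides_mg_dl, glucose_mg_dl) * bmi(weight_kg, height_m)


# Hypothetical participant: TG 150 mg/dL, fasting glucose 100 mg/dL, 70 kg, 1.75 m.
value = tyg_bmi(150, 100, 70, 1.75)
print(round(value, 1))
```

    In a longitudinal design like this one, the same function would be applied at each survey wave. The per-participant changes would then be the input to the K-means grouping.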

  • Article type: Journal Article
    This dataset demonstrates the use of computational fragmentation-based and machine learning-aided drug discovery to generate new lead molecules for the treatment of hypertension. Specifically, it focuses on agents targeting the renin-angiotensin-aldosterone system (RAAS), commonly classified as angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin II receptor blockers (ARBs). The preliminary dataset was a target-specific, user-generated fragment library of 63 molecular fragments. These fragments were derived from 26 approved ACEI and ARB molecules obtained from the ChEMBL and DrugBank databases. This fragment library served as the primary input for generating the new lead molecules presented in the dataset. The newly generated molecules were screened to check whether they met the criteria for oral drugs and contained an ACEI or ARB core functional group. Using unsupervised machine learning, the molecules that met these criteria were divided into drug-class clusters based on their functional group allocation. This process produced three final output datasets: one of new ACEI molecules, one of new ARB molecules, and one of new molecules with no assigned class. These data can aid the timely and efficient design of novel antihypertensive drugs. They can also be used in precision hypertension medicine for patients with treatment resistance, non-response, or comorbidities. Although this dataset is specific to antihypertensive agents, the model can be reused with minimal changes to produce new lead molecules for other health conditions.

  • Article type: Journal Article
    Speech comprehension can be challenging for multiple reasons, causing inconvenience for both speaker and listener. In such situations, the humanoid robot Pepper can help, because it can display the corresponding text on its screen. Before that, however, the accuracy of the audio recordings captured by Pepper must be carefully assessed. Therefore, this study conducted an experiment with eight participants. Its primary objective was to examine Pepper's speech recognition system using audio features: Mel-frequency cepstral coefficients, spectral centroid, spectral flatness, zero-crossing rate, pitch, and energy. The K-means algorithm was then used to cluster the recordings on these features. The aim was to select the most suitable cluster with the help of the speech-to-text tool Whisper. The best cluster was taken to be the one containing the most high-accuracy data points; to identify these, data points with a word error rate (WER) above 0.3 were discarded. The findings suggest that a distance of up to one meter from Pepper is suitable for capturing the best speech recordings, whereas age and gender do not influence the accuracy of the recorded speech. The proposed system would be especially valuable in settings where subtitles are needed to improve the comprehension of spoken statements.
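    The selection rule can be sketched in three steps. First, compute each recording's word error rate (WER): the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. Second, discard recordings with WER above 0.3. Third, pick the K-means cluster that retains the most recordings. This minimal sketch assumes that cluster labels and transcripts (e.g. from Whisper) are already available. The helper names and sample data are hypothetical, and the study's exact selection rule may differ.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the current ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def best_cluster(labels, wers, max_wer=0.3):
    """Cluster with the most recordings whose WER does not exceed max_wer."""
    kept = Counter(lab for lab, w in zip(labels, wers) if w <= max_wer)
    return max(kept, key=kept.get) if kept else None


print(word_error_rate("turn on the light", "turn off the light"))  # -> 0.25
print(best_cluster([0, 0, 1, 1, 1], [0.5, 0.1, 0.2, 0.0, 0.4]))    # -> 1
```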

  • Article type: Journal Article
    When screening for probiotic strains, there are no clearly established bacterial phenotypic markers that could be used to predict their in vivo mechanism of action. In this work, we demonstrate for the first time that machine learning (ML) methods can accurately predict the in vivo immunomodulatory activity of probiotic strains from their cell surface phenotypic features, using a snail host-microbe interaction model. A broad range of presumptive probiotics from the snail gut were isolated, including 240 new lactic acid bacterial strains (Lactobacillus, Leuconostoc, Lactococcus, and Enterococcus). They were characterized by their capacity to withstand snails' gastrointestinal defense barriers, such as pedal mucus, gastric mucus, gastric juices, and acidic pH. They were also characterized by their cell surface hydrophobicity, autoaggregation, and biofilm formation ability. The implemented ML pipeline predicted, with high accuracy (88%), which strains strongly enhance the chemotaxis and phagocytic activity of snails' hemolymph cells. It also revealed bacterial autoaggregation and cell surface hydrophobicity as the most important parameters significantly affecting host immune responses. The results show that ML approaches may be useful for building a predictive understanding of host-probiotic interactions. They also highlight snails as an efficient animal model for screening presumptive probiotic strains based on their interaction with cellular innate immune responses.