Clustering

  • Article type: Journal Article
    A wireless sensor network (WSN) is usually made up of a large number of discrete sensor nodes, each of which has limited resources, including memory, computing power, and energy. To extend the network lifetime, these limited resources must be used effectively. In WSNs, clustering is one of the best methods for optimizing network longevity and conserving energy. In this work, we propose a novel Energy and Throughput Aware Adaptive Routing (ETAAR) algorithm based on Cooperative Game Theory (CGT). To achieve energy-efficient routing with an improved data rate in WSNs, we apply two game-theoretic approaches: CGT and a coalition game. The main parts of this routing mechanism are cluster head (CH) selection and clustering of the nodes, so that communication between nodes is efficient in both energy and throughput. In the first stage, a CGT-based utility function that takes both energy and throughput into account is used to select the CH nodes. In the second stage, average end-to-end delay is considered along with energy and throughput for adaptive time slot transmission, to avoid collisions in the coalition game approach. MATLAB is used for simulation. The simulation results show that the proposed ETAAR protocol outperforms earlier routing approaches in terms of residual energy, PDR, energy due ratio, average end-to-end delay, and dead nodes. ETAAR attains a 48% extension of network lifetime, a 60% energy saving, and a 52.5% reduction in delay.

  • Article type: Journal Article
    In dairy farms, cows are commonly fed a mixture of forages and concentrates ad libitum. To improve the energetic status and productivity of dairy cows, individualized feeding strategies have been proposed. One such strategy is to provide supplemental concentrates to adjust the forage-to-concentrate ratio based on factors like individual milk yield or calculated energy balance. This strategy can affect milk production and cow health, though consistent rules for adjustment are lacking. The objectives of this study were to evaluate the effects on production performance of an individualized feeding strategy, adjusted weekly based on the body weight gain of dairy cows, and to determine whether the metabolic status of the cows could be predicted early in lactation so that it could be taken into account in the decision rules of the strategy. A total of 40 multiparous Holstein cows were involved in a 4-mo trial. The cows entered the experiment individually after calving and were initially fed a standard ration with a fixed 3 kg of extra concentrate per day for the first 8 d (on average). The cows were then paired based on calving date, parity (2 or 3), and body weight gain over the initial week. One cow from each pair was assigned to the Standard Feeding (SF) strategy, which continued receiving the fixed ration, while the other was assigned to the Precision Feeding (PF) strategy, which received a variable amount of extra concentrate adjusted weekly based on body weight gain. Measurements included weekly body weight, daily milk yield, and daily intakes of concentrates and forages. Blood samples were collected to measure metabolites (glucose, BHB, NEFA) for metabolic profiling. The results showed no significant differences in overall body weight gain, milk yield, or intakes (concentrates and/or forages). Two metabolic clusters were identified based on blood metabolites (glucose, BHB, NEFA), predicting cows' metabolic status with 90% accuracy. The balanced cluster had higher milk production and feed intake, and lost more body weight, than the imbalanced cluster. Alternative variables like body weight gain and total feed intake can be used to predict metabolic clusters, achieving up to 70% accuracy. To conclude, cows fed under this precision feeding strategy had performance similar to that of cows fed under the standard feeding strategy. The long-term effects of this strategy should be studied. Metabolic profiling predicted cows' metabolic status, suggesting its potential for enhancing individualized feeding decisions.

  • Article type: Journal Article
    Identifying meaningful patterns in data is crucial for understanding complex biological processes, particularly in transcriptomics, where genes with correlated expression often share functions or contribute to disease mechanisms. Traditional correlation coefficients, which primarily capture linear relationships, may overlook important nonlinear patterns. We introduce the clustermatch correlation coefficient (CCC), a not-only-linear coefficient that utilizes clustering to efficiently detect both linear and nonlinear associations. CCC outperforms standard methods by revealing biologically meaningful patterns that linear-only coefficients miss and is faster than state-of-the-art coefficients such as the maximal information coefficient. When applied to human gene expression data from genotype-tissue expression (GTEx), CCC identified robust linear relationships and nonlinear patterns, such as sex-specific differences, that are undetectable by standard methods. Highly ranked gene pairs were enriched for interactions in integrated networks built from protein-protein interactions, transcription factor regulation, and chemical and genetic perturbations, suggesting that CCC can detect functional relationships missed by linear-only approaches. CCC is a highly efficient, next-generation, not-only-linear correlation coefficient for genome-scale data. A record of this paper's transparent peer review process is included in the supplemental information.
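    The abstract above says only that CCC "utilizes clustering" to detect linear and nonlinear associations. A minimal sketch of that idea follows, under stated assumptions: each variable is split into rank-based quantile bins for several bin counts, and the score is the maximum adjusted Rand index (ARI) between any pair of partitions. The function names, the rank-based binning, and the `max_k` parameter are illustrative choices, not the authors' implementation.

```python
from collections import Counter
from math import comb


def quantile_partition(values, k):
    """Assign each value to one of k roughly equal-sized, rank-based bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label lists of equal length."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 0.0  # degenerate case: no information in either partition
    return (sum_ij - expected) / (max_index - expected)


def ccc_sketch(x, y, max_k=10):
    """Assumed simplification: best ARI over all pairs of quantile partitions."""
    best = 0.0
    for kx in range(2, max_k + 1):
        px = quantile_partition(x, kx)
        for ky in range(2, max_k + 1):
            py = quantile_partition(y, ky)
            best = max(best, adjusted_rand_index(px, py))
    return best


# Hypothetical data: a quadratic relationship, where a linear coefficient is weak.
x = [(i - 100) / 20 + 0.013 for i in range(201)]
y = [v * v for v in x]
print(ccc_sketch(x, y))
```

    Because the partitions depend only on ranks, a score of this kind is unchanged by any monotonic rescaling of either variable, which is one reason a clustering-based coefficient can pick up patterns that a linear coefficient misses.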

  • Article type: Journal Article
    Ovarian cancer patients' resistance to first-line treatment poses a significant challenge, with approximately 70% experiencing recurrence and developing strong resistance to first-line chemotherapies like paclitaxel.
    Within the framework of predictive, preventive, and personalized medicine (3PM), this study aimed to use artificial intelligence to find drug resistance characteristics at the single-cell level, and further to construct a classification strategy and deep learning prognostic models based on these resistance traits, which can better facilitate and perform 3PM.
    This study employed "Beyondcell," an algorithm capable of predicting cellular drug responses, to calculate the similarity between the expression patterns of 21,937 cells from ovarian cancer samples and the signatures of 5201 drugs to identify drug-resistant cells. Drug resistance features were used to perform 10 multi-omics clustering on the TCGA training set to identify patient subgroups with differential drug responses. Concurrently, a deep learning prognostic model with a KAN architecture, which has a flexible activation function to better fit the model, was constructed for this training set. The constructed patient subtype classifier and prognostic model were evaluated using three external validation sets from GEO: GSE17260, GSE26712, and GSE51088.
    This study identified that endothelial cells are resistant to paclitaxel, doxorubicin, and docetaxel, suggesting their potential as targets for cellular therapy in ovarian cancer patients. Based on drug resistance features, 10 multi-omics clustering identified four patient subtypes with differential responses to four chemotherapy drugs, of which subtype CS2 showed the highest sensitivity to all four drugs. The other subtypes also showed enrichment in different biological pathways and immune infiltration, allowing for targeted treatment based on their characteristics. In addition, this study applied the recent KAN architecture to replace the MLP structure in the DeepSurv prognostic model, demonstrating robust performance in predicting patients' prognosis.
    This study, by classifying patients and constructing prognostic models based on resistance characteristics to first-line drugs, has effectively applied multi-omics data to the realm of 3PM.
    The online version contains supplementary material available at 10.1007/s13167-024-00374-4.

  • Article type: Journal Article
    BACKGROUND: Adverse childhood experiences (ACEs) affect up to half the general population, are known to co-occur, and are particularly common among those experiencing poverty. Yet there are limited studies examining specific patterns of ACE co-occurrence that consider their developmental timing.
    OBJECTIVE: To examine the longitudinal co-occurrence patterns of ACEs across childhood and adolescence, and to examine the role of poverty in predicting these.
    METHODS: The sample was 8859 children from the Avon Longitudinal Study of Parents and Children, a longitudinal prospective population-based UK birth cohort. Repeated measures of ten ACEs were available, occurring in early childhood (birth-5 years), mid-childhood (6-10 years), and adolescence (11-16 years). Latent class analysis was used to identify groups of children with similar developmental patterns of ACEs. Multinomial regression was used to examine the association between poverty during pregnancy and ACE classes.
    RESULTS: Sixteen percent of parents experienced poverty. A five-class latent model was selected: "Low ACEs" (72.0%), "Early and mid-childhood household disharmony" (10.6%), "Persistent parental mental health problems" (9.7%), "Early childhood abuse and parental mental health problems" (5.0%), and "Mid-childhood and adolescence ACEs" (2.6%). Poverty was associated with a higher likelihood of being in each of the ACE classes compared to the low ACEs reference class. The largest effect size was seen for the "Early and mid-childhood household disharmony" class (OR 4.70, 95% CI 3.68-6.00).
    CONCLUSIONS: A multifactorial approach to preventing ACEs is needed, including support for parents facing financial and material hardship, support for at-risk families, and timely interventions for those experiencing ACEs.

  • Article type: Journal Article
    BACKGROUND: Every hospital manager aims to build harmonious, mutually beneficial, and steady-state departments. Therefore, it is important to explore a hospital department development assessment model based on objective hospital data.
    OBJECTIVE: This study aims to use a novel machine learning algorithm to identify key evaluation indexes for hospital departments, offering insights for strategic planning and resource allocation in hospital management.
    METHODS: Data related to the development of a hospital department over the past 3 years were extracted from various hospital information systems. The resulting data set was mined using neural machine algorithms to assess the possible role of hospital departments in the development of a hospital. A questionnaire was used to consult senior experts familiar with the hospital to assess the actual work in each hospital department and the impact of each department's development on overall hospital discipline. We used the results from this questionnaire to verify the accuracy of the departmental risk scores calculated by the machine learning algorithm.
    RESULTS: Deep machine learning was performed and modeled on the hospital system training data set. The model successfully leveraged the hospital's training data set to learn, predict, and evaluate the working and development of hospital departments. A comparison of the questionnaire results with the departmental risk ranking produced by the machine learning algorithm, using the cosine similarity algorithm and Pearson correlation analysis, showed a good match. This indicates that the department development assessment model and risk score based on the objective data of hospital systems are relatively accurate and objective.
    CONCLUSIONS: This study demonstrated that our machine learning algorithm provides an accurate and objective assessment model for hospital department development. The strong alignment of the model's risk assessments with expert opinions, validated through statistical analysis, highlights its reliability and potential to guide strategic hospital management decisions.
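    The validation step described in the RESULTS above compares questionnaire results with model risk rankings using cosine similarity and Pearson correlation. The sketch below shows how those two measures are computed. The abstract reports no data, so the score vectors are hypothetical, as are the function names.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson r, computed as the cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical risk scores for five departments (not taken from the study).
expert_scores = [0.9, 0.4, 0.7, 0.2, 0.5]
model_scores = [0.8, 0.5, 0.6, 0.1, 0.6]

print(cosine_similarity(expert_scores, model_scores))
print(pearson_correlation(expert_scores, model_scores))
```

    The two measures answer slightly different questions: cosine similarity is sensitive to the overall level of the scores, while Pearson correlation removes the mean first and so reflects only whether departments are ordered and spaced similarly.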

  • Article type: Journal Article
    BACKGROUND: Motor difficulties are common in many, but not all, autistic individuals. These difficulties can co-occur with other problems, such as delays in language, intellectual, and adaptive functioning. Biological mechanisms underpinning such difficulties are less well understood. Poor motor skills tend to be more common in individuals carrying highly penetrant rare genetic mutations. Such mechanisms may have downstream consequences of altering neurophysiological excitation-inhibition balance and lead to enhanced behavioral motor noise.
    METHODS: This study combined publicly available and in-house datasets of autistic (n = 156), typically-developing (TD, n = 149), and developmental coordination disorder (DCD, n = 23) children (age 3-16 years). Autism motor subtypes were identified based on patterns of motor abilities measured from the Movement Assessment Battery for Children 2nd edition. Stability-based relative clustering validation was used to identify autism motor subtypes and evaluate generalization accuracy in held-out data. Autism motor subtypes were tested for differences in motor noise, operationalized as the degree of dissimilarity between repeated motor kinematic trajectories recorded during a simple reach-to-drop task.
    RESULTS: Relatively 'high' (n = 87) versus 'low' (n = 69) autism motor subtypes could be detected and generalized with 89% accuracy in held-out data. The relatively 'low' subtype was lower in general intellectual ability and older at age of independent walking, but did not differ in age at first words or in autistic traits or symptomatology. Motor noise was considerably higher in the 'low' subtype compared to the 'high' subtype (Cohen's d = 0.77) or TD children (Cohen's d = 0.85), but similar between autism 'high' and TD children (Cohen's d = 0.08). Enhanced motor noise in the 'low' subtype was also most pronounced during the feedforward phase of reaching actions.
    LIMITATIONS: The sample size of this work is limited. Future work in larger samples along with independent replication is important. Motor noise was measured only on one specific motor task. Thus, a more comprehensive assessment of motor noise on many other motor tasks is needed.
    CONCLUSIONS: Autism can be split into at least two discrete motor subtypes that are characterized by differing levels of motor noise. This suggests that autism motor subtypes may be underpinned by different biological mechanisms.

  • Article type: Journal Article
    Microelectrode arrays (MEAs) enable simultaneous measurement of spike trains from numerous neurons, owing to advancements in microfabrication technology. These probes are highly valuable for comprehending the intricate dynamics of neuronal networks. Spike sorting is a pivotal step in comprehensively analyzing the activity of neuronal networks from extracellular recordings. However, the accuracy of spike sorting is relatively low due to the dense sampling of spikes in MEAs. Here, we propose an unsupervised pipeline, the UMAP-COM method, which utilizes combined features to address this problem. These combined features comprise dominant spike shape features extracted by uniform manifold approximation and projection (UMAP), as well as spike locations estimated by the center of mass (COM). We validate the UMAP-COM method on publicly available datasets from different kinds of probes, demonstrating that it is more accurate than other spike sorting methods. Furthermore, we conduct separate evaluations of spike shape feature extraction methods and spike localization methods. In this comparison, UMAP emerges as the superior feature extraction method, demonstrating its effectiveness in accurately representing spike shapes. Additionally, we find that the COM method outperforms other spike localization methods, highlighting its ability to enhance the accuracy of spike sorting.
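    The abstract above names center of mass (COM) as the spike localization step without giving a formula. A minimal sketch follows, assuming the common definition: the spike location is the average of the electrode positions weighted by the absolute peak amplitude recorded on each channel. The electrode layout, amplitudes, and function name are hypothetical.

```python
def center_of_mass(positions, amplitudes):
    """Estimate a spike location as the amplitude-weighted mean electrode position.

    positions  -- list of (x, y) electrode coordinates, e.g. in micrometres
    amplitudes -- peak amplitude of the spike on each electrode (sign is ignored)
    """
    weights = [abs(a) for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("all amplitudes are zero; location is undefined")
    x = sum(w * px for w, (px, _) in zip(weights, positions)) / total
    y = sum(w * py for w, (_, py) in zip(weights, positions)) / total
    return x, y


# Hypothetical 2 x 2 electrode patch with 20 um pitch and one spike's amplitudes.
electrodes = [(0, 0), (20, 0), (0, 20), (20, 20)]
peak_amplitudes = [-80, -40, -40, -40]

print(center_of_mass(electrodes, peak_amplitudes))
```

    The estimated location can then be combined with shape features (UMAP coordinates in the abstract) into one feature vector per spike before clustering, so that spikes with similar waveforms but different positions on the array are kept apart.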

  • Article type: Journal Article
    This research addresses the presence of substances of very high concern (SVHCs) confronting the drinking water sector. To respond adequately to the potential hazards of SVHCs, knowledge of emission pathways, toxicity, presence in drinking water sources, and removability during water production is crucial. As this information cannot be obtained for each compound individually, we employed a detailed clustering approach based on the chemical properties and structures of SVHCs from lists with over 1,000 compounds. Through this process, 915 substances were divided into 51 clusters. We tested this clustering in risk assessment. To assess the risks, we developed toxicity prediction models utilizing random forests and multiple linear regression. These models were applied to make toxicity predictions for the list of compounds. This study shows that clustering is a viable approach to reducing sample size. In addition, the toxicity models provide insights into the potential human health risks. This research contributes to more informed decision-making and improved risk assessment in the drinking water sector, aiding in the protection of human health and the environment. This principle is generally applicable: if a suitable representative is found within a group, data from experiments with this compound can be used to gauge the behaviour of the chemicals in that group.

  • Article type: Journal Article
    Several methods have been developed to computationally predict cell-types for single cell RNA sequencing (scRNAseq) data. As methods are developed, a common problem for investigators has been identifying the best method they should apply to their specific use-case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell-type identification), a wisdom of crowds approach for scRNAseq clustering. CHAI presents two competing methods which aggregate the clustering results from seven state-of-the-art clustering methods: CHAI-AvgSim and CHAI-SNF. CHAI-AvgSim and CHAI-SNF demonstrate superior performance across several benchmarking datasets. Furthermore, both CHAI methods outperform the most recent consensus clustering method, SAME-clustering. We demonstrate CHAI's practical use case by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we demonstrate CHAI-SNF to have improved performance when including spatial transcriptomics data. CHAI overcomes previous limitations by incorporating the most recent and top performing scRNAseq clustering algorithms into the aggregation framework. It is also an intuitive and easily customizable R package where users may add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. This ensures that as more advanced clustering algorithms are developed, CHAI will remain useful to the community as a generalized framework. CHAI is available as an open source R package on GitHub: https://github.com/lodimk2/chai.
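    The abstract above describes aggregating the results of several clustering methods through similarity matrices. The sketch below illustrates the general idea behind an averaged-similarity consensus: each method's labels become a cell-by-cell co-membership matrix, and the matrices are averaged. It is written in Python for illustration and does not use the CHAI R package. The final grouping step (connected components above a threshold) is a simple stand-in chosen for this example, not CHAI's method, and all names and labels are hypothetical.

```python
def co_association(labels):
    """Binary matrix: 1.0 where two cells share a cluster label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def average_similarity(labelings):
    """Average the co-association matrices produced by several methods."""
    n = len(labelings[0])
    matrices = [co_association(labels) for labels in labelings]
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]


def consensus_clusters(avg, threshold=0.5):
    """Stand-in grouping: connected components of pairs above the threshold."""
    n = len(avg)
    assigned = [-1] * n
    next_cluster = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and avg[i][j] > threshold:
                    assigned[j] = next_cluster
                    stack.append(j)
        next_cluster += 1
    return assigned


# Hypothetical labels for five cells from three clustering methods.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
print(consensus_clusters(average_similarity(labelings)))
```

    Working with co-membership rather than raw labels sidesteps the fact that different methods number their clusters differently, which is why the second labeling above can still agree with the first even though its label values are swapped.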