distribution difference

  • Article type: Journal Article
    In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs differ. Further screening of these differential regions can be performed to identify enriched sets of responsive cells. In this paper, we model identifying differential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin, we form a hypothesis to test the existence of differential pdfs. Second, we develop a novel multiple testing method, called TEAM (Testing on the Aggregation tree Method), to identify those bins that harbor differential pdfs while controlling the false discovery rate (FDR) under the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine to coarse resolution. The procedure achieves the statistical goal of pinpointing density differences to the smallest possible regions. TEAM is computationally efficient, capable of analyzing large flow cytometry data sets in much less time than competing methods. We applied TEAM and competing methods to a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance. Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.
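The binning-plus-multiple-testing idea in the abstract above can be illustrated with a minimal sketch. This is not TEAM: it omits the aggregation tree entirely and substitutes a plain two-proportion z-test per bin followed by the standard Benjamini-Hochberg procedure. The one-dimensional setting and the function names (`bin_counts`, `two_proportion_pvalue`, `benjamini_hochberg`, `differential_bins`) are assumptions made for illustration only.

```python
import bisect
import math


def bin_counts(values, edges):
    """Count how many values fall in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect.bisect_right(edges, v) - 1] += 1
    return counts


def two_proportion_pvalue(k1, n1, k2, n2):
    """Two-sided z-test p-value for equal bin probabilities in two samples."""
    pooled = (k1 + k2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if var == 0:
        return 1.0  # bin empty (or full) in both samples: no evidence of a difference
    z = (k1 / n1 - k2 / n2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    return sorted(order[:cutoff])


def differential_bins(before, after, edges, alpha=0.05):
    """Flag bins whose probability mass differs between the two samples."""
    c1, c2 = bin_counts(before, edges), bin_counts(after, edges)
    n1, n2 = len(before), len(after)
    pvals = [two_proportion_pvalue(k1, n1, k2, n2) for k1, k2 in zip(c1, c2)]
    return benjamini_hochberg(pvals, alpha)
```

Testing every bin at a single resolution, as this sketch does, is the baseline that a fine-to-coarse scheme would refine; the abstract does not give enough detail to reproduce the tree-based procedure itself.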

  • Article type: Journal Article
    Gardeniae Fructus (Zhizi) serves as both a medicinal and edible substance and finds widespread use in various industries. Two kinds of medicinal material are commonly found in the market: Zhizi and Shuizhizi. Typically, Zhizi with small, round fruit is used for medicinal purposes, while Shuizhizi, characterized by large, elongated fruit, is employed for dyeing. Market surveys have revealed a diverse range of Zhizi types, and modern research indicates that Shuizhizi contains rich chemical components and pharmacological activities. In this study, we collected 25 batches of Zhizi and Shuizhizi samples, categorizing them based on appearance into obovate and round fruits, with seven length grades (A-G). Using ultra-high performance liquid chromatography coupled with triple quadrupole mass spectrometry (UHPLC-QQQ-MS/MS), we simultaneously quantified 13 main chemical components in fruits of Gardenia species. In addition, we compared the weight percentages of the pericarp, flesh, and seed parts of samples with different traits, and quantified the 13 chemical components in the different parts. Results indicated that, aside from a few instances of overlapping fruit size ranges, Shuizhizi generally exhibits larger and longer dimensions than Zhizi. The weight proportion of the Shuizhizi pericarp is often higher than that of the Zhizi pericarp. Quantitative results highlighted significant differences in the chemical component content between Zhizi and Shuizhizi, with Shuizhizi generally containing higher levels of iridoids. The PCA and OPLS-DA analyses distinctly separated Shuizhizi and Zhizi, with three iridoids, two organic acids, and one flavonoid making significant contributions to their classification. Cluster heatmap analysis also demonstrated complete separation between Zhizi and Shuizhizi, with clear distinctions among Zhizi samples from different origins. The distribution of the 13 chemical components in different Zhizi and Shuizhizi parts remained consistent, with iridoids and pigments concentrated in the seeds and flesh, and two organic acids and one flavonoid enriched in the pericarp. In summary, this study contributes valuable insights for classifying Zhizi and offers guidance on the rational use of Shuizhizi and the different parts of Zhizi.

  • Article type: Journal Article
    Because it is difficult to collect condition data covering all mechanical fault types in industrial scenarios, fault diagnosis under incomplete data, where no prior information about the target is available, has recently received increasing attention. Existing open-set or universal domain adaptation (DA) diagnosis methods typically treat private fault samples in the target as a generalized "unknown" fault class, neglecting their inherent structure. This oversight can lead to confusion in latent feature space representations and difficulties in separating unknown samples. Therefore, a universal DA method with unsupervised clustering is developed to explore the intrinsic structure of the target samples for mechanical fault diagnosis, where multi-source information on different working conditions is considered to transfer complementary knowledge. First, a composite clustering metric combining single-domain and cross-domain evaluation is constructed to recognize shared and unknown health classes on source-target domains. Second, to alleviate the intra-class shift while enlarging the inter-class gap, a class-wise DA algorithm based on maximum mean discrepancy is proposed. Finally, an entropy regularization criterion is used to facilitate clustering of different health classes. Extensive experiments on three rotating machinery datasets verify the efficacy of the presented approach for fault diagnosis when monitoring data are inadequate.
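As a rough illustration of the maximum mean discrepancy (MMD) named in the abstract above, the following sketch computes a biased squared-MMD estimate with a Gaussian kernel and averages it over classes shared by both domains. It is not the paper's algorithm: the kernel choice, the fixed bandwidth, the use of target pseudo-labels, and the function names are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))


def classwise_mmd(source, source_labels, target, target_pseudo_labels, bandwidth=1.0):
    """Average squared MMD over the classes that appear in both domains.

    Target labels are assumed to be pseudo-labels (for example from clustering),
    since the target domain is unlabeled in the setting described above.
    """
    shared = set(source_labels) & set(target_pseudo_labels)
    if not shared:
        return 0.0
    total = 0.0
    for c in shared:
        src_c = [x for x, y in zip(source, source_labels) if y == c]
        tgt_c = [x for x, y in zip(target, target_pseudo_labels) if y == c]
        total += mmd_squared(src_c, tgt_c, bandwidth)
    return total / len(shared)
```

Restricting the average to shared classes is one simple way to keep target samples assigned to unknown classes from being pulled toward source classes; whether the paper does this in the same way is not stated in the abstract.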

  • Article type: Journal Article
    The objectives of this study were to identify both legacy and emerging per- and polyfluoroalkyl substances (PFAS) from three typical fluoridated industrial parks (FIPs) in China, and to assess their environmental occurrence and fate. Complementary suspect target and nontarget screening were implemented, and a total of 111 emerging PFAS were identified. Based on the multi-mass scale analysis, 25 emerging PFAS were identified for the first time, including 24 per- and polyfluoroalkyl ether carboxylic acids (PFECAs) and 1 ultra-short chlorinated perfluoroalkyl carboxylic acid (Cl-PFCA, C2), with a maximum percentage of 48.2% in nontarget PFAS (excluding target PFAS). The composition of PFAS identified in different media was influenced by functional groups, carbon chain length, substituents, and ether bond insertion, with poly-hydrogen-substituted PFAS found preferentially in water and a more diverse pattern of PFECAs in sediments. The patterns of PFAS homologs revealed distinct differences among the three typical FIPs in the shift of PFAS production patterns. C4-PFAS and short-chain carboxylic acids (≤C6) were the main PFAS in Fuxin and Changshu, respectively. In contrast, perfluorooctanoic acid (PFOA, C8) remained dominant in Zibo, and the highest point concentrations in water and sediment were up to 706 µg/L and 553 µg/g, respectively.

  • Article type: Journal Article
    Intelligent fault diagnosis aims to build robust mechanical condition recognition models with limited datasets. At this stage, fault diagnosis faces two practical challenges: (1) the variability of mechanical working conditions makes the distribution of the collected data inconsistent, which brings about a domain shift; (2) unpredictable unknown fault modes that are not observed in the training dataset may occur in the testing scenario, leading to a category gap. To cope with these two entangled challenges, an open-set multi-source domain adaptation approach is developed in this study. Specifically, a complementary transferability metric defined on multiple classifiers is introduced to quantify the similarity of each target sample to known classes and thereby weight the adversarial mechanism. By applying an unknown mode detector, unknown faults can be automatically identified. Moreover, a multi-source mutual-supervised strategy is adopted to mine relevant information between different sources and enhance model performance. Extensive experiments are conducted on three rotating machinery datasets, and the results show that the proposed method is superior to traditional domain adaptation approaches in mechanical diagnosis problems where new fault modes occur.

  • Article type: Journal Article
    The multi-process manufacturing of steel rolling products requires the cooperation of complicated and variable rolling conditions. Such conditions pose challenges to the fault diagnosis of the key equipment of the rolling mill. The development of transfer learning has alleviated the problem of fault diagnosis under variable working conditions to a certain extent. However, existing diagnosis methods based on transfer learning only consider the distribution alignment from a single representation, which may only transfer part of the state knowledge and generate fuzzy decision boundaries. Therefore, this paper proposes a multi-representation domain adaptation network with duplex adversarial learning for hot rolling mill fault diagnosis. First, a multi-representation network structure is designed to extract rolling mill equipment status information from multiple perspectives. Then, the domain adversarial strategy is adopted to match the source and target domains of each pair of representations for learning domain-invariant features from multiple representation networks. In addition, the maximum classifier discrepancy adversarial algorithm is adopted to generate target features that are close to the source support, thereby forming a robust decision boundary. Finally, the average value of the predicted probabilities of the two classifiers is used as the final diagnostic result. Extensive experiments are conducted on an experimental platform of a four-high hot rolling mill to collect the fault state data of the reduction gearbox and roll bearing. The experimental results reveal that the method can effectively realize the fault diagnosis of rolling mill equipment under variable working conditions and can achieve average diagnostic rates of up to 99.15% and 99.40% on the data sets of the rolling mill gearbox and bearing, which are respectively 2.19% and 1.93% higher than the rates achieved by the most competitive method.
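The final decision step described in the abstract above, averaging the predicted probabilities of two classifiers, can be sketched as follows. The sketch also shows a mean absolute difference between the two probability vectors, which is a common way to measure classifier discrepancy; the abstract does not say which discrepancy measure the authors use, so that choice, the function names, and the example class names are assumptions.

```python
def average_probabilities(p1, p2):
    """Element-wise mean of two predicted class-probability vectors."""
    return [(a + b) / 2 for a, b in zip(p1, p2)]


def classifier_discrepancy(p1, p2):
    """Mean absolute difference between two probability vectors (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)


def diagnose(p1, p2, class_names):
    """Return the class with the highest averaged probability and that probability."""
    avg = average_probabilities(p1, p2)
    best = max(range(len(avg)), key=avg.__getitem__)
    return class_names[best], avg[best]


# Hypothetical outputs of the two classifiers for a single sample.
classes = ["normal", "gear wear", "bearing fault"]
label, confidence = diagnose([0.7, 0.2, 0.1], [0.5, 0.4, 0.1], classes)
```

A large discrepancy between the two classifiers on a target sample suggests that the sample lies far from the source support, which is the situation the adversarial training described in the abstract is meant to reduce.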

  • Article type: Journal Article
    As a critical variable in the roasting process, the roasting temperature has a significant influence on operating conditions. Model predictive control (MPC) provides a path to stabilize the roasting temperature. However, because the feed composition of the roasting process fluctuates, process data collected in different periods usually follow different distributions, resulting in a model mismatch during online control. For this reason, a transfer predictive control method based on inter-domain mapping learning (IDML-MPC) is proposed. The proposed method first treats historical and online data as two domains. Then, a distribution mapping function from one domain to the other is learned so that the distribution of the historical data follows that of the online data. Finally, an accurate online prediction model is built, and roasting temperature control is achieved by minimizing a cost function with respect to the predicted value and the control input. The effectiveness of the proposed method is demonstrated by comparative experiments based on a numerical example and a simulation platform of the roasting process. Experimental comparisons with several state-of-the-art methods show that it is necessary to take into account the distribution differences between historical and online data when production conditions change. IDML-MPC improved the control performance for the roasting temperature, with an average 56.98% reduction in the root mean square error.
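To make the idea of mapping historical data onto the online distribution concrete, here is a deliberately simple sketch that matches only the mean and standard deviation of a single variable with an affine map, together with the root mean square error used as the evaluation metric. The learned inter-domain mapping in IDML-MPC is not described in the abstract and is presumably more expressive; the affine form and the function names are assumptions made for illustration.

```python
import math
import statistics


def fit_affine_mapping(historical, online):
    """Fit x -> mu_o + (sd_o / sd_h) * (x - mu_h), matching the first two moments."""
    mu_h, mu_o = statistics.fmean(historical), statistics.fmean(online)
    sd_h, sd_o = statistics.pstdev(historical), statistics.pstdev(online)
    scale = sd_o / sd_h if sd_h > 0 else 1.0  # constant historical data: shift only
    return lambda x: mu_o + scale * (x - mu_h)


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical temperature readings from two operating periods.
historical = [1.0, 2.0, 3.0, 4.0, 5.0]
online = [10.0, 12.0, 14.0, 16.0, 18.0]
mapping = fit_affine_mapping(historical, online)
mapped = [mapping(x) for x in historical]
```

After mapping, the historical samples share the online mean and spread, so a prediction model trained on them is less likely to be mismatched with the current operating period.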