Kernel density estimation

  • Article type: Journal Article
    This paper introduces an approach for selecting the bandwidth, or smoothing parameter, in multiresolution (MR) density estimation and nonparametric density estimation. It is based on the evolution of the second, third, and fourth central moments and on the shape of the estimated densities across different bandwidths and resolution levels. The proposed method has been applied to density estimation by means of multiresolution densities as well as kernel density estimation (MRDE and KDE, respectively). The results of the simulations and the empirical application demonstrate that, for multimodal densities, the resolution level selected by the moments method performs better than the Bayesian Information Criterion (BIC) for multiresolution density estimation and the plug-in method for kernel density estimation.
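
As a rough illustration of how the central moments of a kernel estimate evolve with the bandwidth, the sketch below assumes a Gaussian kernel. In that case the estimate is an equal-weight mixture of normals centered at the data, so its variance is m2 + h^2, its third central moment is m3, and its fourth is m4 + 6*m2*h^2 + 3*h^4, where m2, m3, m4 are the sample central moments. The function names and the sample data are hypothetical, and the code does not reproduce the paper's selection rule; it only shows the quantities such a rule could track.

```python
import math


def gaussian_kde(data, h):
    """Return a function that evaluates a Gaussian KDE with bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def kde_central_moments(data, h):
    """Second, third and fourth central moments of a Gaussian KDE.

    Uses the closed forms for an equal-weight normal mixture, so no
    numerical integration is needed.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m2 + h ** 2, m3, m4 + 6.0 * m2 * h ** 2 + 3.0 * h ** 4


if __name__ == "__main__":
    # Hypothetical bimodal sample, used only to show the trend.
    sample = [-2.1, -1.9, -2.0, 1.8, 2.0, 2.3, 2.1]
    for h in (0.1, 0.5, 1.0, 2.0):
        var, third, fourth = kde_central_moments(sample, h)
        print(f"h={h:4.1f}  var={var:7.3f}  m3={third:7.3f}  m4={fourth:8.3f}")
```

Larger bandwidths inflate the second and fourth moments while leaving the third unchanged, which is one reason moment trajectories carry information about oversmoothing.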

  • Article type: Journal Article
    To explore the spatiotemporal evolution characteristics of heat vulnerability in the Pearl River Delta urban agglomeration during heatwave disasters, this research employs the Entropy Weight Method (EWM) to calculate the heat vulnerability assessment results for nine cities in the region from 2001 to 2022. Through the application of kernel density estimation, Moran's I, and the Geographically and Temporally Weighted Regression (GTWR) model, which is shown to be superior to traditional models such as OLS, this study analyzes the dynamic distribution patterns of heat vulnerability in the study area and dissects the trends of the influencing factors. The results reveal that, from 2001 to 2022, the overall heat vulnerability index in the study area shows a fluctuating downward trend. Key contributors to heat vulnerability include high-frequency and long-duration heatwaves, population sensitivity, and changes in residents' consumption levels. Throughout this period of development, the disparity in heat vulnerability among cities has gradually widened, indicating an overall pattern of uneven development in the region. Future attention should be focused on formulating heat adaptation strategies in areas with high vulnerability to enhance the overall sustainability of the study area.

  • Article type: Journal Article
    An optimal spatio-temporal hybrid model (STHM) based on the wavelet transform (WT) is proposed to improve the sensitivity and accuracy of detecting slowly evolving faults that occur at an early stage and are easily submerged in noise in complex industrial production systems. Specifically, a WT is performed to denoise the original data, thus reducing the influence of background noise. Then, principal component analysis (PCA) and the sliding window algorithm are used to acquire the nearest neighbors in both the spatial and time dimensions. Subsequently, the cumulative sum (CUSUM) and the Mahalanobis distance (MD) are used to reconstruct the hybrid statistic with spatial and temporal sequences. This helps to enhance the correlation between high-frequency temporal dynamics and space and improves fault detection precision. Moreover, the kernel density estimation (KDE) method is used to estimate the upper threshold of the hybrid statistic so as to optimize the fault detection process. Finally, simulations are conducted by applying the WT-based optimal STHM to early fault detection in the Tennessee Eastman (TE) process, with the aim of showing that the proposed fault detection method has a high fault detection rate (FDR) and a low false alarm rate (FAR) and can improve both production safety and product quality.
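
One common way to turn a KDE into an alarm limit is to take an upper quantile of the density fitted to the monitoring statistic under fault-free operation. The sketch below assumes a Gaussian kernel, for which the estimated CDF is an average of normal CDFs, and finds the quantile by bisection. The function names, the rule-of-thumb bandwidth, and the default significance level are illustrative assumptions, not details taken from the abstract.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth 1.06 * sigma * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def kde_cdf(x, samples, h):
    """CDF of a Gaussian KDE: the mean of normal CDFs centered at each sample."""
    root2 = math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * root2))) for s in samples) / len(samples)


def kde_upper_threshold(samples, h, alpha=0.01):
    """Smallest x whose estimated CDF reaches 1 - alpha, found by bisection."""
    lo = min(samples) - 10.0 * h
    hi = max(samples) + 10.0 * h
    target = 1.0 - alpha
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical values of a monitoring statistic under normal operation.
    normal_stats = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 1.6]
    h = rule_of_thumb_bandwidth(normal_stats)
    print("upper control limit:", kde_upper_threshold(normal_stats, h, alpha=0.01))
```

A new observation of the statistic above the returned limit would be flagged as a potential fault; lowering `alpha` trades a lower false alarm rate for slower detection.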

  • Article type: Journal Article
    Interest in tracking and monitoring animals in livestock farming using wearable sensors has been steadily increasing. The use of these devices is particularly crucial in extensive livestock systems, where direct interaction between animals and farmers is infrequent and long-distance herd monitoring requires strenuous effort. Internet of Things (IoT) technologies offer a promising solution to the challenges posed by vast distances, enabling real-time and remote animal monitoring. In this study, an experimental trial was conducted using a custom-designed device housed in a polyvinyl chloride (PVC) case specifically tailored to fit onto a collar. This case incorporates an integrated SigFox communication system, i.e., a Low Power Global Positioning System (LP-GPS) omnidirectional system, and a power supply. The trial took place in two grazing areas located in different territorial zones, designated as Case Study I and II. An LP-GPS collar was provided for each selected animal, and the data were recorded at 20-min intervals for Case Study I and 10-min intervals for Case Study II. The acquired data were then imported and analyzed using Geographical Information Systems (GIS) software. Information was collected through a purpose-built web application (AppWeb). The objective was to analyze the territorial areas most occupied by animals within the two grazing areas by developing a GIS-based methodology. Specifically, customized algorithms such as the Heatmap and Kernel Density Estimation (KDE) plugins were employed to conduct spatial analyses. The maps obtained through the Heatmap plugin showed the temporal-spatial distribution of animals within their grazing areas. Additionally, the KDE tool was used to classify preferred territorial areas, generating tailored charts for each animal in the sample. The individual Core Areas, determined through KDE evaluation for each animal, were overlaid to provide a comprehensive analysis of the monitored animals. The results achieved by applying the GIS-based methodology facilitated the identification of animal positions and could be adopted to provide insights into feeding behavior and soil erosion, thereby aiding in the prevention of environmental issues.

  • Article type: Journal Article
    Reinforcement learning algorithms are increasingly utilized across diverse domains within power systems. One notable challenge in training and deploying these algorithms is the acquisition of large, realistic datasets. It is imperative that these algorithms are trained on extensive, realistic datasets over numerous iterations to ensure optimal performance in real-world scenarios. In pursuit of this goal, we curated a comprehensive dataset capturing electric vehicle (EV) charging details over a span of 29,600 days within a designated parking facility. This dataset encompasses necessary information such as connection times, charging durations, and energy consumption of individual EVs. The methodology involved employing conditional tabular generative adversarial networks (CTGAN) to craft a pool of synthetic datasets from a smaller initial dataset collected from an EV charging facility located on the Caltech campus. Subsequently, multiple post-processing techniques were implemented to extract data from this pool, ensuring compliance with the charging station's capacity constraint while maintaining a realistic daily EV demand profile derived from historical data. Using kernel density estimation (KDE), the distributional characteristics of the historical data, especially concerning the timing of EV connections, were faithfully replicated. The developed dataset is particularly useful for training offline reinforcement learning algorithms.
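
To make the idea of replicating a timing distribution with KDE concrete, the sketch below draws synthetic connection times from a Gaussian KDE: pick a historical observation at random, then add normal noise with standard deviation equal to the bandwidth. The abstract does not describe the sampling procedure, so the function name, the bandwidth, the wrap-around onto a 24-hour clock, and the example arrival times are all assumptions made for illustration.

```python
import random


def sample_from_kde(data, h, n, seed=0):
    """Draw n values from a Gaussian KDE fitted to `data` with bandwidth h.

    Sampling from the KDE is equivalent to choosing a data point uniformly
    at random and perturbing it with N(0, h^2) noise.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        center = rng.choice(data)
        # Wrap onto a 24-hour clock (an assumed convention for hours of day).
        draws.append((center + rng.gauss(0.0, h)) % 24.0)
    return draws


if __name__ == "__main__":
    # Hypothetical historical connection times, in hours after midnight.
    historical = [7.8, 8.0, 8.4, 8.9, 9.1, 12.5, 13.0, 17.6, 18.0]
    synthetic = sample_from_kde(historical, h=0.5, n=5, seed=42)
    print([round(t, 2) for t in synthetic])
```

Because each draw is anchored to an observed value, the synthetic times inherit the morning and evening clusters of the historical data while still varying from sample to sample.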

  • Article type: Journal Article
    BACKGROUND: Motor vehicle collisions are a leading source of mortality and injury on urban highways. From a temporal perspective, the determination of a road segment as being collision-prone can fluctuate dramatically over time, making it difficult for transportation agencies to propose traffic interventions. However, there has been limited research to identify and characterize collision-prone road segments with varying collision density patterns over time.
    METHODS: This study proposes an identification and characterization framework that profiles collision-prone roads with various collision density variations. We first employ the spatio-temporal network kernel density estimation (STNKDE) method and time-series clustering to identify road segments with different collision density patterns. Next, we characterize collision-prone road segments based on spatio-temporal information, consequences, vehicle types, and contributing factors to collisions. The proposed method is applied to two years of motor vehicle collision records for New York City.
    RESULTS: Seven clusters of road segments with different collision density patterns were identified. Road segments frequently determined as collision-prone were primarily found in Lower Manhattan and the center of the Bronx borough. Furthermore, collisions near road segments that exhibit greater collision densities over time result in more fatalities and injuries, many of which are caused by both human and vehicle factors.
    CONCLUSIONS: Collision-prone road segments with various collision density patterns over time have distinct differences in the spatio-temporal domain and in the collisions that occur on them. The proposed method can help policymakers understand how collision-prone road segments change over time, and can serve as a reference for more targeted traffic treatment.

  • Article type: Journal Article
    The inequality in CO2 emissions from agricultural energy consumption is a major challenge for coordinating low-carbon agricultural development across regions in China. However, the evolutionary characteristics and driving factors of inequality in China's agricultural energy-related CO2 emissions are poorly understood. In response, the Kaya-Theil model was adopted to examine the three potential factors influencing CO2 emission inequality in China's agricultural energy consumption. The results revealed that, from 1997 to 2021, agricultural energy-related CO2 emissions per capita showed a significant upward trend, with prominent polarization and right-tailing phenomena. Overall, the inequality was on a downward trend, with the Theil index falling from 0.4109 in 1997 to 0.1957 in 2021. Meanwhile, the decomposition of the national inequality, based on zoning the 28 provinces into three grain production functional areas, revealed that the within-group inequality declined from 0.3991 to 0.1634 and was greater than the between-group inequality. As for the three Kaya factors, energy intensity contributed the most to the overall inequality, followed by agricultural economic development and CO2 emission intensity. Based on these results, this study provides some potential strategies to reduce agriculture-related CO2 emissions.
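
The within-group and between-group split reported above can be illustrated with the textbook Theil T index, T = (1/n) * sum((x_i / mu) * ln(x_i / mu)), which decomposes exactly into a share-weighted sum of group indices plus a between-group term. The sketch below gives every unit equal weight and uses made-up numbers; the paper's Kaya-Theil model may weight provinces differently, so this is an assumption-laden illustration rather than the authors' computation.

```python
import math


def theil_index(values):
    """Theil T index of strictly positive values, each with equal weight."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def theil_decomposition(groups):
    """Return (within, between) components; they sum to the overall index.

    `groups` maps a group label to a list of strictly positive values.
    """
    all_values = [x for members in groups.values() for x in members]
    total = sum(all_values)
    mu = total / len(all_values)
    within = 0.0
    between = 0.0
    for members in groups.values():
        share = sum(members) / total          # group's share of the total
        mu_g = sum(members) / len(members)    # group mean
        within += share * theil_index(members)
        between += share * math.log(mu_g / mu)
    return within, between


if __name__ == "__main__":
    # Hypothetical per-capita emissions for three zones (not the paper's data).
    zones = {
        "zone_a": [0.8, 1.1, 1.4, 2.0],
        "zone_b": [0.5, 0.6, 0.9],
        "zone_c": [1.5, 2.5, 3.0],
    }
    w, b = theil_decomposition(zones)
    print(f"within={w:.4f}  between={b:.4f}  total={w + b:.4f}")
```

A within-group component larger than the between-group one, as the abstract reports, means that differences among provinces inside each zone account for more of the inequality than differences between zone averages.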

  • Article type: Journal Article
    Home range and home range overlap can be used to describe the use of space and movement of wildlife. In recent years, advancements in technology have greatly improved our understanding of animal movement, especially among large herbivores. Wild ungulate abundance and distribution have increased in temperate areas. Moreover, their diseases, including sarcoptic mange in the Iberian Ibex (Capra pyrenaica), have become a cause of concern for livestock, public health, and wildlife conservation. In this study, we first reviewed the existing literature on the home range of species in the genus Capra. We then analyzed data from 52 GPS-GSM-collared Iberian ibexes from 3 different populations in the southeastern Iberian Peninsula, of which 33 were healthy and 19 were affected by sarcoptic mange, to assess: (1) differences in the size and characteristics of home ranges obtained by the 3 most commonly used methodologies (minimum convex polygon, kernel density estimation, and Brownian bridge movement models (BBMMs)); and (2) the impact of endemic sarcoptic mange on Iberian Ibex home range. The literature review revealed that the available information on the spatial behavior of Capra spp. was based on only 3 species, including the Iberian Ibex, and was estimated through a diversity of methods, which made it difficult to compare results. We found positive correlations among the different home range estimation methods in the Iberian Ibex, with BBMMs proving to be the most accurate. This study is the first to use BBMMs for estimating home range in this species, and it revealed markedly seasonal behavior in spatial use, although sarcoptic mange smoothed this seasonal pattern. The seasonal overlaps obtained suggest that core areas of the Iberian Ibex change within wider home range areas, which are ecological parameters relevant to identifying key areas for species management and conservation.

  • Article type: Journal Article
    The aging intensity (AI), defined as the ratio of the instantaneous hazard rate and a baseline hazard rate, is a useful tool for describing the reliability properties of a random variable corresponding to a lifetime. In this work, the concept of AI is introduced in step-stress accelerated life testing (SSALT) experiments, providing new insights into the model and enabling further clarification of the differences between the two commonly employed cumulative exposure (CE) and tampered failure rate (TFR) models. New AI-based estimators for the parameters of an SSALT model are proposed and compared to the MLEs by means of examples and a simulation study.
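
A small numerical check may help fix the definition. The sketch below assumes the usual convention in which the baseline is the average hazard over (0, t], so that AI(t) = t * r(t) / H(t), with r the hazard rate and H the cumulative hazard; the abstract itself only says "a baseline hazard rate". Under that assumption a Weibull lifetime has constant aging intensity equal to its shape parameter. The function names and parameter values are illustrative.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous hazard rate r(t) of a Weibull lifetime."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_cumulative_hazard(t, shape, scale):
    """Cumulative hazard H(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


def aging_intensity(t, hazard, cumulative_hazard):
    """AI(t) = r(t) / (H(t) / t): hazard relative to its average over (0, t]."""
    return t * hazard(t) / cumulative_hazard(t)


if __name__ == "__main__":
    shape, scale = 2.5, 10.0  # hypothetical Weibull parameters
    for t in (0.5, 5.0, 50.0):
        ai = aging_intensity(
            t,
            lambda u: weibull_hazard(u, shape, scale),
            lambda u: weibull_cumulative_hazard(u, shape, scale),
        )
        print(f"t={t:5.1f}  AI={ai:.6f}")
```

Values of AI above 1 indicate a hazard that currently exceeds its own running average, which corresponds to an aging unit; an exponential lifetime (shape 1) gives AI equal to 1 at every t.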

  • Article type: Journal Article
    Using information-theoretic quantities in practical applications with continuous data is often hindered by the fact that probability density functions need to be estimated in higher dimensions, which can become unreliable or even computationally infeasible. To make these useful quantities more accessible, alternative approaches such as binned frequencies using histograms and k-nearest neighbors (k-NN) have been proposed. However, a systematic comparison of the applicability of these methods has been lacking. We wish to fill this gap by comparing kernel-density-based estimation (KDE) with these two alternatives in carefully designed synthetic test cases. Specifically, we wish to estimate the information-theoretic quantities entropy, Kullback-Leibler divergence, and mutual information from sample data. As a reference, the results are compared to closed-form solutions or numerical integrals. We generate samples from distributions of various shapes in dimensions ranging from one to ten. We evaluate the estimators' performance as a function of sample size, distribution characteristics, and chosen hyperparameters. We further compare the required computation time and specific implementation challenges. Notably, k-NN estimation tends to outperform the other methods in terms of algorithmic implementation, computational efficiency, and estimation accuracy, especially with sufficient data. This study provides valuable insights into the strengths and limitations of the different estimation methods for information-theoretic quantities. It also highlights the significance of considering the characteristics of the data, as well as the targeted information-theoretic quantity, when selecting an appropriate estimation technique. These findings will assist scientists and practitioners in choosing the most suitable method for their specific application and available data. We have collected the compared estimation methods in a ready-to-use open-source Python 3 toolbox and thereby hope to promote the use of information-theoretic quantities by researchers and practitioners to evaluate the information in data and models in various disciplines.
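
For a sense of what such a comparison looks like in one dimension, the sketch below estimates differential entropy in two ways: a KDE resubstitution estimate, -(1/n) * sum(ln f_hat(x_i)), and the Kozachenko-Leonenko k-NN estimate, psi(n) - psi(k) + ln 2 + (1/n) * sum(ln eps_i), where eps_i is the distance from x_i to its k-th nearest neighbor. This is not the toolbox mentioned in the abstract; the function names, the Gaussian kernel, the rule-of-thumb bandwidth, and k = 3 are assumptions, and the quadratic-time loops are kept for readability rather than speed.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kde_entropy_1d(data, h):
    """Resubstitution entropy estimate from a Gaussian KDE with bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in data:
        dens = norm * sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in data)
        total += math.log(dens)
    return -total / n


def knn_entropy_1d(data, k=3):
    """Kozachenko-Leonenko entropy estimate in one dimension (distinct points)."""
    n = len(data)
    log_sum = 0.0
    for i, x in enumerate(data):
        dists = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        log_sum += math.log(dists[k - 1])  # distance to the k-th neighbor
    # ln 2 is the log-length of the 1-D unit ball [-1, 1].
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_sum / n


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(400)]
    h = 1.06 * len(sample) ** (-0.2)  # rule of thumb for unit-variance data
    print("closed form:", 0.5 * math.log(2.0 * math.pi * math.e))
    print("KDE        :", kde_entropy_1d(sample, h))
    print("k-NN       :", knn_entropy_1d(sample, k=3))
```

Both estimates should land near the closed-form value for a standard normal; the gap between them typically widens as the dimension grows, which is the regime the abstract's comparison targets.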