fuzzy clustering

  • Article type: Journal Article
    As one of the essential topological structures in complex networks, community structure has significant theoretical and practical value and has attracted researchers in many fields. In a social network, individuals may belong to several communities simultaneously, such as a workgroup and a hobby group. Overlapping community discovery therefore helps us understand and model the network structure of these multiple relationships more accurately. This article proposes a two-stage multi-objective evolutionary algorithm for the overlapping community discovery problem. In the first stage, an initialization method that selects central nodes based on node degree is combined with the cross-mutation evolution strategy of the genome matrix to complete a non-overlapping community division within a decomposition-based multi-objective optimization framework. In the second stage, appropriate nodes are selected from each individual's communities in the first-stage result set as the central nodes of the initial population, and the fuzzy threshold is optimized through a fuzzy clustering method based on evolutionary computation and a feedback model to find reasonable overlapping nodes. Finally, tests are conducted on synthetic and real datasets. The statistical results show that this algorithm outperforms other representative algorithms on the test instances.
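
    The role of a fuzzy threshold in finding overlapping nodes can be illustrated with a minimal sketch. This is not the authors' algorithm: the membership matrix, the threshold value, and the function names `overlapping_communities` and `overlapping_nodes` are assumptions made for illustration. A node joins every community in which its membership reaches the threshold, and always joins its strongest community.

```python
def overlapping_communities(membership, threshold):
    """membership[i][k] is the degree to which node i belongs to community k."""
    n_comms = len(membership[0])
    communities = [set() for _ in range(n_comms)]
    for node, row in enumerate(membership):
        # A node joins every community whose membership reaches the threshold.
        for k, u in enumerate(row):
            if u >= threshold:
                communities[k].add(node)
        # Guarantee that each node lands in at least its strongest community.
        best = max(range(n_comms), key=lambda k: row[k])
        communities[best].add(node)
    return communities


def overlapping_nodes(communities):
    """Return the nodes that appear in more than one community."""
    seen, overlap = set(), set()
    for community in communities:
        overlap |= seen & community
        seen |= community
    return overlap


# Hypothetical membership matrix: 3 nodes, 2 communities.
membership = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
comms = overlapping_communities(membership, threshold=0.4)
shared = overlapping_nodes(comms)
```

    Lowering the threshold admits more overlapping nodes, and raising it approaches a hard partition, which is why the threshold is a natural quantity to optimize.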






  • Article type: Journal Article
    Electrical tomography sensors have been widely used for pipeline parameter detection and estimation. Before they can be used in formal applications, the sensors must be calibrated using enough labeled data. However, due to the high complexity of actual measuring environments, the calibrated sensors are inaccurate since the labeling data may be uncertain, inconsistent, incomplete, or even invalid. Alternatively, it is always possible to obtain partial data with accurate labels, which can form mandatory constraints to correct errors in other labeling data. In this paper, a semi-supervised fuzzy clustering algorithm is proposed, and the fuzzy membership degree in the algorithm leads to a set of mandatory constraints to correct these inaccurate labels. Experiments in a dredger validate the proposed algorithm in terms of its accuracy and stability. This new fuzzy clustering algorithm can generally decrease the error of labeling data in any sensor calibration process.






  • Article type: Journal Article
    Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we used machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.






  • Article type: Journal Article
    The use of medical data for machine learning, including unsupervised methods such as clustering, is often restricted by privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Medical data is sensitive and highly regulated, and anonymization is often insufficient to protect a patient's identity. Traditional clustering algorithms are also unsuitable for longitudinal behavioral health trials, which often have missing data and observe individual behaviors over varying time periods. In this work, we develop a new decentralized federated multiple imputation-based fuzzy clustering algorithm for complex longitudinal behavioral trial data collected from multisite randomized controlled trials over different time periods. Federated learning (FL) preserves privacy by aggregating model parameters instead of data. Unlike previous FL methods, the proposed algorithm requires only two rounds of communication and handles clients with varying numbers of time points for incomplete longitudinal data. The model is evaluated on both empirical longitudinal dietary health data and simulated clusters with different numbers of clients, effect sizes, correlations, and sample sizes. The proposed algorithm converges rapidly and achieves desirable performance on multiple clustering metrics. This new method allows for targeted treatments for various patient groups while preserving their data privacy and enables the potential for broader applications in the Internet of Medical Things.
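
    The idea of aggregating model parameters instead of data can be sketched as follows. This is a generic illustration, not the algorithm from the abstract: it assumes each client has already run a local fuzzy clustering, that cluster indices are aligned across clients, and that each client reports only its cluster centers plus the total membership weight per cluster. The function name `aggregate_centers` and the example numbers are hypothetical.

```python
def aggregate_centers(client_updates):
    """Merge per-client cluster centers by a weighted average.

    client_updates is a list of (centers, weights) pairs, where centers[k]
    is a tuple of coordinates and weights[k] is the total membership mass
    that client assigned to cluster k. Raw records never leave the client.
    """
    n_clusters = len(client_updates[0][0])
    dim = len(client_updates[0][0][0])
    merged = []
    for k in range(n_clusters):
        total = sum(weights[k] for _, weights in client_updates)
        merged.append(tuple(
            sum(centers[k][d] * weights[k] for centers, weights in client_updates) / total
            for d in range(dim)
        ))
    return merged


# Two hypothetical clients, two clusters, two features each.
updates = [
    ([(0.0, 0.0), (10.0, 10.0)], [1.0, 3.0]),
    ([(2.0, 2.0), (14.0, 14.0)], [1.0, 1.0]),
]
global_centers = aggregate_centers(updates)
```

    Only the centers and weights cross the network in this sketch. Handling missing data and differing numbers of time points per client, as the abstract describes, would need additional steps that are not shown here.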






  • Article type: Journal Article
    Manual segmentation poses a time-consuming challenge for disease quantification, therapy evaluation, treatment planning, and outcome prediction. Convolutional neural networks (CNNs) hold promise in accurately identifying tumor locations and boundaries in PET scans. However, a major hurdle is the extensive amount of supervised and annotated data necessary for training. To overcome this limitation, this study explores semi-supervised approaches utilizing unlabeled data, specifically focusing on PET images of diffuse large B-cell lymphoma (DLBCL) and primary mediastinal large B-cell lymphoma (PMBCL) obtained from two centers. We considered 2-[18F]FDG PET images of 292 patients with PMBCL (n = 104) and DLBCL (n = 188) (n = 232 for training and validation, and n = 60 for external testing). We harnessed classical wisdom embedded in traditional segmentation methods, such as the fuzzy clustering loss function (FCM), to tailor the training strategy for a 3D U-Net model, incorporating both supervised and unsupervised learning approaches. Various supervision levels were explored, including fully supervised methods with labeled FCM and unified focal/Dice loss, unsupervised methods with robust FCM (RFCM) and Mumford-Shah (MS) loss, and semi-supervised methods combining FCM with supervised Dice loss (MS + Dice) or labeled FCM (RFCM + FCM). The unified loss function yielded higher Dice scores (0.73 ± 0.11; 95% CI 0.67-0.8) than Dice loss (p < 0.01). Among the semi-supervised approaches, RFCM + αFCM (α = 0.3) showed the best performance, with a Dice score of 0.68 ± 0.10 (95% CI 0.45-0.77), outperforming MS + αDice for any supervision level (any α) (p < 0.01). Another semi-supervised approach with MS + αDice (α = 0.2) achieved a Dice score of 0.59 ± 0.09 (95% CI 0.44-0.76), surpassing other supervision levels (p < 0.01).
    Given the time-consuming nature of manual delineations and the inconsistencies they may introduce, semi-supervised approaches hold promise for automating medical imaging segmentation workflows.
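
    For readers unfamiliar with the metric, the Dice score reported above is twice the overlap between predicted and reference masks divided by their combined size. The sketch below computes it on flattened binary masks and shows the general shape of an α-weighted combination of an unsupervised and a supervised loss term. The function names and the example masks are assumptions for illustration, and the study's actual loss implementations are not reproduced here.

```python
def dice_score(pred, truth):
    """Dice coefficient for two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def semi_supervised_loss(unsupervised_term, supervised_term, alpha):
    """Combine loss terms as unsupervised + alpha * supervised."""
    return unsupervised_term + alpha * supervised_term


# Hypothetical 8-voxel masks: 3 predicted, 4 in the reference, 2 overlapping.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [0, 1, 1, 1, 1, 0, 0, 0]
score = dice_score(pred, truth)  # 2 * 2 / (3 + 4)
```

    In this form α controls how strongly the labeled term steers training, so α = 0 corresponds to a purely unsupervised objective.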






  • Article type: Journal Article
    Dynamic functional network connectivity (dFNC) analysis of resting state functional magnetic resonance imaging data has yielded insights into many neurological and neuropsychiatric disorders. A common dFNC analysis approach uses hard clustering methods like k-means clustering to assign samples to states that summarize network dynamics. However, hard clustering methods obscure network dynamics by assuming (1) that all samples within a cluster are equally like their assigned centroids and (2) that samples closer to one another in the data space than to their centroids are well-represented by their centroids. In addition, it can be hard to compare subjects, as in some cases an individual may not manifest a state strongly enough to enter a hard cluster. Approaches that allow a dimensional approach to connectivity patterns (e.g., fuzzy clustering) can mitigate these issues. In this study, we present an explainable fuzzy clustering framework by combining fuzzy c-means clustering with several explainability metrics and novel summary features.
    We apply our framework for schizophrenia (SZ) default mode network analysis. Namely, we extract dFNC from individuals with SZ and controls, identify 5 dFNC states, and characterize the dFNC features most crucial to those states with a new perturbation-based clustering explainability approach. We then extract several features typically used in hard clustering and further present a variety of unique features specially designed for use with fuzzy clustering to quantify state dynamics. We examine differences in those features between individuals with SZ and controls and further search for relationships between those features and SZ symptom severity.
    Importantly, we find that individuals with SZ spend more time in states of moderate anticorrelation between the anterior and posterior cingulate cortices and strong anticorrelation between the precuneus and anterior cingulate cortex. We further find that individuals with SZ tend to transition more rapidly than controls between low-magnitude and high-magnitude dFNC states.
    We present a novel dFNC analysis framework and use it to identify effects of SZ upon network dynamics. Given the ease of implementing our framework and its enhanced insight into network dynamics, it has great potential for use in future dFNC studies.
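
    The contrast with hard clustering is easiest to see in the standard fuzzy c-means updates, sketched below in plain Python. This is the textbook algorithm only, not the explainability framework from the abstract. The function names, the fuzzifier default `m=2.0`, the fixed iteration count, and the toy points are assumptions for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-9):
    """Membership of each point in each cluster; every row sums to 1."""
    rows = []
    for x in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        dists = [max(math.dist(x, c), eps) for c in centers]
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of steps."""
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u = fcm_memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            # Each center is the mean of all points weighted by membership ** m.
            weights = [row[k] ** m for row in u]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / wsum
                for d in range(dim)
            ))
        centers = new_centers
    return centers, fcm_memberships(points, centers, m)


# Toy data: two tight groups plus one point halfway between them.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 5.5)]
centers, memberships = fuzzy_c_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

    Points inside a group get a membership near 1 for that group, while the in-between point keeps a membership near 0.5 for both. A hard assignment would discard exactly this graded information.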






  • Article type: Journal Article
    Accurate short-term load forecasting (STLF) is essential for power grid systems to ensure reliability, security and cost efficiency. Thanks to advanced smart sensor technologies, time-series data related to power load can be captured for STLF. Recent research shows that deep neural networks (DNNs) are capable of achieving accurate STLF since they are effective in predicting nonlinear and complicated time-series data. To perform STLF, existing DNNs use time-varying dynamics of either past load consumption or past power-correlated features such as weather, meteorology or date. However, the existing DNN approaches do not use the time-invariant features of users, such as building spaces, ages, insulation material, number of building floors or building purposes, to enhance STLF. In fact, those time-invariant features are correlated with user load consumption, so integrating them can enhance STLF. In this paper, a fuzzy clustering-based DNN is proposed that uses both time-varying and time-invariant features to perform STLF. The fuzzy clustering first groups users with similar time-invariant behaviours. DNN models are then developed using past time-varying features. Since the time-invariant features have already been learned by the fuzzy clustering, the DNN model does not need to learn them; therefore, a simpler DNN model can be generated. In addition, the DNN model only learns the time-varying features of users in the same cluster, so the DNN can learn more effectively and achieve more accurate predictions. The performance of the proposed fuzzy clustering-based DNN is evaluated by performing STLF, where both time-varying features and time-invariant features are included. Experimental results show that the proposed fuzzy clustering-based DNN outperforms the commonly used long short-term memory networks and convolutional neural networks.






  • Article type: Journal Article
    Cluster assignment is vital to analyzing single-cell RNA sequencing (scRNA-seq) data to understand high-level biological processes. Deep learning-based clustering methods have recently been widely used in scRNA-seq data analysis. However, existing deep models often overlook the interconnections and interactions among network layers, leading to the loss of structural information within the network layers. Herein, we develop a new self-supervised clustering method based on an adaptive multi-scale autoencoder, called scAMAC. The self-supervised clustering network utilizes the Multi-Scale Attention mechanism to fuse the feature information from the encoder, hidden and decoder layers of the multi-scale autoencoder, which enables the exploration of cellular correlations within the same scale and captures deep features across different scales. The self-supervised clustering network calculates the membership matrix using the fused latent features and optimizes the clustering network based on the membership matrix. scAMAC employs an adaptive feedback mechanism to supervise the parameter updates of the multi-scale autoencoder, obtaining a more effective representation of cell features. scAMAC not only enables cell clustering but also performs data reconstruction through the decoding layer. Through extensive experiments, we demonstrate that scAMAC is superior to several advanced clustering and imputation methods in both data clustering and reconstruction. In addition, scAMAC is beneficial for downstream analysis, such as cell trajectory inference. Our scAMAC model codes are freely available at https://github.com/yancy2024/scAMAC.






  • Article type: Journal Article
    Current methods for measuring black carbon aerosol (BC) by optical methods apportion BC to fossil fuel and wood combustion. However, these results are aggregated: local and non-local combustion sources are lumped together. The spatial apportioning of carbonaceous aerosol sources is challenging in remote or suburban areas because non-local sources may be significant. Air quality modeling would require highly accurate emission inventories and unbiased dispersion models to quantify such apportionment. We propose the FUSTA (FUzzy SpatioTemporal Apportionment) methodology for analyzing aethalometer results for equivalent black carbon coming from fossil fuel (eBCff) and wood combustion (eBCwb). We applied this methodology to ambient measurements at three suburban sites around Santiago, Chile, in the 2021 winter season. FUSTA results showed that local sources contributed ∼80% to eBCff and eBCwb at all sites. By using PM2.5 - eBCff and PM2.5 - eBCwb scatterplots for each fuzzy cluster (or source) found by FUSTA, the estimated lower edge lines showed distinctive slopes at each measurement site. These slopes were larger for non-local sources (aged aerosols) than for local ones (fresh emissions) and were used to apportion combustion PM2.5 at each site. At the Colina, Melipilla and San Jose de Maipo sites, fossil fuel combustion contributions to PM2.5 were 26% (15.9 μg m⁻³), 22% (9.9 μg m⁻³), and 22% (7.8 μg m⁻³), respectively. Wood burning contributions to PM2.5 were 22% (13.4 μg m⁻³), 19% (8.9 μg m⁻³) and 22% (7.3 μg m⁻³), respectively. This methodology generates a joint source apportionment of eBC and PM2.5, which is consistent with available chemical speciation data for PM2.5 in Santiago.






  • Article type: Journal Article
    The process of using robotic technology to examine underwater systems is still a difficult undertaking because the majority of automated activities lack network connectivity. Therefore, the suggested approach finds the main hole in undersea systems and fills it using robotic automation. In the predicted model, an analytical framework is created to operate the robot within predetermined areas while maximizing communication ranges. Additionally, a clustering algorithm with a fuzzy membership function is implemented, allowing the robots to advance in accordance with predefined clusters and arrive at their starting place within a predetermined amount of time. A cluster node is connected in each clustered region and provides the central control center with the necessary data. The weights are evenly distributed, and the designed robotic system is installed to prevent an uncontrolled operational state. Five different scenarios are used to test and validate the created model, and in each case, the proposed method is found to be superior to the current methodology in terms of range, energy, density, time periods, and total metrics of operation.





