k-means

  • Article type: Journal Article
    Identifying clusters of physical activity (PA) in accelerometer data helps to quantify the levels of sedentary behaviour and PA associated with the risk of serious health conditions, as well as the time spent in healthy PA. Unsupervised machine learning models can capture PA in everyday free-living activity without the need for labelled data. However, there is scant research addressing the selection of features from accelerometer data. Feature selection methods can reduce the complexity and computational burden of these models by removing less important features, and they help in understanding the relative importance of feature sets and individual features in clustering. The aim of this systematic review is to summarise the feature selection techniques applied in studies of unsupervised machine learning on PA data obtained from accelerometer-based devices, and to identify the features commonly selected by these techniques.
    We conducted a systematic search of the PubMed, MEDLINE, Google Scholar, Scopus, arXiv and Web of Science databases to identify studies published before January 2021 that used feature selection methods to derive PA clusters with unsupervised machine learning models.
    A total of 13 studies were eligible for inclusion in the review. The most popular feature selection techniques were Principal Component Analysis (PCA) and correlation-based methods, and k-means was frequently used to cluster accelerometer data. Cluster quality evaluation methods were diverse, including both external measures (e.g. cluster purity) and internal measures (most frequently the silhouette score). Only four of the 13 studies had more than 25 participants, and only four studies included two or more datasets.
    There is a need to assess multiple feature selection methods on large cohort data consisting of multiple (three or more) PA datasets. The cut-off criteria (e.g. number of components, pairwise correlation value, explained variance ratio for PCA) should be expressly stated, along with any hyperparameters used in clustering.
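
As a rough illustration of the reporting practice the review recommends, the following minimal sketch states its cut-off and hyperparameters up front, drops features by pairwise correlation, clusters with k-means, and scores the result with the silhouette coefficient. It is not taken from any of the reviewed studies: the toy data, the cut-off of 0.9, `K = 2`, the seed, and all function names are assumptions chosen for illustration, and the code uses only the Python standard library.

```python
import math
import random

# Assumed settings, stated explicitly as the review recommends.
CORR_CUTOFF = 0.9   # hypothetical pairwise-correlation cut-off
K = 2               # hypothetical number of clusters
SEED = 0            # hypothetical seed for reproducible initialisation

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(rows, cutoff):
    """Keep a feature only if |r| with every already-kept feature is <= cutoff."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[i])) <= cutoff for i in kept):
            kept.append(j)
    return kept

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, seed, iters=100):
    """Lloyd's algorithm with centroids initialised from k random points."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:            # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (made up): feature 1 is exactly twice feature 0, so it is redundant.
rows = [
    (1.0, 2.0, 0.3), (1.2, 2.4, 0.1), (0.8, 1.6, 0.2),
    (5.0, 10.0, 0.2), (5.2, 10.4, 0.3), (4.8, 9.6, 0.1),
]
kept = drop_correlated(rows, CORR_CUTOFF)
points = [tuple(r[j] for j in kept) for r in rows]
labels, centroids = kmeans(points, K, SEED)
score = silhouette(points, labels)
print("kept feature indices:", kept)
print("cluster labels:", labels)
print("mean silhouette:", round(score, 3))
```

A greedy filter like this depends on feature order, which is one more reason to report the exact procedure alongside the cut-off value.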

  • Article type: Journal Article
    Cancer is a disease of cells in which growth departs from the normal rate of cell growth; the abnormal growth spreads through the human body and kills body cells. Breast cancer is a highly heterogeneous disease that is common among Western women. Mammography, an X-ray-based pre-screening check, is used to diagnose breast cancer in women. This basic test helps to identify breast cancer at an early stage, and early detection helps more women recover from this serious disease. Medical centres assign highly skilled radiologists to analyse mammography results, but human error is still inevitable. The error rate is high when radiologists are exhausted by the analysis task, which leads to variation in either internal or external observation. Image quality also plays a vital role in mammographic sensitivity and leads to variation. Several automated processes have been tried to streamline and standardise the diagnostic analysis and to improve the quality of breast cancer images. This paper introduces a two-way mode algorithm for grouping breast cancer images into two classes: (1) benign (the tumour is growing but not dangerous) and (2) malignant (the growth cannot be controlled and causes death). Two-way mode data mining algorithms are used because abnormal mammograms are thinly dispersed. The first algorithm is k-means, which regroups the given data elements into clusters (i.e., as prioritized by the users). The second algorithm is the Support Vector Machine (SVM), which identifies the most suitable function for differentiating the members based on the training data.
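
The abstract does not say how the k-means output feeds the SVM, so the sketch below covers only the second stage in its simplest form: a linear SVM that learns a separating function from labelled training data. Everything specific here is an assumption for illustration, not the paper's method: the two made-up numeric features per image, the label coding (-1 for benign, +1 for malignant), the linear kernel, and training by batch subgradient descent on the regularised hinge loss with hypothetical values for `lam`, `lr` and `epochs`.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=5000):
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin contribute a hinge subgradient.
            if yi * (dot(w, xi) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the learned hyperplane."""
    return 1 if dot(w, x) + b >= 0 else -1

# Made-up training data: two hypothetical image-derived features per case.
X_train = [
    (1.0, 1.0), (1.5, 0.5), (0.5, 1.5), (1.0, 0.5),   # labelled benign
    (3.0, 3.0), (3.5, 2.5), (2.5, 3.5), (3.0, 3.5),   # labelled malignant
]
y_train = [-1, -1, -1, -1, 1, 1, 1, 1]

w, b = train_linear_svm(X_train, y_train)
train_pred = [predict(w, b, x) for x in X_train]
print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
print("training predictions:", train_pred)
print("new case (0.8, 0.9):", predict(w, b, (0.8, 0.9)))
print("new case (3.2, 3.1):", predict(w, b, (3.2, 3.1)))
```

In practice a library implementation with a tuned kernel would normally replace this hand-rolled trainer; the point of the sketch is only to show what "finding a function that differentiates the members" means on a toy example.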