Feature Extraction

  • Article Type: Journal Article
    BACKGROUND: Despite the many opportunities data reuse offers, its implementation presents many difficulties, and raw data cannot be reused directly. Information is not always directly available in the source database and must be computed afterward from the raw data by defining an algorithm.
    OBJECTIVE: The main purpose of this article is to present a standardized description of the steps and transformations required during the feature extraction process when conducting retrospective observational studies. A secondary objective is to identify how the features could be stored in the schema of a data warehouse.
    METHODS: This study involved the following 3 main steps: (1) the collection of relevant study cases related to feature extraction and based on the automatic and secondary use of data; (2) the standardized description of raw data, steps, and transformations, which were common to the study cases; and (3) the identification of an appropriate table to store the features in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM).
    RESULTS: We interviewed 10 researchers from 3 French university hospitals and a national institution, who were involved in 8 retrospective and observational studies. Based on these studies, 2 states (track and feature) and 2 transformations (track definition and track aggregation) emerged. "Track" is a time-dependent signal or period of interest, defined by a statistical unit, a value, and 2 milestones (a start event and an end event). "Feature" is time-independent high-level information with dimensionality identical to the statistical unit of the study, defined by a label and a value. The time dimension has become implicit in the value or name of the variable. We propose the 2 tables "TRACK" and "FEATURE" to store variables obtained in feature extraction and extend the OMOP CDM.
    CONCLUSIONS: We propose a standardized description of the feature extraction process. The process combines the 2 steps of track definition and track aggregation. By dividing feature extraction into these 2 steps, the difficulty is confined to the track definition step. The standardization of tracks requires great expertise with regard to the data, but allows the application of an infinite number of complex transformations. In contrast, track aggregation is a very simple operation with a finite number of possibilities. A complete description of these steps could enhance the reproducibility of retrospective studies.
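The track/feature model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, and the aggregation shown (mean of a unit's track values) is just one of the finite set of aggregations the abstract alludes to.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Track:
    """A time-dependent signal or period of interest, as defined in the
    abstract: a statistical unit, a value, and 2 milestones."""
    unit_id: int           # statistical unit (e.g., a patient)
    value: float
    start_event: datetime  # start milestone
    end_event: datetime    # end milestone

def aggregate_tracks(tracks, agg=mean):
    """Track aggregation: collapse each unit's tracks into one
    time-independent feature value (the feature's label would be
    carried alongside, e.g. 'mean systolic BP during stay')."""
    by_unit = {}
    for t in tracks:
        by_unit.setdefault(t.unit_id, []).append(t.value)
    return {unit: agg(values) for unit, values in by_unit.items()}

# hypothetical tracks for two statistical units
tracks = [
    Track(1, 120.0, datetime(2021, 1, 1), datetime(2021, 1, 2)),
    Track(1, 130.0, datetime(2021, 1, 3), datetime(2021, 1, 4)),
    Track(2, 110.0, datetime(2021, 2, 1), datetime(2021, 2, 2)),
]
features = aggregate_tracks(tracks)  # {1: 125.0, 2: 110.0}
```

The split mirrors the paper's argument: building `tracks` from raw source data is where the data expertise lives, while `aggregate_tracks` is deliberately trivial and enumerable.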

  • Article Type: Journal Article
    Automatic classification of glaucoma from fundus images is a vital diagnostic tool for a Computer-Aided Diagnosis (CAD) system. In this work, a novel fused feature extraction technique and ensemble classifier fusion are proposed for the diagnosis of glaucoma. The proposed method comprises three stages. Initially, the fundus images are subjected to preprocessing, followed by feature extraction and feature fusion by Intra-Class and Extra-Class Discriminative Correlation Analysis (IEDCA). The feature fusion approach eliminates between-class correlation while retaining sufficient Feature Dimension (FD) for Correlation Analysis (CA). The fused features are then fed individually to the classifiers, namely Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN), for classification. Finally, a classifier fusion stage is designed that combines the decisions of the ensemble of classifiers based on the Consensus-based Combining Method (CCM). CCM-based classifier fusion adjusts the weights iteratively after comparing the outputs of all the classifiers. The proposed fusion classifier provides a better improvement in accuracy and convergence when compared to the individual algorithms. A classification accuracy of 99.2% is accomplished by the two-level hybrid fusion approach. The method is evaluated on the public High Resolution Fundus (HRF) and DRIVE datasets with cross-dataset validation.
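The iterative weight adjustment that the abstract attributes to CCM can be illustrated with a generic consensus-weighting sketch. The abstract does not give CCM's exact update rule, so the learning rate, stopping criterion, and function name below are assumptions; the idea shown is only the general pattern of re-weighting each classifier by its agreement with the current weighted consensus.

```python
import numpy as np

def consensus_fusion(predictions, n_iter=10, lr=0.1):
    """Hypothetical consensus-based combining sketch: each classifier's
    weight is nudged toward its agreement with the weighted-vote
    consensus, then renormalized.

    predictions: (n_classifiers, n_samples) array of integer class labels.
    Returns the final consensus labels and the learned weights."""
    preds = np.asarray(predictions)
    n_clf = preds.shape[0]
    w = np.full(n_clf, 1.0 / n_clf)  # start with uniform weights
    for _ in range(n_iter):
        # weighted vote per sample
        consensus = np.array([
            np.bincount(col, weights=w).argmax() for col in preds.T
        ])
        # fraction of samples on which each classifier matches the consensus
        agreement = (preds == consensus).mean(axis=1)
        w = (1 - lr) * w + lr * agreement
        w /= w.sum()
    return consensus, w

# three hypothetical classifiers (e.g., SVM, RF, KNN) on five samples
preds = [[1, 0, 1, 1, 0],
         [1, 0, 0, 1, 0],
         [0, 1, 1, 1, 0]]
labels, weights = consensus_fusion(preds)  # weights favor the most consistent classifier
```

Here the first classifier agrees with the consensus on every sample, so its weight grows across iterations, which is the qualitative behavior the abstract describes for CCM.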