Feature Extraction

  • Article type: Journal Article
    Deep learning has profoundly influenced various domains, particularly medical image analysis. Traditional transfer learning approaches in this field rely on models pretrained on domain-specific medical datasets, which limits their generalizability and accessibility. In this study, we propose a novel framework called real-world feature transfer learning, which uses backbone models initially trained on large-scale general-purpose datasets such as ImageNet. We evaluate the effectiveness and robustness of this approach compared to models trained from scratch, focusing on the task of classifying pneumonia in X-ray images. Our experiments, which included converting grayscale images to RGB format, demonstrate that real-world feature transfer learning consistently outperforms conventional training approaches across various performance metrics. This advancement has the potential to accelerate deep learning applications in medical imaging by leveraging the rich feature representations learned from general-purpose pretrained models. The proposed methodology overcomes the limitations of domain-specific pretrained models, thereby enabling accelerated innovation in medical diagnostics and healthcare. From a mathematical perspective, we formalize the concept of real-world feature transfer learning and provide a rigorous mathematical formulation of the problem. Our experimental results provide empirical evidence supporting the effectiveness of this approach, laying the foundation for further theoretical analysis and exploration. This work contributes to the broader understanding of feature transferability across domains and has significant implications for the development of accurate and efficient models for medical image analysis, even in resource-constrained settings.
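The grayscale-to-RGB step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the conversion was done; replicating the single channel three times is one common choice and is assumed here. The function name `gray_to_rgb` and the tiny sample image are hypothetical.

```python
def gray_to_rgb(image):
    """Turn a 2-D grayscale image (list of rows) into an H x W x 3 image
    by copying each pixel value into three identical channels."""
    return [[[pixel, pixel, pixel] for pixel in row] for row in image]


# Hypothetical 2 x 2 "X-ray" with 8-bit intensities.
xray = [[0, 128], [255, 64]]
rgb = gray_to_rgb(xray)
```

The result has the three-channel shape that backbones pretrained on ImageNet typically expect, while carrying no more information than the original grayscale image.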

  • Article type: Journal Article
    The number of unannotated protein sequences is increasing explosively due to genome sequencing technology. A more comprehensive understanding of protein functions for protein annotation requires the discovery of new features that cannot be captured by conventional methods. Deep learning can extract important features from input data and predict protein functions based on those features. Here, protein feature vectors generated by three deep learning models are analyzed using Integrated Gradients to explore important features of amino acid sites. As a case study, prediction and feature extraction models for UbiD enzymes were built using these models. The important amino acid residues extracted from the models differed from the secondary structures, conserved regions, and active sites known for UbiD. Interestingly, different amino acid residues within UbiD sequences were regarded as important depending on the type of model and sequence. The Transformer models focused on more specific regions than the other models. These results suggest that each deep learning model captures aspects of protein features that differ from existing knowledge and has the potential to discover new laws of protein function. This study will help to extract new protein features for annotating other proteins.
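To make the attribution method named in the abstract concrete, here is a minimal sketch of Integrated Gradients on a toy model. The logistic model, its weights, the input, and the all-zero baseline are assumptions chosen for illustration; they are not the protein models from the study. The attribution for feature i is (x_i - baseline_i) times the average gradient along the straight path from the baseline to the input, approximated here with a midpoint Riemann sum.

```python
import math


def model(x, w, b):
    """Toy logistic model standing in for a trained network."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x, w, b):
    """Analytical gradient of the toy model with respect to its input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            avg_grad[i] += g / steps
    return [(xi - bi) * g for xi, bi, g in zip(x, baseline, avg_grad)]


# Hypothetical weights, input, and baseline.
w, b = [2.0, -1.0, 0.5], 0.1
x = [1.0, 0.5, -2.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline, up to the integration error.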

  • Article type: Journal Article
    Unsupervised clustering algorithms are widely used in ecology and conservation to classify animal sounds, and they also offer several advantages in basic bioacoustics research. Consequently, it is important to overcome the existing challenges. A common practice is to extract the acoustic features of vocalizations one-dimensionally, taking only an average value of a given feature over the entire vocalization. With frequency-modulated vocalizations, whose acoustic features can change over time, this can lead to insufficient characterization. It often remains unclear whether the necessary parameters have been set correctly and whether the resulting clustering reliably classifies the vocalizations. The presented software, CASE, is intended to overcome these challenges. Established and new unsupervised clustering methods (community detection, affinity propagation, HDBSCAN, and fuzzy clustering) are tested in combination with various classifiers (k-nearest neighbor, dynamic time-warping, and cross-correlation) using differently transformed animal vocalizations. These methods are compared with predefined clusters to determine their strengths and weaknesses. In addition, a multidimensional data transformation procedure is presented that better represents the course of multiple acoustic features. The results suggest that, especially with frequency-modulated vocalizations, clustering is more applicable with multidimensional feature extraction than with one-dimensional feature extraction. The characterization and clustering of vocalizations in multidimensional space offer great potential for future bioacoustic studies. The software CASE includes the developed method of multidimensional feature extraction, as well as all the clustering methods used. It allows several clustering algorithms to be applied quickly to one data set in order to compare their results and verify their reliability based on their consistency. Moreover, the software CASE determines the optimal values of most of the necessary parameters automatically. To take advantage of these benefits, the software CASE is provided for free download.
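The difference between one-dimensional and multidimensional feature extraction can be shown with a small sketch. This is not the CASE implementation; the frequency contours, the linear-interpolation resampling, and all function names are assumptions made for illustration. Two frequency-modulated calls, an upsweep and a downsweep, share the same mean frequency, so a single averaged feature cannot separate them, while a fixed-length contour can.

```python
import math


def mean_feature(contour):
    """One-dimensional extraction: a single average over the whole call."""
    return sum(contour) / len(contour)


def resample(contour, n_points):
    """Multidimensional extraction: sample the contour at n_points (>= 2)
    equally spaced positions using linear interpolation."""
    last = len(contour) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical peak-frequency contours in kHz.
upsweep = [20.0, 25.0, 30.0, 35.0, 40.0]
downsweep = [40.0, 35.0, 30.0, 25.0, 20.0]

same_mean = mean_feature(upsweep) == mean_feature(downsweep)
distance = euclidean(resample(upsweep, 3), resample(downsweep, 3))
```

Here `same_mean` is true while `distance` is clearly nonzero, which is the situation the abstract describes for frequency-modulated vocalizations.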

  • Article type: Journal Article
    Prompt diagnosis of early gastric cancer (EGC) is crucial for improving patient survival. However, most previous computer-aided diagnosis (CAD) systems did not concretize or explain diagnostic theories. We aimed to develop a logical anthropomorphic artificial intelligence (AI) diagnostic system named ENDOANGEL-LA (logical anthropomorphic) for EGCs under magnifying image-enhanced endoscopy (M-IEE).
    We retrospectively collected data for 692 patients and 1897 images from Renmin Hospital of Wuhan University, Wuhan, China between Nov 15, 2016 and May 7, 2019. The images were randomly assigned by patient to the training set and test set at a ratio of about 4:1. ENDOANGEL-LA was developed based on feature extraction combining quantitative analysis, deep learning (DL), and machine learning (ML). Eleven diagnostic feature indexes were integrated into seven ML models, and an optimal model was selected. The performance of ENDOANGEL-LA was evaluated and compared with that of endoscopists and sole DL models. The satisfaction of endoscopists with ENDOANGEL-LA and with the sole DL model was also compared.
    Random forest showed the best performance, and demarcation line and microstructure density were the most important feature indexes. The accuracy of ENDOANGEL-LA in images (88.76%) was significantly higher than that of the sole DL model (82.77%, p = 0.034) and the novices (71.63%, p<0.001), and comparable to that of the experts (88.95%). The accuracy of ENDOANGEL-LA in videos (87.00%) was significantly higher than that of the sole DL model (68.00%, p<0.001), and comparable to that of the endoscopists (89.00%). With the assistance of ENDOANGEL-LA, the accuracy of novices improved significantly (87.45%, p<0.001). The satisfaction of endoscopists with ENDOANGEL-LA was significantly higher than with the sole DL model.
    We established a logical anthropomorphic system (ENDOANGEL-LA) that can diagnose EGC under M-IEE with diagnostic theory concretization, high accuracy, and good explainability. It has the potential to increase interactivity between endoscopists and CADs, and to improve endoscopists' trust in and acceptance of CADs.
    This work was partly supported by a grant from the Hubei Province Major Science and Technology Innovation Project (2018-916-000-008) and the Fundamental Research Funds for the Central Universities (2042021kf0084).

  • Article type: Journal Article
    Atrial fibrillation (AF) is one of the most common cardiovascular problems, and its asymptomatic tendency makes AF detection challenging. Machine and deep learning methods are commonly used in AF detection.
    The purpose of this study was to evaluate the information provided by convolutional neural network (CNN) and random forest (RF) machine learning models for AF classification.
    We manually extracted 166 time-frequency domain, linear, and nonlinear features to classify single-lead electrocardiograms (ECGs) as normal, AF, other, or noisy sinus rhythms. Using a genetic algorithm, we selected a subset of 56 robust features for use in the RF model. In a separate study, a 1-dimensional, 12-layer CNN was designed on the raw ECG rhythms. Four features from the output layer and 128 features from the fully connected layer of the CNN were explored independently for classification. The models were trained and internally validated on 8,528 ECGs and externally validated on a hidden dataset containing 3,658 ECGs. Next, we analyzed the correlation between engineered and CNN-learned features.
    An RF classifier trained with the 56 engineered features resulted in F1 scores of 0.91, 0.78, and 0.72 for normal, AF, and other rhythms, respectively. However, an ensemble of a support vector machine and the CNN model resulted in F1 scores of 0.92, 0.87, and 0.80, respectively.
    We explored various features and machine learning models to identify AF rhythms using short (9-61 seconds) single-lead ECG recordings. Our results showed that the proposed CNN model abstracted distinctive features for AF classification.

  • Article type: Journal Article
    Walking-induced fluctuations have a significant influence on indoor airflow and pollutant dispersion. This study developed a method to quantify the robustness of ventilation systems in controlling walking-induced fluctuations. Experiments were conducted in a full-scale chamber with four different kinds of ventilation systems: ceiling supply and side return (CS), ceiling supply and ceiling return (CC), side supply and ceiling return (SC), and side supply and side return (SS). The measured temperature, flow, and pollutant field data were (1) denoised by FFT filtering or wavelet transform; (2) fitted by a Gaussian function; (3) subjected to feature extraction for the range scale and time scale disturbances; and then (4) used to calculate the range scale and time scale robustness of the different ventilation systems with dimensionless equations developed in this study. The selection between FFT filtering and wavelet transform, and the choice of FFT filter cut-off frequency, wavelet function, and decomposition layers, are also discussed, as is the threshold for wavelet denoising, which can be adjusted accordingly if the walking frequency or sampling frequency differs from that in other studies. The results show that for the flow and pollutant fields, the use of a ventilation system can increase the range scale robustness by 19.7%-39.4% and 10.0%-38.8%, respectively, and that the SS system was 7.0%-25.7% more robust than the other three ventilation systems. However, all four kinds of ventilation systems had a very limited effect in controlling the time scale disturbance.
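Step (1) of the processing chain, denoising by FFT filtering, can be sketched as a simple low-pass filter: transform the signal, zero every frequency bin above a cut-off, and transform back. The sampling rate, the 1 Hz slow component, the 10 Hz disturbance, and the 3 Hz cut-off below are hypothetical values, not the ones used in the study, and a naive discrete Fourier transform stands in for a real FFT routine to keep the example dependency-free.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (fine for short signals)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def lowpass(signal, sample_rate_hz, cutoff_hz):
    """Zero all frequency bins above cutoff_hz and transform back."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies.
        freq = min(k, n - k) * sample_rate_hz / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)


# Hypothetical measurement: slow 1 Hz trend plus a 10 Hz disturbance.
sample_rate_hz = 32
n = 64
slow = [math.sin(2 * math.pi * 1.0 * t / sample_rate_hz) for t in range(n)]
noisy = [slow[t] + 0.5 * math.sin(2 * math.pi * 10.0 * t / sample_rate_hz)
         for t in range(n)]
denoised = lowpass(noisy, sample_rate_hz, cutoff_hz=3.0)
```

Because both components fall exactly on frequency bins in this toy setup, the filtered signal recovers the slow trend almost perfectly; real measurements would show leakage, which is one reason the choice of cut-off frequency matters.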

  • Article type: Journal Article
    Environmental quality data sets are typically imbalanced, because environmental pollution events are rarely observed in daily life. Prediction on imbalanced data sets is a major challenge in machine learning. Our recent work showed that deep cascade forest (DCF) is a promising base learning model for environmental quality prediction. Although some traditional models have been improved by introducing a cost matrix, little is known about whether a cost matrix could enhance the prediction performance of DCF. Additionally, feature extraction is another way to potentially improve a model's ability to predict imbalanced data. Here, we developed two novel learning models based on DCF: cost-sensitive DCF (CS-DCF) and DCF combined with unsupervised learning models and greedy methods (USM-DCF-G). Subsequently, CS-DCF and USM-DCF-G were successfully verified on an imbalanced drinking water quality data set. Our data show that both CS-DCF and USM-DCF-G predict better than DCF alone. In particular, USM-DCF-G shows the best performance, with the highest F1-score (95.12 ± 2.56%), after feature extraction and selection using unsupervised learning models and greedy methods. Thus, the two learning models, especially USM-DCF-G, are promising for addressing imbalanced environmental data and accurately predicting environmental quality.
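The idea of a cost matrix can be illustrated with a generic minimum-expected-cost decision rule. The abstract does not describe how CS-DCF incorporates its cost matrix, so this sketch is not the authors' method; the cost values, the class probabilities, and the function name `min_cost_class` are assumptions.

```python
def min_cost_class(probabilities, cost_matrix):
    """Pick the class with the lowest expected cost.

    cost_matrix[i][j] is the cost of predicting class j when the true
    class is i; probabilities[i] is the estimated probability of class i.
    """
    n_classes = len(probabilities)
    expected = [
        sum(probabilities[i] * cost_matrix[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical costs: class 0 = normal, class 1 = pollution event.
# Missing an event (true 1, predicted 0) is ten times worse than a false alarm.
cost_matrix = [[0.0, 1.0],
               [10.0, 0.0]]
probs = [0.8, 0.2]

plain = max(range(2), key=probs.__getitem__)         # most probable class
cost_sensitive = min_cost_class(probs, cost_matrix)  # cheapest class
```

With these numbers the most probable class is 0, but the expected cost of predicting 0 is 0.2 × 10 = 2.0 against 0.8 × 1 = 0.8 for predicting 1, so the cost-sensitive rule flags the rare event.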

  • Article type: Journal Article
    This case study analyzes the feasibility of adapting a Spiking Neural Network (SNN) based Structural Health Monitoring (SHM) system as a low-cost solution for inspecting the structural health of damaged buildings that have survived a natural disaster, that is, an earthquake or similar event. Various techniques are used to detect the structural health status of a building for performance benchmarking, including different feature extraction methods and classification techniques (e.g., SNN, K-means, and artificial neural networks). The SNN is used to process the sensor data generated from a full-scale seven-story reinforced concrete building to verify classification performance. Results show that the proposed SNN hardware has high classification accuracy, reliability, and longevity, and low hardware area overhead.

  • Article type: Journal Article
    Sign language recognition systems aid communication among deaf people, hearing-impaired people, and speakers. One type of signal that has received increasing study and can be used as input for these systems is surface electromyography (sEMG). This work presents the recognition of a set of alphabet gestures from Brazilian Sign Language (Libras) using sEMG acquired from an armband. Only sEMG signals were used as input. Signals from 12 subjects were acquired using a Myo™ armband for the 26 signs of the Libras alphabet. Additionally, as sEMG has several signal processing parameters, the influence of segmentation, feature extraction, and classification was considered at each step of the pattern recognition. In segmentation, window length and four levels of overlap rate were analyzed, as well as the contribution of each feature, the literature feature sets, and new feature sets proposed for different classifiers. We found that the overlap rate had a high influence on this task. Accuracies on the order of 99% were achieved for the following factors: segments of 1.75 s with a 12.5% overlap rate; the proposed set of four features; and random forest (RF) classifiers.
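The segmentation step can be sketched as a sliding window with overlap. The window length (1.75 s) and overlap rate (12.5%) come from the abstract; the 200 Hz sampling rate, the synthetic signal, and the mean-absolute-value feature are assumptions for illustration and are not stated in the source.

```python
def segment(signal, sample_rate_hz, window_s, overlap):
    """Split a signal into fixed-length windows that overlap by the given
    fraction (0.125 means consecutive windows share 12.5% of their samples)."""
    window_len = int(round(window_s * sample_rate_hz))
    step = max(1, int(round(window_len * (1.0 - overlap))))
    return [signal[start:start + window_len]
            for start in range(0, len(signal) - window_len + 1, step)]


def mean_absolute_value(window):
    """A common time-domain sEMG feature, used here only as an example."""
    return sum(abs(v) for v in window) / len(window)


# Hypothetical single-channel recording: 1000 samples at an assumed 200 Hz.
signal = list(range(1000))
windows = segment(signal, sample_rate_hz=200, window_s=1.75, overlap=0.125)
features = [mean_absolute_value(w) for w in windows]
```

With these assumed values each window holds 350 samples and the step between window starts is 306 samples, since 350 × 0.875 = 306.25 is rounded to a whole number of samples.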

  • Article type: Journal Article
    Scene classification relying on images is essential in many systems and applications related to remote sensing. Scientific interest in scene classification from remotely collected images is increasing, and many datasets and algorithms are being developed. The introduction of convolutional neural networks (CNNs) and other deep learning techniques has contributed to vast improvements in the accuracy of image scene classification in such systems. To classify the scene from aerial images, we used a two-stream deep architecture. We performed the first part of the classification, the feature extraction, using pre-trained CNNs that extract deep features of aerial images from different network layers: the average pooling layer or some of the preceding convolutional layers. Next, we concatenated the features extracted from the various neural networks, after performing dimensionality reduction on the very large feature vectors. We experimented extensively with different CNN architectures to obtain optimal results. Finally, we used a Support Vector Machine (SVM) to classify the concatenated features. The competitiveness of the examined technique was evaluated on two real-world datasets: UC Merced and WHU-RS. The obtained classification accuracies demonstrate that the considered method is competitive with other cutting-edge techniques.