Feature reduction

  • Article type: Journal Article
    Improper seed storage can compromise agricultural productivity and reduce crop yields, so assessing seed viability before sowing is essential. Although numerous techniques exist for evaluating seed condition, this research used hyperspectral imaging (HSI) as an innovative, rapid, clean, and precise nondestructive testing method. The study aimed to determine the most effective classification model for watermelon seeds. Purchased watermelon seeds were first divided into two groups: one underwent sterilization in a dehydrator at 40°C for 36 h, whereas the other was stored under favorable conditions. Spectral images of the seeds were captured using an HSI system with a charge-coupled device camera covering 400 to 1000 nm, and the segmented regions of all samples were measured. Preprocessing techniques and wavelength selection methods were applied to manage the spectral data workload, followed by a support vector machine (SVM) model. The initial hybrid-SVM model achieved a predictive accuracy of 100%, with a test set accuracy of 92.33%. An artificial bee colony (ABC) optimization was then introduced to improve model precision. With kernel parameters (c, g) set to 13.17 and 0.01, respectively, and a runtime of 4.19328 s, training and evaluation on the dataset reached an accuracy of 100%. HSI combined with the PCA-ABC-SVM model is therefore a practical way to distinguish different watermelon seeds. These findings introduce a technique for accurately predicting seed viability, intended for use in agricultural industrial multispectral imaging. PRACTICAL APPLICATION: Traditional methods for determining seed condition rely mainly on appearance and subjective assessment, are time-consuming, and are labor-intensive. HSI was employed as a green technology to alleviate these problems. This work contributes to industrial multispectral imaging by improving the ability to discriminate between various types of seeds and agricultural crop products.

  • Article type: Journal Article
    Brain-computer interfaces (BCIs) are systems that acquire the brain's electrical activity and use it to control external devices. Because electroencephalography (EEG) is the simplest non-invasive method of capturing the brain's electrical activity, EEG-based BCIs are very popular designs. Beyond classifying limb movements, recent BCI studies have focused on accurately decoding the movements of individual fingers on the same hand using machine learning techniques. State-of-the-art studies have decoded five finger movements while neglecting the brain's idle case (i.e., the state in which the brain is not performing any mental task). This can easily cause more false positives and dramatically degrade classification performance, and thus the performance of BCIs. This study aims to propose a more realistic system that decodes the movements of five fingers and the no mental task (NoMT) case from EEG signals.
    In this study, a novel feature extraction practice is used. Features for classification are extracted from the Proper Rotational Components (PRCs) computed through Intrinsic Time Scale Decomposition (ITD), which has recently been applied successfully to different biomedical signals. These features were then fed to well-known classifiers and their different implementations to discriminate between the six classes. The highest classifier performances obtained in both subject-independent and subject-dependent cases were reported. In addition, ANOVA-based feature selection was examined to determine whether restricting the input to statistically significant features affects classifier performance.
    The Ensemble Learning classifier achieved the highest accuracy among the tested classifiers, 55.0%, and ANOVA-based feature selection increased classifier performance on five-finger movement determination in EEG-based BCI systems.
    Compared with similar studies, the proposed practice achieved a modest yet significant improvement in classification performance even though the number of classes was increased by one (i.e., NoMT).
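The ANOVA-based feature selection mentioned above can be illustrated with a small sketch. It is not the authors' implementation: it simply ranks each feature by its one-way ANOVA F statistic across class labels and keeps the highest-scoring ones. The function names `anova_f` and `select_top_features` and the `top_k` parameter are hypothetical, and the sketch ranks by F rather than testing significance against an F distribution.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X, y, top_k):
    """Return (sorted indices of the top_k features by F statistic, all F scores)."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_k]), scores


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.0], [3.2, 5.0], [2.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]
selected, f_scores = select_top_features(X, y, top_k=1)
print(selected, f_scores)
```

In practice the selected columns would be passed on to the classifier in place of the full feature set.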

  • Article type: Journal Article
    Accurate segmentation of liver tumors is a prerequisite for early diagnosis of liver cancer. Segmentation networks that extract features continuously at a single scale cannot adapt to the variation in liver tumor volume in computed tomography (CT). This paper therefore proposes a multi-scale feature attention network (MS-FANet) for liver tumor segmentation. A novel residual attention (RA) block and multi-scale atrous downsampling (MAD) are introduced in the encoder of MS-FANet to learn variable tumor features thoroughly and to extract tumor features at different scales simultaneously. A dual-path feature (DF) filter and dense upsampling (DU) are introduced in the feature reduction process to reduce effective features for the accurate segmentation of liver tumors. On the public LiTS and 3DIRCADb datasets, MS-FANet achieved average Dice scores of 74.2% and 78.0%, respectively, outperforming most state-of-the-art networks. These results indicate strong liver tumor segmentation performance and an ability to learn features at different scales.
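For readers unfamiliar with the Dice metric reported above, the following is a minimal sketch of how a Dice score is computed for two binary masks, here flattened to lists of 0/1 values. The function name `dice_coefficient` is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# One overlapping voxel out of two predicted and two true voxels.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

An average Dice score such as the ones quoted is then the mean of this value over the evaluated cases.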

  • Article type: Journal Article
    Aiming at accurate survival prediction for glioblastoma (GBM) patients following radiation therapy, we developed a subregion-based survival prediction framework using a novel feature construction method on multi-sequence MRIs. The proposed method consists of two main steps: (1) a feature space optimization algorithm that determines the most appropriate matching between multi-sequence MRIs and tumor subregions, so that multimodal image data are used more reasonably; (2) a clustering-based feature bundling and construction algorithm that compresses the high-dimensional extracted radiomic features into a smaller but effective feature set for building an accurate prediction model. For each tumor subregion, a total of 680 radiomic features were extracted from one MRI sequence using Pyradiomics. An additional 71 geometric features and clinical information were collected, resulting in an extremely high-dimensional feature space of 8231 features, which was used to train and evaluate survival prediction at 1 year and the more challenging overall survival prediction. The framework was developed on 98 GBM patients from the BraTS 2020 dataset under five-fold cross-validation and tested on an external cohort of 19 GBM patients randomly selected from the same dataset. We identified the best matching relationship between each subregion and its corresponding MRI sequence, and the proposed feature bundling and construction framework generated a subset of 235 features (out of 8231). The subregion-based framework achieved AUCs of 0.998 and 0.983 on the training and independent test cohorts, respectively, for 1-year survival prediction, compared with AUCs of 0.940 and 0.923 on the training and validation cohorts, respectively, when using the 8231 initially extracted features. Finally, we constructed a stacking ensemble regressor that predicts overall survival with a C-index of 0.872. The proposed subregion-based survival prediction framework allows us to better stratify patients for personalized treatment of GBM.
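The abstract does not specify the clustering-based bundling algorithm, so the sketch below is only a stand-in that illustrates the general idea of grouping redundant features: it greedily assigns each feature to the first bundle whose seed feature it correlates with strongly. The names `pearson` and `bundle_features` and the 0.9 threshold are assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def bundle_features(columns, threshold=0.9):
    """Greedily group feature columns by |correlation| with each bundle's first member."""
    bundles = []
    for j, col in enumerate(columns):
        for bundle in bundles:
            seed = columns[bundle[0]]
            if abs(pearson(col, seed)) >= threshold:
                bundle.append(j)
                break
        else:
            # No existing bundle is similar enough: start a new one.
            bundles.append([j])
    return bundles


# Toy feature columns (made up): 0, 1 and 3 are nearly collinear, 2 is not.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 1.0, 3.0, 2.0],
    [8.0, 6.0, 4.0, 2.0],
]
print(bundle_features(columns))
```

Each bundle could then be replaced by a single constructed feature, for example the mean of its standardized members, which is one way a large feature space might be compressed to a few hundred features.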

  • Article type: Journal Article
    DNA synthesis is widely used in synthetic biology to construct and assemble sequences ranging from short RBSs to ultra-long synthetic genomes. Many sequence features, such as GC content and repeat sequences, are known to affect synthesis difficulty and, in turn, synthesis cost. In addition, latent sequence features, especially local characteristics of the sequence, might also affect the DNA synthesis process. Reliable prediction of the synthesis difficulty of a given sequence is important for reducing cost, but it remains a challenge. In this study, we propose a new automated machine learning (AutoML) approach to predict DNA synthesis difficulty, which achieves an F1 score of 0.930 and outperforms the current state-of-the-art model. We found local sequence features, neglected in previous methods, that might also affect the difficulty of DNA synthesis. Moreover, experimental validation on ten genes of Escherichia coli strain MG1655 shows that our model achieves 80% accuracy, which is also better than the state of the art. Finally, for the convenience of end users, we developed the cloud platform SCP4SSD using an entirely cloud-based serverless architecture.
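To make the kinds of sequence features mentioned above concrete, here is a small sketch that computes global GC content, sliding-window (local) GC content, and the longest single-base run for a DNA string. These are generic illustrations, not the feature set used by the authors; the function names and the window size are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def windowed_gc(seq, window):
    """GC content of every sliding window of the given length (a local feature)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [gc_content(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    best = run = 0
    prev = None
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


print(gc_content("ATGC"))
print(windowed_gc("AAGGCC", 2))
print(longest_homopolymer("AATTTTG"))
```

Summary statistics of the windowed values, such as the maximum or minimum local GC content, are one plausible way to turn local characteristics into fixed-length model inputs.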

  • Article type: Journal Article
    OBJECTIVE: Knee-joint vibroarthrographic (VAG) signals offer an effective means of non-invasive knee osteoarthritis (KOA) diagnosis, and VAG signal analysis plays a crucial role in early pathological screening of the knee joint. To improve the accuracy of knee pathology screening and to find a method suitable for embedding in a wearable knee-joint diagnostic device, this paper proposes a knee pathology screening method. To address the lack of suitable, unified evaluation indexes for single and fused features, it also proposes feature separability evaluation criteria.
    METHODS: We propose a knee-joint pathology screening method based on feature fusion and dimension reduction combined with a random forest classifier, together with evaluation criteria for feature separability. For pathological screening, we propose multi-dimensional feature fusion, using principal component analysis (PCA) to remove the redundant part of the fusion feature (F-F) and obtain a deep fusion feature (D-F-F) with greater separability. We also propose the maximal information coefficient (MIC) and correlation matrix collinearity (CMC) as feature evaluation criteria, which can serve as new quantitative feature metrics and also show that the deep fusion feature is more separable than the features before dimension reduction.
    RESULTS: The experimental results show that the proposed method performs well in pathology classification with a random forest classifier, reaching 96% accuracy. The accuracy of SVM and K-NN classifiers also improved after feature dimension reduction.
    CONCLUSIONS: The results indicate that this classification approach has high screening efficiency for KOA diagnosis and could provide a feasible method for computer-assisted non-invasive diagnosis of KOA. We also provide a novel way to evaluate the separability of VAG signal features.
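As a rough illustration of how PCA removes redundancy from a fused feature vector, the sketch below finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it. It is a teaching sketch under stated assumptions, not the paper's pipeline: a real implementation would normally use a linear algebra library and keep several components, and the fixed all-ones starting vector assumed here would fail if it happened to be orthogonal to the leading component.

```python
from math import sqrt


def first_principal_component(rows, iterations=100):
    """Return (unit direction, projected scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1, so n must be at least 2).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # assumed starting vector for power iteration
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Two perfectly redundant toy features (second = 2 * first) collapse to one component.
direction, scores = first_principal_component([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(direction, scores)
```

Here the two input columns carry the same information, so a single projected score per sample preserves all of the variance, which is the sense in which dimension reduction removes the redundant part of a fused feature.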

  • Article type: Journal Article
    This study used machine learning algorithms combined with feature reduction to predict pyrolytic gas yield and composition from pyrolysis conditions and biomass characteristics. To this end, random forest (RF) and support vector machine (SVM) models were introduced and compared. The results suggested that six features were sufficient to predict the yield accurately (R² > 0.85, RMSE < 5.7%), while the compositions required only three. Information underlying the models was also extracted. The relative contribution of pyrolysis conditions was higher than that of biomass characteristics for yield (55%), CO2 (73%), and H2 (81%), whereas the reverse held for CO (12%) and CH4 (38%). Furthermore, partial dependence analysis quantified the effects of the reduced features and their interactions on the pyrolysis process. This study provides a reference for pyrolytic gas production and upgrading using fewer features and extends knowledge of the biomass pyrolysis process.

  • Article type: Journal Article
    In the clinical diagnosis of epilepsy, the intelligent diagnosis of epileptic electroencephalogram (EEG) signals has become a research focus in the field of brain diseases. Because manual diagnosis is time-consuming and easily influenced by subjective human factors, artificial intelligence pattern recognition algorithms have been applied to EEG signal recognition. However, the commonly used empirical mode decomposition (EMD) algorithm does not address mode aliasing, and the EEG features obtained by feature extraction may include unimportant features that reduce classification accuracy. In this paper, we propose a new method based on complementary ensemble empirical mode decomposition (CEEMD) combined with iterative feature reduction for the aided diagnosis of epileptic EEG. First, the evaluation indexes of several methods for decomposing and reconstructing signals were compared, and CEEMD was selected as the decomposition method. Then, support vector machine recursive feature elimination (SVM-RFE) was used to reduce the 9 features extracted from the EEG data. A gray wolf optimizer support vector classification (GWO-SVC) recognition model was established for different feature subsets. By comparing the training and test set classification accuracy of the different feature subsets, and considering the model complexity reflected by the number of features selected by SVM-RFE, the analysis showed that the six-feature subset, with fewer features and higher classification accuracy, could reflect the key information of epileptic EEG. The classification accuracy was 99.38% on the training set and 100% on the test set, and the recognition time was only 1.6551 s. Finally, to verify the reliability of the proposed algorithm, it was compared with a classification model built on the raw EEG signals and with optimization models built using other intelligent optimization algorithms. The proposed algorithm had higher classification accuracy and faster recognition time than the other processing methods. The experimental results show that CEEMD combined with SVM-RFE is feasible for rapid and accurate recognition of EEG signals, which provides a theoretical basis for the aided diagnosis of epilepsy.

  • Article type: Journal Article
    Traditional scoring methods generally rely on laboratory measurements to predict mortality. This makes early mortality prediction difficult in rural areas that lack professional laboratory staff and medical laboratory equipment. To improve the efficiency, accuracy, and applicability of mortality prediction in remote areas, a novel mortality prediction method based on machine learning algorithms is proposed that uses only non-invasive parameters readily available from ordinary monitors and manual measurement. A new feature selection method based on the Bayes error rate is developed to select valuable features. Based on the non-invasive parameters, four machine learning models were trained for early mortality prediction. The subjects in this study suffered from general critical diseases including, but not limited to, cancer, bone fracture, and diarrhea. Five traditional scoring methods were compared with the four machine learning models, with and without laboratory measurement variables. Using only the non-invasive parameters, the LightGBM algorithm performed best, with an accuracy of 0.797 and an AUC of 0.879. For the four machine learning methods, there was no apparent difference in mortality prediction performance with and without laboratory measurement variables. After the number of feature variables was reduced to no more than 50, the machine learning models still outperformed the traditional scoring systems, with AUCs higher than 0.83. The machine learning approaches using only non-invasive parameters achieved excellent mortality prediction performance, comparable to that of models using additional laboratory measurements, and can be applied in rural areas and on remote battlefields for mortality risk evaluation.
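The AUC values quoted above can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes AUC directly from that definition, counting ties as half; the function name `roc_auc` and the toy labels and scores are made up for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; library implementations use a sort-based computation instead.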

  • Article type: Published Erratum
    [This corrects the article on p. 146 in vol. 9, PMID: 28572766.]