dimension reduction

  • Article type: Journal Article
    Principal component analysis (PCA) is a multivariate statistical method for reducing dimensions to a representation in terms of principal components. Accordingly, PCA is commonly adopted for establishing psychometric properties, i.e., construct validity. An autoencoder is a neural network model that has also been shown to perform well in dimensionality reduction. Although PCA and autoencoders could be compared in several ways, most recent literature has focused on differences in image reconstruction, where training data are often sufficient. In the current study, we examined the details of each autoencoder classifier and how it may provide a neural-network advantage that generalizes better to small, non-normally distributed datasets.
    A Monte Carlo simulation was conducted, varying the level of non-normality, the sample size, and the level of communality. The performance of autoencoders and PCA was compared using the mean square error, mean absolute value, and Euclidean distance. The feasibility of autoencoders with small sample sizes was also examined.
    With great flexibility in decoding the representation through linear and non-linear mappings, the autoencoder was shown to reduce dimensions robustly and was therefore effective in building construct validity with sample sizes as small as 100. For small non-normal datasets, the autoencoders achieved a smaller mean square error and a smaller Euclidean distance between the original dataset and the predictions. Hence, when behavioral scientists explore the construct validity of a newly designed questionnaire, an autoencoder could be considered as an alternative to PCA.
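
    A minimal sketch of the kind of comparison described above, in Python with NumPy. Everything here is an illustrative assumption rather than the study's procedure: the one-factor generator `simulate_items` (communality h², skew from standardized chi-square draws), the tiny tanh-encoder/linear-decoder autoencoder trained by full-batch gradient descent, and the reading of "mean absolute value" as the mean absolute error of the reconstruction. The Euclidean distance is taken as the Frobenius norm of the residual matrix.

```python
import numpy as np


def simulate_items(n, p=6, communality=0.5, skewed=True, seed=0):
    """One-factor items x_j = sqrt(h2) * f + sqrt(1 - h2) * e_j (hypothetical generator).

    Skew comes from standardized chi-square(1) draws; this is an assumption,
    not the study's documented way of inducing non-normality.
    """
    rng = np.random.default_rng(seed)

    def draw(shape):
        if skewed:
            return (rng.chisquare(1.0, size=shape) - 1.0) / np.sqrt(2.0)  # mean 0, var 1
        return rng.standard_normal(shape)

    f, e = draw((n, 1)), draw((n, p))
    return np.sqrt(communality) * f + np.sqrt(1.0 - communality) * e


def pca_reconstruct(x, k):
    """Project centred data onto the top-k principal axes and map back."""
    mu = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mu, full_matrices=False)
    w = vt[:k].T
    return (x - mu) @ w @ w.T + mu


def autoencoder_reconstruct(x, k, lr=0.2, epochs=3000, seed=0):
    """Tiny autoencoder: tanh encoder to k units, linear decoder, MSE loss, full-batch GD."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    mu = x.mean(axis=0)
    xc = x - mu
    w1, b1 = rng.normal(scale=0.1, size=(p, k)), np.zeros(k)
    w2, b2 = rng.normal(scale=0.1, size=(k, p)), np.zeros(p)
    for _ in range(epochs):
        h = np.tanh(xc @ w1 + b1)                 # non-linear encoding
        out = h @ w2 + b2                         # linear decoding
        g_out = 2.0 * (out - xc) / (n * p)        # gradient of the mean squared error
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)     # back-propagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (xc.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return np.tanh(xc @ w1 + b1) @ w2 + b2 + mu


def reconstruction_metrics(x, x_hat):
    diff = x - x_hat
    return {
        "mse": float(np.mean(diff ** 2)),
        "mae": float(np.mean(np.abs(diff))),       # read here as mean absolute error
        "euclidean": float(np.linalg.norm(diff)),  # Frobenius norm of the residual matrix
    }


x = simulate_items(n=100, communality=0.5, skewed=True)
pca_scores = reconstruction_metrics(x, pca_reconstruct(x, k=1))
ae_scores = reconstruction_metrics(x, autoencoder_reconstruct(x, k=1))
print("PCA:", pca_scores)
print("AE: ", ae_scores)
```

    Varying `n`, `communality`, and `skewed` over a grid, with several seeds per cell, turns this into a small Monte Carlo design; the sketch itself makes no claim about which method wins under any particular setting.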

  • Article type: Journal Article
    We introduce a novel class of factor analysis methodologies for the joint analysis of multiple studies. The goal is to separately identify and estimate (1) common factors shared across multiple studies and (2) study-specific factors. We develop an Expectation Conditional Maximization (ECM) algorithm for parameter estimation and provide a procedure for choosing the numbers of common and specific factors. We present simulations evaluating the performance of the method and illustrate it by applying it to gene expression data in ovarian cancer. In both settings, we clarify the benefits of a joint analysis compared with standard factor analysis. We have provided a tool that accelerates the combination of unsupervised analyses across multiple studies and helps in understanding the cross-study reproducibility of signal in multivariate data. An R package (MSFA) is implemented and available on GitHub.
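
    The abstract does not spell out the model, so the following sketch assumes the usual multi-study factor analysis formulation: an observation x from study s is Φf + Λ_s l + e, with common loadings Φ shared by all studies, study-specific loadings Λ_s, and a diagonal noise covariance Ψ_s, which implies Σ_s = ΦΦᵀ + Λ_sΛ_sᵀ + Ψ_s. The helper `simulate_msfa` and its parameters are hypothetical; it only generates data and the implied covariances, and it does not implement the ECM estimation or the interface of the MSFA package.

```python
import numpy as np


def simulate_msfa(n_per_study, p=10, k_common=2, j_specific=(1, 2), seed=0):
    """Simulate studies from x = Phi f + Lambda_s l + e (hypothetical helper).

    Returns the shared loadings Phi and, per study, the data matrix together
    with the model-implied covariance Phi Phi^T + Lambda_s Lambda_s^T + Psi_s.
    """
    if len(n_per_study) != len(j_specific):
        raise ValueError("need one specific-factor count per study")
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((p, k_common))            # loadings shared by every study
    studies = []
    for n_s, j_s in zip(n_per_study, j_specific):
        lam = rng.standard_normal((p, j_s))             # loadings unique to study s
        psi = rng.uniform(0.2, 1.0, size=p)             # diagonal of Psi_s
        f_common = rng.standard_normal((n_s, k_common))
        f_specific = rng.standard_normal((n_s, j_s))
        noise = rng.standard_normal((n_s, p)) * np.sqrt(psi)
        x = f_common @ phi.T + f_specific @ lam.T + noise
        sigma = phi @ phi.T + lam @ lam.T + np.diag(psi)
        studies.append((x, sigma))
    return phi, studies


phi, studies = simulate_msfa(n_per_study=(20000, 20000), j_specific=(1, 2))
for s, (x, sigma) in enumerate(studies, start=1):
    sample_cov = np.cov(x, rowvar=False)
    rel_err = np.linalg.norm(sample_cov - sigma) / np.linalg.norm(sigma)
    print(f"study {s}: relative covariance error {rel_err:.3f}")
```

    Only the ΦΦᵀ term is common to every Σ_s; separating it from the study-specific Λ_sΛ_sᵀ terms is what a joint estimator has to do, which a single-study factor analysis cannot.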

  • Article type: Journal Article
    Quantitative bioactivity and toxicity assessment of chemical compounds plays a central role in drug discovery because it saves a substantial amount of resources. To this end, high-performance computing has enabled researchers and practitioners to leverage hundreds, or even thousands, of computed molecular descriptors to predict the activity of candidate compounds. In this paper, we empirically evaluate the utility of two large groups of chemical descriptors for such predictive modelling, as well as for chemical structure discovery. We use a suite of commercially available and in-house software to calculate molecular descriptors for two sets of chemical mutagens: a homogeneous set of 95 amines and a diverse set of 508 chemicals. Using the calculated descriptors, we model the mutagenic activity of these compounds with a number of methods from the statistics and machine-learning literature, and we use robust principal component analysis to investigate the low-dimensional subspaces that characterize these chemicals. Our results suggest that combining different sets of descriptors is likely to yield a better predictive model, although this depends on the compounds being modelled and the modelling technique used.
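
    The abstract does not say which robust PCA variant was used. As one well-known example, the sketch below implements principal component pursuit, which splits a matrix M into a low-rank part L plus a sparse part S, using a basic augmented Lagrangian iteration in NumPy. The function names, the default λ = 1/√max(n₁, n₂), and the step parameter μ are assumptions of this sketch, not details from the paper. In the descriptor setting, M would be a compounds × descriptors matrix, and the column space of L would describe the low-dimensional subspace.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(x, tau):
    """Shrink singular values: the proximal operator of tau * ||x||_* (nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def robust_pca(m, lam=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit M = L + S via a basic augmented Lagrangian loop."""
    n1, n2 = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))
    mu = n1 * n2 / (4.0 * np.abs(m).sum())
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        low_rank = singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual += mu * residual                      # dual ascent on the constraint M = L + S
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


# Synthetic check: a rank-2 matrix with 5% of entries grossly corrupted.
rng = np.random.default_rng(1)
true_low_rank = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
corrupt = rng.random((60, 60)) < 0.05
spikes = rng.choice([-1.0, 1.0], size=(60, 60)) * rng.uniform(5.0, 10.0, size=(60, 60))
observed = true_low_rank + np.where(corrupt, spikes, 0.0)

l_hat, s_hat = robust_pca(observed)
rel_err = np.linalg.norm(l_hat - true_low_rank) / np.linalg.norm(true_low_rank)
print(f"relative error of recovered low-rank part: {rel_err:.2e}")
```

    The test matrix is synthetic. Real descriptor matrices would usually be standardized column-wise first, because descriptors are measured on very different scales and the L1 penalty on S treats all entries alike.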

  • Article type: Journal Article
    BACKGROUND: Speech disorders such as dysphonia and dysarthria are an early and common manifestation of Parkinson's disease. Class prediction is an essential task in automatic speech processing, particularly in the case of Parkinson's disease. Many classification experiments have focused on automatically distinguishing Parkinson's disease patients from healthy speakers, but the results are still not satisfactory. A major obstacle to this task is the high dimensionality of speech data.
    OBJECTIVE: In this work, the potential of Principal Component Analysis (PCA)-based modeling for dimensionality reduction is examined as a data-smoothing tool for multiclass target expression data.
    METHODS: Building on the proposed PCA-based modeling, the class-prediction power of logistic regression (LR) and C5.0 on numeric data is investigated on the publicly available Lee Silverman Voice Treatment (LSVT) Parkinson's disease dataset to develop an advanced classification model.
    RESULTS: The main advantage of our model is the effective reduction of the number of factors from p = 309 to k = 32 for the LSVT Voice Rehabilitation dataset, with classification accuracies of 100% and 99.92% for PCA-LR and PCA-C5.0, respectively. In addition, using only 9 dysphonia features, the classification accuracy was 99.20% for PCA-LR and 99.11% for PCA-C5.0.
    CONCLUSIONS: Our combined dimension-reduction and data-smoothing approach has significant potential to minimize the number of features, increase classification accuracy, and automatically classify subjects as Parkinson's disease patients or healthy speakers.
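
    A minimal sketch of a PCA-then-logistic-regression pipeline in NumPy, assuming binary labels (patient vs. healthy speaker). Only p = 309 and k = 32 come from the abstract; the data are synthetic stand-ins, the helper names are hypothetical, and C5.0 is omitted because it is not available in NumPy. Standardization and PCA are fitted on the training rows only, so the held-out accuracy does not benefit from information in the test rows.

```python
import numpy as np


def fit_pca(x, k):
    """Top-k principal axes (p x k) of an already-centred matrix x."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].T


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def fit_logistic(z, y, lr=0.5, epochs=2000, l2=1e-3):
    """Binary logistic regression (intercept + L2 penalty) by gradient descent."""
    z1 = np.hstack([z, np.ones((len(z), 1))])       # append intercept column
    w = np.zeros(z1.shape[1])
    for _ in range(epochs):
        grad = z1.T @ (sigmoid(z1 @ w) - y) / len(y)
        grad[:-1] += l2 * w[:-1]                     # do not penalise the intercept
        w -= lr * grad
    return w


def pca_lr_pipeline(x_train, y_train, x_test, k=32):
    """Standardise -> PCA (k components) -> logistic regression; returns test labels."""
    mean, std = x_train.mean(axis=0), x_train.std(axis=0)
    std[std == 0] = 1.0                               # guard against constant features
    xs_train, xs_test = (x_train - mean) / std, (x_test - mean) / std
    axes = fit_pca(xs_train, k)                       # fitted on training rows only
    z_train, z_test = xs_train @ axes, xs_test @ axes
    scale = z_train.std(axis=0)                       # unit-variance scores keep GD stable
    w = fit_logistic(z_train / scale, y_train)
    z1 = np.hstack([z_test / scale, np.ones((len(z_test), 1))])
    return (z1 @ w > 0).astype(int)


# Synthetic stand-in data: p = 309 as in the abstract, everything else invented.
rng = np.random.default_rng(0)
n, p = 200, 309
y = rng.integers(0, 2, size=n)
latent = rng.standard_normal((n, 5))
latent[:, 0] += 4.0 * y                              # class signal in one latent direction
x = latent @ rng.standard_normal((5, p)) + 0.5 * rng.standard_normal((n, p))

y_pred = pca_lr_pipeline(x[:150], y[:150], x[150:], k=32)
accuracy = float(np.mean(y_pred == y[150:]))
print(f"held-out accuracy: {accuracy:.2f}")
```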