Keywords: Autoencoder; Construct validity; Dimension reduction; Factor analysis; Principal component analysis; Sample size

Source: DOI: 10.7717/peerj-cs.782 (PDF via PubMed)

Abstract:
Principal component analysis (PCA) is a multivariate statistical model for reducing dimensionality into a representation of principal components, and it is therefore commonly used to establish psychometric properties, i.e., construct validity. The autoencoder is a neural network model that has also been shown to perform well in dimensionality reduction. Although PCA and autoencoders can be compared in several ways, most of the recent literature has focused on differences in image reconstruction, where the datasets are usually large enough for training. In the current study, we examined the details of each autoencoder classifier and how they may give neural networks an advantage in generalizing to small, non-normally distributed datasets.
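The following is a minimal sketch, not the authors' implementation, contrasting the two dimension-reduction approaches on the same data matrix. It assumes scikit-learn and TensorFlow/Keras are available; the data shape, bottleneck size of 3, and layer widths are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12)).astype("float32")  # e.g., 100 respondents, 12 items (hypothetical)

# PCA: linear projection onto the leading principal components.
pca = PCA(n_components=3)
scores = pca.fit_transform(X)          # reduced representation
X_pca = pca.inverse_transform(scores)  # linear reconstruction

# Autoencoder: the encoder compresses to the same bottleneck size and the
# decoder maps back; non-linear activations give it extra flexibility.
autoencoder = keras.Sequential([
    keras.layers.Input(shape=(12,)),
    keras.layers.Dense(8, activation="tanh"),
    keras.layers.Dense(3, activation="linear", name="bottleneck"),
    keras.layers.Dense(8, activation="tanh"),
    keras.layers.Dense(12, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=200, batch_size=16, verbose=0)
X_ae = autoencoder.predict(X, verbose=0)  # non-linear reconstruction
```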
A Monte Carlo simulation was conducted, varying the levels of non-normality, sample size, and communality. The performance of the autoencoders and PCA was compared using the mean square error, mean absolute value, and Euclidean distance. The feasibility of autoencoders with small sample sizes was also examined.
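As a rough illustration of the three comparison criteria named above, the sketch below computes them between an original data matrix and a model's reconstruction (e.g., X_pca or X_ae from the previous sketch). How the study aggregates these criteria across Monte Carlo replications and simulation conditions is not reproduced here.

```python
import numpy as np

def reconstruction_metrics(X, X_hat):
    """Reconstruction criteria between original data X and predictions X_hat."""
    diff = X - X_hat
    mse = np.mean(diff ** 2)        # mean square error
    mav = np.mean(np.abs(diff))     # mean absolute value of the errors
    euclid = np.linalg.norm(diff)   # Euclidean (Frobenius) distance
    return {"mse": mse, "mean_abs": mav, "euclidean": euclid}

# Example usage with the reconstructions from the previous sketch:
# print(reconstruction_metrics(X, X_pca))
# print(reconstruction_metrics(X, X_ae))
```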
With its flexibility in decoding representations through both linear and non-linear mappings, the autoencoder was shown to reduce dimensions robustly and was therefore effective in establishing construct validity with a sample size as small as 100. The autoencoders obtained a smaller mean square error and a smaller Euclidean distance between the original dataset and the predictions for small non-normal datasets. Hence, when behavioral scientists attempt to explore the construct validity of a newly designed questionnaire, an autoencoder can be considered an alternative to PCA.