Alzheimer's disease (AD) is affecting a growing number of individuals. As a result, there is a pressing need for accurate and early diagnosis methods. This study aims to achieve this goal by developing an optimal data analysis strategy to enhance computational diagnosis. Although various modalities of AD diagnostic data are collected, past research on computational methods of AD diagnosis has mainly focused on using single-modal inputs. We hypothesize that integrating, or "fusing," various data modalities as inputs to prediction models could enhance diagnostic accuracy by offering a more comprehensive view of an individual's health profile. However, a potential challenge arises as this fusion of multiple modalities may result in significantly higher dimensional data. We hypothesize that employing suitable dimensionality reduction methods across heterogeneous modalities would not only help diagnosis models extract latent information but also enhance accuracy. Therefore, it is imperative to identify optimal strategies for both data fusion and dimensionality reduction. In this paper, we have conducted a comprehensive comparison of over 80 statistical machine learning methods, considering various classifiers, dimensionality reduction techniques, and data fusion strategies to assess our hypotheses. Specifically, we have explored three primary strategies: (1) Simple data fusion, which involves straightforward concatenation (fusion) of datasets before inputting them into a classifier; (2) Early data fusion, in which datasets are concatenated first, and then a dimensionality reduction technique is applied before feeding the resulting data into a classifier; and (3) Intermediate data fusion, in which dimensionality reduction methods are applied individually to each dataset before concatenating them to construct a classifier.
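The three fusion strategies can be sketched as follows. This is a minimal illustration, not the paper's implementation: the modality names, sizes, and the use of PCA as the reduction step are all assumptions chosen for concreteness (the study compares PCA against AE, LASSO, and SE).

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy modalities standing in for two heterogeneous data sources
rng = np.random.default_rng(0)
X_img = rng.normal(size=(100, 50))  # modality 1: 50 features per subject
X_gen = rng.normal(size=(100, 30))  # modality 2: 30 features per subject

# (1) Simple fusion: concatenate raw modalities, no dimensionality reduction
X_simple = np.concatenate([X_img, X_gen], axis=1)       # shape (100, 80)

# (2) Early fusion: concatenate first, then reduce the joint feature space
X_early = PCA(n_components=10).fit_transform(X_simple)  # shape (100, 10)

# (3) Intermediate fusion: reduce each modality separately, then concatenate
Z_img = PCA(n_components=5).fit_transform(X_img)
Z_gen = PCA(n_components=5).fit_transform(X_gen)
X_inter = np.concatenate([Z_img, Z_gen], axis=1)        # shape (100, 10)
```

In each case the resulting matrix (`X_simple`, `X_early`, or `X_inter`) would then be passed to a downstream classifier; intermediate fusion differs from early fusion only in where the reduction step sits relative to the concatenation.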
For dimensionality reduction, we have explored several commonly used techniques such as principal component analysis (PCA), autoencoder (AE), and LASSO. Additionally, we have implemented a new dimensionality-reduction method called the supervised encoder (SE), which involves slight modifications to standard deep neural networks. Our results show that SE substantially improves prediction accuracy compared to PCA, AE, and LASSO, especially in combination with intermediate fusion for multiclass diagnosis prediction.
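One plausible reading of a "supervised encoder" built from a slightly modified standard deep neural network is sketched below: train an ordinary feed-forward classifier whose last hidden layer has the desired reduced dimension, then reuse that layer's activations as label-informed low-dimensional features. This is an assumption for illustration only; the `encode` helper and all sizes are hypothetical, not the authors' architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical data: 200 subjects, 40 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Standard feed-forward network; the final hidden layer width (8)
# is chosen as the target reduced dimension.
net = MLPClassifier(hidden_layer_sizes=(32, 8), max_iter=500, random_state=0)
net.fit(X, y)

def encode(net, X):
    """Forward pass through the hidden layers only, returning the
    final hidden layer's activations as the reduced representation."""
    h = X
    for W, b in zip(net.coefs_[:-1], net.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)  # ReLU, MLPClassifier's default
    return h

Z = encode(net, X)  # shape (200, 8): supervised low-dimensional features
```

Unlike PCA or an autoencoder, the representation here is shaped by the diagnostic labels during training, which is consistent with the abstract's claim that SE helps the model extract label-relevant latent information.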