Keywords: Deep learning; Dimension selection; Embedding; Imputation; Social network; Stochastic gradient Markov chain Monte Carlo

Source: DOI:10.1016/j.neunet.2024.106512

Abstract:
Network embedding is a general-purpose machine learning technique that converts network data from non-Euclidean space to Euclidean space, facilitating downstream analyses for the networks. However, existing embedding methods are often optimization-based, with the embedding dimension determined in a heuristic or ad hoc way, which can cause potential bias in downstream statistical inference. Additionally, existing deep embedding methods can suffer from a nonidentifiability issue due to the universal approximation power of deep neural networks. We address these issues within a rigorous statistical framework. We treat the embedding vectors as missing data, reconstruct the network features using a sparse decoder, and simultaneously impute the embedding vectors and train the sparse decoder using an adaptive stochastic gradient Markov chain Monte Carlo (MCMC) algorithm. Under mild conditions, we show that the sparse decoder provides a parsimonious mapping from the embedding space to network features, enabling effective selection of the embedding dimension and overcoming the nonidentifiability issue encountered by existing deep embedding methods. Furthermore, we show that the embedding vectors converge weakly to a desired posterior distribution in the 2-Wasserstein distance, addressing the potential bias issue experienced by existing embedding methods. This work lays down the first theoretical foundation for network embedding within the framework of missing data imputation.
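The alternating "impute the embeddings, then update the decoder" idea described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a linear decoder instead of a sparse deep neural network, a Gaussian likelihood with a standard normal prior on the embeddings, a plain full-batch Langevin step instead of the adaptive stochastic gradient MCMC sampler, and a group-lasso proximal step as a stand-in for whatever sparsity mechanism the paper uses. The function name `impute_and_train` and all of its parameters are hypothetical.

```python
import numpy as np


def impute_and_train(X, dim, steps=500, eps=1e-3, lr=1e-2, lam=0.05,
                     sigma2=1.0, seed=0):
    """Toy alternating scheme (illustrative assumptions only).

    X   : (n, p) array of observed network features.
    dim : upper bound on the embedding dimension.
    Returns the imputed embeddings Z (n, dim) and the decoder W (dim, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = rng.normal(size=(n, dim))        # embeddings, treated as missing data
    W = 0.1 * rng.normal(size=(dim, p))  # linear decoder: X is modeled as Z @ W

    for _ in range(steps):
        # Imputation step: one Langevin move targeting p(Z | X, W), assuming
        # prior Z ~ N(0, I) and likelihood X | Z ~ N(Z W, sigma2 * I).
        R = X - Z @ W
        grad_log_post = R @ W.T / sigma2 - Z
        noise = rng.normal(size=Z.shape)
        Z = Z + 0.5 * eps * grad_log_post + np.sqrt(eps) * noise

        # Decoder step: gradient step on the averaged squared reconstruction
        # loss, using the freshly imputed embeddings.
        R = X - Z @ W
        grad_W = -(Z.T @ R) / (n * sigma2)
        W = W - lr * grad_W

        # Group soft-thresholding on the rows of W: a row shrunk to zero means
        # the corresponding embedding coordinate no longer affects the output.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - lr * lam / np.maximum(norms, 1e-12), 0.0)
        W = W * shrink

    return Z, W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic features generated from a 2-dimensional latent space.
    X = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 6))
    Z, W = impute_and_train(X, dim=4, steps=200)
    print("row norms of W:", np.linalg.norm(W, axis=1))
```

In this sketch, the rows of `W` whose norms stay clearly away from zero suggest which embedding coordinates the decoder actually uses, which is the toy analogue of selecting the embedding dimension through a sparse decoder. Whether rows are driven to zero depends on `lam`, `lr`, and the number of steps.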