Keywords: Interpolation; Matrix completion; Matrix factorization; Multivariate longitudinal data; Regression

Source: DOI:10.1080/10618600.2023.2257257

Abstract:
In clinical practice and biomedical research, measurements are often collected sparsely and irregularly in time, and data acquisition is expensive and inconvenient. Examples include measurements of spine bone mineral density, cancer growth assessed through mammography or biopsy, progression of defective vision, and assessment of gait in patients with neurological disorders. Practitioners often need to infer the progression of diseases from such sparse observations. A classical tool for analyzing such data is a mixed-effect model in which time is treated as both a fixed effect (population progression curve) and a random effect (individual variability). Alternatively, researchers use Gaussian processes or functional data analysis, assuming that observations are drawn from a certain distribution of processes. While these models are flexible, they rely on probabilistic assumptions, require very careful implementation, and tend to be slow in practice. In this study, we propose an alternative elementary framework for analyzing longitudinal data, motivated by matrix completion. Our method yields estimates of progression curves by iterative application of the Singular Value Decomposition. Our framework covers multivariate longitudinal data and regression, and can be easily extended to other settings. As it relies on existing tools for matrix algebra, it is efficient and easy to implement. We apply our methods to understand trends of progression of motor impairment in children with Cerebral Palsy. Our model approximates individual progression curves and explains 30% of the variability. A low-rank representation of progression trends enables identification of different progression trends in subtypes of Cerebral Palsy.
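The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: matrix completion by iterative application of the SVD. It uses a generic soft-thresholded SVD imputation (often called soft-impute) on a subjects-by-timepoints matrix and is not the authors' exact procedure. The function name `soft_impute`, the parameters `lam`, `n_iter`, and `tol`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_impute(Y, mask, lam=1.0, n_iter=200, tol=1e-6):
    """Fill missing entries of Y by iterated soft-thresholded SVD.

    Y    : (n_subjects, n_timepoints) array; entries where mask is False are ignored
    mask : boolean array of the same shape, True where a value was observed
    lam  : hypothetical shrinkage level subtracted from each singular value
    """
    Y_obs = np.where(mask, Y, 0.0)   # unobserved entries (even NaN) become 0
    Z = np.zeros_like(Y_obs)         # current low-rank estimate
    for _ in range(n_iter):
        # Keep observed values; use the current estimate everywhere else
        filled = np.where(mask, Y_obs, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)   # soft-threshold singular values
        Z_new = (U * s_shrunk) @ Vt
        # Relative change between successive estimates
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


# Illustrative synthetic data (an assumption): 50 subjects, 20 time points, rank 2
rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) < 0.5      # roughly half the entries observed
estimate = soft_impute(truth, mask, lam=0.5)
heldout_rmse = np.sqrt(np.mean((estimate - truth)[~mask] ** 2))
print("RMSE on unobserved entries:", heldout_rmse)
```

Each row of `estimate` plays the role of one individual's fitted progression curve on the time grid. The shrinkage level `lam` controls the effective rank of the fit and would normally be chosen by validation on held-out observations.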