%0 Dataset %T A multimodal framework for extraction and fusion of satellite images and public health data. %A Moukheiber D %A Restrepo D %A Cajas SA %A Montoya MPA %A Celi LA %A Kuo KT %A López DM %A Moukheiber L %A Moukheiber M %A Moukheiber S %A Osorio-Valencia JS %A Purkayastha S %A Paddo AR %A Wu C %A Kuo PC %J Sci Data %V 11 %N 1 %D 2024 Jun 15 %M 38879585 %F 8.501 %R 10.1038/s41597-024-03366-1 %X In low- and middle-income countries, the substantial costs associated with traditional data collection pose an obstacle to decision-making in public health. Satellite imagery offers a potential solution, but image extraction and analysis can be costly and require specialized expertise. We introduce SatelliteBench, a scalable framework for satellite image extraction and vector embedding generation. We also propose a novel multimodal fusion pipeline that uses a series of satellite images and metadata. The framework was evaluated by generating a dataset of 12,636 images and embeddings, accompanied by comprehensive metadata, from 81 municipalities in Colombia between 2016 and 2018. The dataset was then evaluated on three tasks: dengue case prediction, poverty assessment, and access to education. The performance on these tasks showcases the versatility and practicality of SatelliteBench, offering a reproducible, accessible, and open tool to enhance decision-making in public health.