{Reference Type}: Dataset {Title}: A multimodal framework for extraction and fusion of satellite images and public health data. {Author}: Moukheiber D;Restrepo D;Cajas SA;Montoya MPA;Celi LA;Kuo KT;López DM;Moukheiber L;Moukheiber M;Moukheiber S;Osorio-Valencia JS;Purkayastha S;Paddo AR;Wu C;Kuo PC; {Journal}: Sci Data {Volume}: 11 {Issue}: 1 {Year}: 2024 Jun 15 {Factor}: 8.501 {DOI}: 10.1038/s41597-024-03366-1 {Abstract}: In low- and middle-income countries, the substantial costs of traditional data collection are an obstacle to decision-making in public health. Satellite imagery offers a potential solution, but image extraction and analysis can be costly and require specialized expertise. We introduce SatelliteBench, a scalable framework for satellite image extraction and vector embedding generation. We also propose a novel multimodal fusion pipeline that uses a series of satellite images together with metadata. The framework was evaluated by generating a dataset of 12,636 images and embeddings, accompanied by comprehensive metadata, from 81 municipalities in Colombia between 2016 and 2018. The dataset was then evaluated on three tasks: dengue case prediction, poverty assessment, and access to education. Performance on these tasks showcases the versatility and practicality of SatelliteBench, which offers a reproducible, accessible, and open tool to enhance decision-making in public health.