%0 Journal Article
%T Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement.
%A Gao Y
%A Cui Y
%J Genome Med
%V 16
%N 1
%D 2024 Jun 4
%M 38835075
%F 15.266
%R 10.1186/s13073-024-01345-0
%X BACKGROUND: Accurate prediction of an individual's predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets.
METHODS: We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer's disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups.
RESULTS: Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations.
CONCLUSIONS: This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.