%0 Journal Article
%T Prospective modeling and estimating the epidemiologically informative match rate within large foodborne pathogen genomic databases.
%A Yin L
%A Pettengill JB
%J BMC Res Notes
%V 17
%N 1
%D 2024 Jul 9
%M 38982485
%R 10.1186/s13104-024-06847-z
%X OBJECTIVE: Much has been written about the utility of genomic databases to public health. Within food safety these databases contain data from two types of isolates: those from patients (i.e., clinical) and those from non-clinical sources (e.g., a food manufacturing environment). A genetic match between isolates from these sources represents a signal of interest. We investigate the match rate within three large genomic databases (Listeria monocytogenes, Escherichia coli, and Salmonella) and the smaller Cronobacter database; the databases are part of the Pathogen Detection project at NCBI (National Center for Biotechnology Information). RESULTS: Currently, the match rate of clinical isolates to non-clinical isolates is 33% for L. monocytogenes, 46% for Salmonella, and 7% for E. coli. These match rates are associated with several database features including the diversity of the organism, the database size, and the proportion of non-clinical BioSamples. Modeling match rate via logistic regression showed relatively good performance. Our prediction model illustrates the importance of populating databases with non-clinical isolates to better identify a match for clinical samples. Such information should help public health officials prioritize surveillance strategies and show the critical need to populate fledgling databases (e.g., Cronobacter sakazakii).