{Reference Type}: Journal Article {Title}: Prospective modeling and estimating the epidemiologically informative match rate within large foodborne pathogen genomic databases. {Author}: Yin L;Pettengill JB; {Journal}: BMC Res Notes {Volume}: 17 {Issue}: 1 {Year}: 2024 Jul 9 {DOI}: 10.1186/s13104-024-06847-z {Abstract}: OBJECTIVE: Much has been written about the utility of genomic databases to public health. Within food safety these databases contain data from two types of isolates: those from patients (i.e., clinical) and those from non-clinical sources (e.g., a food manufacturing environment). A genetic match between isolates from these sources represents a signal of interest. We investigate the match rate within three large genomic databases (Listeria monocytogenes, Escherichia coli, and Salmonella) and the smaller Cronobacter database; the databases are part of the Pathogen Detection project at NCBI (National Center for Biotechnology Information).
RESULTS: Currently, the match rate of clinical isolates to non-clinical isolates is 33% for L. monocytogenes, 46% for Salmonella, and 7% for E. coli. These match rates are associated with several database features including the diversity of the organism, the database size, and the proportion of non-clinical BioSamples. Modeling match rate via logistic regression showed relatively good performance. Our prediction model illustrates the importance of populating databases with non-clinical isolates to better identify a match for clinical samples. Such information should help public health officials prioritize surveillance strategies and show the critical need to populate fledgling databases (e.g., Cronobacter sakazakii).