Source attribution

  • Article type: Journal Article
    We quantify anthropogenic sources of health burdens associated with ambient air pollution exposure in South Korea and project future health burdens to 2050 using domestic emission control scenarios provided by the United Nations Environment Programme (UNEP). Our health burden estimation framework uses GEOS-Chem simulations, satellite-derived NO2, and ground-based observations of PM2.5, O3, and NO2. We estimate 19,000, 3,300, and 8,500 premature deaths owing to long-term exposure to PM2.5, O3, and NO2, respectively, and 23,000 NO2-associated childhood asthma incidences in 2016. Next, we calculate anthropogenic emission contributions to these four health burdens from each species and grid cell using adjoint sensitivity analysis. Domestic sources account for 56%, 38%, 87%, and 88% of marginal emission contributions to the PM2.5-, O3-, and NO2-associated premature deaths and the NO2-associated childhood asthma incidences, respectively. We project health burdens to 2050 using UNEP domestic emission scenarios (Baseline and Mitigation) and population forecasts from Statistics Korea. Because of population aging alone, there are 41,000, 10,000, and 20,000 more premature deaths associated with PM2.5, O3, and NO2 exposure, respectively, and 9,000 fewer childhood asthma incidences associated with NO2. The Mitigation scenario doubles the NO2-associated health benefits over the Baseline scenario, preventing 24,000 premature deaths and 13,000 childhood asthma incidences by 2050. It also slightly reduces PM2.5- and O3-associated premature deaths, by 9.9% and 7.0%, unlike the Baseline scenario, in which these pollutants increase. Furthermore, we examine foreign emission impacts from nine SSP/RCP-based scenarios, highlighting the need for international cooperation to reduce PM2.5 and O3 pollution.
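As a rough illustration of how an exposure estimate becomes a premature-death count, the sketch below uses a log-linear health impact function, a common form in air-quality health assessment. This is an assumption: the abstract does not give the concentration-response functions, coefficients, or counterfactual concentrations the authors used, so the function name, parameters, and example values are hypothetical.

```python
import math


def premature_deaths(population: float, baseline_rate: float,
                     beta: float, delta_c: float) -> float:
    """Deaths attributable to exposure in one grid cell (assumed log-linear form).

    population    -- exposed population (hypothetical input)
    baseline_rate -- baseline mortality rate, deaths per person per year
    beta          -- concentration-response coefficient, per unit concentration
    delta_c       -- concentration above an assumed counterfactual level
    """
    # Attributable fraction = 1 - 1/RR, with relative risk RR = exp(beta * delta_c)
    attributable_fraction = 1.0 - math.exp(-beta * delta_c)
    return population * baseline_rate * attributable_fraction


# Hypothetical cell: RR of 1.08 per 10 ug/m3, exposure 20 ug/m3 above counterfactual
beta = math.log(1.08) / 10.0
now = premature_deaths(1_000_000, 0.006, beta, 20.0)
# An older population (higher baseline mortality rate) at the same exposure
aged = premature_deaths(1_000_000, 0.012, beta, 20.0)
print(round(now), round(aged))
```

Because the result is linear in population and baseline_rate, a shift in age structure alone changes the burden even when concentrations stay fixed, which is the kind of population-aging effect the abstract reports.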

  • Article type: Journal Article
    Microbial source tracking leverages a wide range of approaches designed to trace the origins of fecal contamination in aquatic environments. Although source tracking methods are typically employed in the laboratory, computational techniques can also be leveraged to advance microbial source tracking methodology. Herein, we present a logic regression-based supervised learning approach for discovering source-informative genetic markers within intergenic regions across the Escherichia coli genome that can be used for source tracking. Using just a single intergenic locus, logic regression identified highly source-specific (i.e., exceeding 97.00%) biomarkers for a wide range of host and niche sources, with sensitivities reaching as high as 30.00%-50.00% for certain source categories, including pig, sheep, mouse, and wastewater, depending on the specific intergenic locus analyzed. Restricting the source range to the most prominent zoonotic sources of E. coli transmission (i.e., bovine, chicken, human, and pig) allowed informative biomarkers to be generated for all host categories, with specificities of at least 90.00% and sensitivities between 12.50% and 70.00%, using sequence data from key intergenic regions, including emrKY-evgAS, ibsB-(mdtABCD-baeSR), ompC-rcsDB, and yedS-yedR, that appear to be involved in antibiotic resistance. Remarkably, we were able to use this approach to classify 48 of 113 river water E. coli isolates collected in Northwestern Sweden as beaver, human, or reindeer in origin with a high degree of consensus, highlighting the potential of logic regression modeling as a novel approach for augmenting current source tracking efforts.
    IMPORTANCE: The presence of microbial contaminants, particularly from fecal sources, in water poses a serious risk to public health. The health and economic burden of waterborne pathogens can be substantial; as such, the ability to detect and identify the sources of fecal contamination in environmental waters is crucial for the control of waterborne diseases. This can be accomplished through microbial source tracking, which uses various laboratory techniques to trace the origins of microbial pollution in the environment. Building on current source tracking methodology, we describe a novel workflow that uses logic regression, a supervised machine learning method, to discover genetic markers in Escherichia coli, a common fecal indicator bacterium, that can be used for source tracking efforts. Importantly, our research provides an example of how increasingly prominent machine learning algorithms can be applied to improve upon current microbial source tracking methodology.
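Logic regression builds predictors as Boolean combinations of binary covariates (for example, the presence or absence of particular sequence variants). The sketch below does not implement that model search; it only shows how one candidate Boolean rule, once found, could be scored one-vs-rest as a source biomarker using the sensitivity and specificity quoted above. The feature names (snp_1, snp_2) and the example isolates are hypothetical placeholders.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Dict[str, bool]  # hypothetical binary sequence features per isolate


def sensitivity_specificity(rule: Callable[[Features], bool],
                            isolates: Sequence[Features],
                            sources: Sequence[str],
                            target: str) -> Tuple[float, float]:
    """Score a Boolean marker for one source class against all other classes."""
    tp = fn = tn = fp = 0
    for features, source in zip(isolates, sources):
        predicted = rule(features)
        if source == target:
            if predicted:
                tp += 1
            else:
                fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical marker: variant snp_1 present AND variant snp_2 absent -> "pig"
def marker(f: Features) -> bool:
    return f["snp_1"] and not f["snp_2"]


isolates = [
    {"snp_1": True, "snp_2": False},
    {"snp_1": False, "snp_2": False},
    {"snp_1": True, "snp_2": True},
    {"snp_1": False, "snp_2": False},
]
sources = ["pig", "pig", "human", "bovine"]
sens, spec = sensitivity_specificity(marker, isolates, sources, "pig")
print(sens, spec)
```

A rule of this kind can be perfectly specific while catching only part of the target class, which is the high-specificity, moderate-sensitivity pattern described for single loci.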

  • Article type: Journal Article
    The recent regulatory spotlight on continuous monitoring (CM) solutions and their rapid development have created a need to characterize solution performance through regular, rigorous testing using consensus test protocols. This study is the second known implementation of such a protocol, involving single-blind controlled testing of 9 CM solutions. Controlled releases at rates of 6-7100 g CH4/h, lasting 0.4-10.2 h, were conducted over 11 weeks under wind speeds of 0.7-9.9 m/s. Results showed that 4 solutions achieved method detection limits (DL90s) within the tested emission rate range; these 4 solutions also had both the lowest DL90s (3.9 [3.0, 5.5] kg CH4/h to 6.2 [3.7, 16.7] kg CH4/h) and the lowest false positive rates (6.9-13.2%), indicating efforts to balance high sensitivity (low detection limits) with a low false positive rate. These results are likely best-case estimates, since the test center represents near-ideal upstream natural gas field operating conditions. Quantification results showed wide uncertainties in individual estimates, with emissions underestimated and overestimated by factors of up to >14 and 42, respectively. Three solutions had >80% of their estimates within a quantification factor of 3 for controlled releases in the ranges of [0.1-1] kg CH4/h and >1 kg CH4/h. Relative to the study by Bell et al., the performance of current solutions as a group generally improved, driven primarily by solutions from the Bell et al. study that were retested. This result highlights the importance of regular quality testing for advancing CM solutions toward effective emissions mitigation.
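The quantification-factor criterion can be made concrete with a short sketch. It assumes that an estimate counts as within a factor k of the controlled release rate when estimate/true lies in [1/k, k], and that a zero or negative estimate never qualifies; these rules, the function names, and the example pairs are assumptions, not details taken from the test protocol.

```python
from typing import Callable, Iterable, Tuple

Pair = Tuple[float, float]  # (estimated rate, controlled release rate), kg CH4/h


def within_factor(estimate: float, true_rate: float, k: float = 3.0) -> bool:
    """True if the estimate lies within a multiplicative factor k of the true rate."""
    if estimate <= 0 or true_rate <= 0:
        return False
    ratio = estimate / true_rate
    return 1.0 / k <= ratio <= k


def fraction_within_factor(pairs: Iterable[Pair],
                           in_range: Callable[[float], bool],
                           k: float = 3.0) -> float:
    """Share of estimates within factor k, restricted to one release-rate range."""
    selected = [(e, t) for e, t in pairs if in_range(t)]
    if not selected:
        return float("nan")
    hits = sum(within_factor(e, t, k) for e, t in selected)
    return hits / len(selected)


# Hypothetical estimate/release pairs in kg CH4/h
pairs = [(0.3, 0.5), (2.0, 0.5), (0.05, 0.4), (5.0, 4.0), (1.5, 3.0), (50.0, 3.0)]
small = fraction_within_factor(pairs, lambda t: 0.1 <= t <= 1.0)
large = fraction_within_factor(pairs, lambda t: t > 1.0)
print(small, large)
```

Using a symmetric ratio bound treats a threefold underestimate and a threefold overestimate as equally acceptable, which is why the abstract reports under- and overestimation factors separately.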

  • Article type: Journal Article
    Source attribution has traditionally involved combining epidemiological data with pathogen characterisation methods such as 7-gene multilocus sequence typing (MLST) or serotyping; however, these approaches have limited resolution. In contrast, whole genome sequencing data provide an overview of the entire genome that can be used by attribution algorithms. Here, we applied a random forest (RF) algorithm to predict the primary sources of human clinical isolates of Salmonella Typhimurium (S. Typhimurium) and its monophasic variants (monophasic S. Typhimurium). To this end, we utilised single nucleotide polymorphism diversity in the core genome MLST alleles obtained from 1,061 laboratory-confirmed human and animal S. Typhimurium and monophasic S. Typhimurium isolates as inputs to an RF model. The algorithm was used for supervised learning to classify 399 animal S. Typhimurium and monophasic S. Typhimurium isolates into one of eight distinct primary source classes comprising common livestock and pet animal species: cattle, pigs, sheep, other mammals (pets: mostly dogs and horses), broilers, layers, turkeys, and game birds (pheasants, quail, and pigeons). When applied to the training set animal isolates, the model achieved an accuracy of 0.929 and a kappa of 0.905, whereas for the test set animal isolates, whose primary source class was withheld from the model, the accuracy was 0.779 and the kappa 0.700. Subsequently, the model was applied to assign 662 human clinical cases to the eight primary source classes. In the dataset, 60/399 (15.0%) of the animal and 141/662 (21.3%) of the human isolates were associated with a known outbreak of S. Typhimurium definitive type (DT) 104. All but two of the 141 DT104 outbreak-linked human isolates were correctly attributed by the model to the primary source classes identified as the origin of the DT104 outbreak. A model run without the clonal DT104 animal isolates produced largely congruent outputs (training set accuracy 0.989 and kappa 0.985; test set accuracy 0.781 and kappa 0.663). Overall, our results show that RF offers considerable promise as a methodology for epidemiological tracking and source attribution of foodborne pathogens.
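Accuracy and Cohen's kappa, the two metrics reported for the RF model, can be computed as below. Kappa corrects accuracy for the agreement expected by chance given the class frequencies, which matters when eight source classes are unevenly populated. The class labels in the example are hypothetical and do not come from the study data.

```python
from collections import Counter
from typing import Sequence, Tuple


def accuracy_and_kappa(y_true: Sequence[str],
                       y_pred: Sequence[str]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa for a multiclass classifier."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected by chance from the marginal class frequencies
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa undefined when chance agreement is total
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical held-out animal isolates and their predicted source classes
y_true = ["cattle", "cattle", "pigs", "pigs", "layers", "layers"]
y_pred = ["cattle", "pigs", "pigs", "pigs", "layers", "cattle"]
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(round(acc, 3), round(kappa, 3))
```

Here two of six predictions are wrong, so accuracy is 2/3, while chance agreement is 1/3, giving a kappa of 0.5; the gap between the two metrics widens as chance agreement grows.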

  • Article type: Journal Article
    Fundamental to effective Legionnaires' disease outbreak control is the ability to rapidly identify the environmental source(s) of the causative agent, Legionella pneumophila. Genomics has revolutionized pathogen surveillance, but L. pneumophila has a complex ecology and population structure that can limit source inference based on standard core genome phylogenetics. Here, we present a powerful machine learning approach that assigns the geographical source of Legionnaires' disease outbreaks more accurately than current core genome comparisons. Models were developed using 534 L. pneumophila genome sequences, including 149 genomes linked to 20 previously reported Legionnaires' disease outbreaks through detailed case investigations. Our classification models were developed in a cross-validation framework using only environmental L. pneumophila genomes. Assignments of the geographic origins of clinical isolates demonstrated high predictive sensitivity and specificity of the models, with no false positives or false negatives for 13 of 20 outbreak groups, despite the presence of within-outbreak polyclonal population structure. Analysis of the same 534-genome panel with a conventional phylogenomic tree and with a core genome multi-locus sequence type allelic distance-based classification approach revealed that our machine learning method had the highest overall classification performance, measured as agreement with epidemiological information. Our multivariate statistical learning approach maximizes the use of genomic variation data and is thus well suited to supporting Legionnaires' disease outbreak investigations.
    IMPORTANCE: Identifying the sources of Legionnaires' disease outbreaks is crucial for effective control. Current genomic methods, while useful, often fall short because of the complex ecology and population structure of Legionella pneumophila, the causative agent. Our study introduces a high-performing machine learning approach for more accurate geographical source attribution of Legionnaires' disease outbreaks. Developed using cross-validation on environmental L. pneumophila genomes, our models demonstrate excellent predictive sensitivity and specificity. Importantly, this new approach outperforms traditional methods such as phylogenomic trees and core genome multi-locus sequence typing, proving more efficient at leveraging genomic variation data to infer outbreak sources. Our machine learning algorithms, which harness both core and accessory genomic variation, offer significant promise in public health settings. By enabling rapid and precise source identification in Legionnaires' disease outbreaks, such approaches have the potential to expedite intervention efforts and curtail disease transmission.

  • Article type: Journal Article
    Aquaculture located in urban river estuaries, where other anthropogenic activities may occur, affects and may be affected by its surrounding environment, notably through the exchange of antimicrobial resistance genes. Through the food chain, this exchange may ultimately make aquaculture a source of resistance genes for the human resistome. In an exploratory study of the presence of resistance genes in aquaculture sediments located in urban river estuaries, two machine learning models were applied to predict the source of 34 resistome observations from the sediments of oyster and gilt-head sea bream aquaculture in the estuaries of the Sado and Lima Rivers and in the Aveiro Lagoon, as well as from the sediments of the Tejo River estuary, where Japanese clams and mussels are harvested. The first model included all 34 resistomes, comprising 53 different antimicrobial resistance genes used as source predictors. The most important genes for source attribution were the tetracycline resistance genes tet(51) and tet(L), the aminoglycoside resistance gene aadA6, the beta-lactam resistance gene blaBRO-2, and the amphenicol resistance gene cmx_1. The second model included only oyster sediment resistomes, comprising 30 antimicrobial resistance genes as predictors. The most important genes for source attribution were the aminoglycoside resistance gene aadA6, followed by the tetracycline resistance genes tet(L) and tet(33). This exploratory study provides the first information on antimicrobial resistance genes in intensive and semi-intensive aquaculture in Portugal, helping to recognize the importance of environmental control for maintaining the integrity and sustainability of aquaculture farms.

  • Article type: Journal Article
    Campylobacteriosis causes a significant disease burden in humans worldwide and is the most common type of zoonotic gastroenteritis in Finland. To identify infection sources of domestic Campylobacter infections, we analyzed Campylobacter case data from the Finnish Infectious Disease Register (FIDR) for 2004-2021 and outbreak data from the National Food- and Waterborne Outbreak Register (FWO Register) for 2010-2021, and in July-August 2022 conducted a pilot case-control study (256 cases and 756 controls) with source attribution and patient sample analysis using whole-genome sequencing (WGS). In the FIDR, 41% of the cases lacked information on travel history. Based on the case-control study, we estimated that 39% of all cases were of domestic origin. Using WGS, 22 clusters of two or more cases were observed among 185 domestic cases, none of which had been reported to the FWO Register. Based on this case-control study and source attribution, poultry is an important source of campylobacteriosis in Finland. More extensive sampling and comparison of patient, food, animal, and environmental isolates are needed to estimate the significance of other sources. In Finland, campylobacteriosis is more often of domestic origin than FIDR notifications indicate. To identify domestic cases, travel information should be included in FIDR notifications, and to improve outbreak detection, all domestic patient isolates should be sequenced.
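The abstract does not state how the 22 WGS clusters were defined. One common approach, shown here purely as an assumption, is single-linkage grouping of isolates whose allele profiles (one allele number per locus) differ at no more than a chosen number of loci. The threshold, the profiles, and the case identifiers below are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per locus; None = missing call


def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci where both alleles are called and differ."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def single_linkage_clusters(profiles: Dict[str, Profile],
                            threshold: int) -> List[List[str]]:
    """Group isolates linked by chains of pairwise distances <= threshold."""
    parent = {k: k for k in profiles}

    def find(k: str) -> str:
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for k in profiles:
        groups.setdefault(find(k), []).append(k)
    # Report only clusters of two or more cases
    return [sorted(g) for g in groups.values() if len(g) >= 2]


profiles = {
    "case1": [1, 2, 3, 4],
    "case2": [1, 2, 3, 5],
    "case3": [1, 2, 9, 5],
    "case4": [7, 8, 9, 10],
}
print(single_linkage_clusters(profiles, threshold=1))
```

Single linkage chains isolates together: case1 and case3 differ at two loci but still share a cluster through case2, a behaviour worth keeping in mind when choosing a threshold.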

  • Article type: Journal Article
    Salmonella is one of the main causes of human foodborne illness. It is endemic worldwide, with different animals and animal-based food products as reservoirs and vehicles of infection. Identifying animal reservoirs and potential transmission pathways of Salmonella is essential for prevention and control. There are many approaches for source attribution, each using different statistical models and data streams. Some aim to identify the animal reservoir, while others aim to determine the point at which exposure occurred. With the advance of whole-genome sequencing (WGS) technologies, new source attribution models will greatly benefit from the discriminating power gained with WGS. This review discusses some key source attribution methods and their mathematical and statistical tools. We also highlight recent studies utilizing WGS for source attribution and discuss open questions and challenges in developing new WGS methods. We aim to provide a better understanding of the current state of these methodologies with application to Salmonella and other foodborne pathogens that are common sources of illness in the poultry and human sectors.

  • Article type: Journal Article
    BACKGROUND: Genomic data-based machine learning tools are promising for real-time surveillance activities that perform source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify the practices that influence the source prediction performance of the usual holdout method combined with repeated k-fold cross-validation.
    METHODS: A large collection of 1,100 L. monocytogenes genomes with known sources was assembled according to several genomic metrics to ensure the authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing the prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, the area under the receiver operating characteristic curve, the precision-recall curve or the precision-recall-gain curve, and execution time.
    RESULTS: The average testing accuracies from accessory genes and pan kmers were significantly higher than those from core alleles or SNPs. While the accuracies from 70% and 80% training dataset splits were not significantly different from each other, those from 80% were significantly higher than those from the other tested proportions. Near-zero variance removal prevented results from being produced for 7-locus alleles, did not significantly affect the accuracy for core alleles, accessory genes and pan kmers, and significantly decreased the accuracy for core SNPs. The SVM and XGB models did not differ significantly in accuracy from each other and reached significantly higher accuracies than BLR, SGB, ERT and RF, in that order. However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers.
    CONCLUSIONS: In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow for predicting other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
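Near-zero variance removal drops descriptors that are almost constant across genomes. The sketch below assumes the criteria of the R caret package's nearZeroVar (defaults freqCut = 95/5 and uniqueCut = 10): a column is flagged when it has a single value, or when the ratio of its most to second-most frequent value exceeds freq_cut and its percentage of distinct values is at most unique_cut. Whether the workflow described above uses exactly these thresholds is an assumption, and the matrix is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def is_near_zero_variance(column: Sequence[object],
                          freq_cut: float = 95 / 5,
                          unique_cut: float = 10.0) -> bool:
    """Flag a descriptor that is constant or dominated by one value (caret-style rule, assumed)."""
    counts = Counter(column).most_common()
    if len(counts) <= 1:
        return True  # zero variance
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(counts) / len(column)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


def kept_columns(matrix: Sequence[Sequence[object]]) -> List[int]:
    """Indices of descriptors (columns) that survive near-zero variance removal."""
    n_cols = len(matrix[0])
    return [j for j in range(n_cols)
            if not is_near_zero_variance([row[j] for row in matrix])]


# Hypothetical binary presence/absence matrix: 100 genomes x 4 descriptors
matrix = [[1 if i == 0 else 0, i % 2, 1, 1 if i < 10 else 0] for i in range(100)]
print(kept_columns(matrix))
```

For binary descriptors and at least 20 genomes, the percentage of distinct values is always at most 10%, so under these assumed defaults only the frequency ratio decides whether a presence/absence column is removed.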

  • Article type: Journal Article
    Campylobacter is a common cause of food poisoning in many countries, with broilers being the main source. Organic and free-range broilers are more frequently Campylobacter-positive than conventionally raised broilers and may constitute a higher risk for human infections. Organic and free-range broilers may be exposed to Campylobacter from environmental reservoirs and livestock farms, but the relative importance of these sources is unknown. The aim of the study was to describe similarities and differences between the genetic diversity of Campylobacter isolates collected from free-range/organic broilers and that of isolates from conventional broilers and other animal hosts (cattle, pigs, and dogs) in Denmark, in order to make inferences about the reservoir sources of Campylobacter for free-range broilers. The aggregated surveillance data consisted of sequenced Campylobacter isolates sampled in 2015 to 2017 and 2018 to 2021. The data included 1,102 isolates from free-range broilers (n = 209), conventional broilers (n = 577), cattle (n = 261), pigs (n = 30), and dogs (n = 25). The isolates were cultured from fecal material (n = 434) or food matrices (n = 569), or were of undisclosed origin (n = 99). Campylobacter jejuni (94.5%) dominated, and subtyping analysis found 170 different sequence types (STs) grouped into 75 clonal complexes (CCs). The results suggest that CC-21 and CC-45 are the most frequent CCs in broilers. The relationship between the CCs in the investigated sources showed that the different CCs were shared by most of the animal hosts, but not by pigs. The ST profiles of free-range broilers were most similar to those of conventional broilers, dogs, and cattle, in that order. The similarity was stronger between conventional broilers and cattle than between conventional and free-range broilers. The results suggest that cattle may be a plausible reservoir of C. jejuni for conventional and free-range broilers, and that conventional broilers are a possible source for free-range broilers or that the similarity reflects a dominance of isolates adapted to the same host environment. Aggregated data provided valuable insight into the epidemiology of Campylobacter sources for free-range broilers, but time-limited sampling of isolates from different sources within a targeted area would have higher predictive value.
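The abstract does not name the measure used to compare ST profiles. One widely used option for comparing two frequency distributions is the proportional similarity index (Czekanowski index), PSI = 1 - 0.5 * sum_i |p_i - q_i|, which ranges from 0 (no shared STs) to 1 (identical distributions). The sketch below computes it under that assumption; the ST counts are hypothetical.

```python
from collections import Counter
from typing import Mapping


def proportional_similarity(a: Mapping[str, int], b: Mapping[str, int]) -> float:
    """Czekanowski proportional similarity index between two ST count profiles."""
    total_a = sum(a.values())
    total_b = sum(b.values())
    sts = set(a) | set(b)
    # Half the L1 distance between the two relative-frequency vectors
    diff = sum(abs(a.get(st, 0) / total_a - b.get(st, 0) / total_b) for st in sts)
    return 1.0 - 0.5 * diff


# Hypothetical ST counts for two host groups
free_range = Counter({"ST-21": 5, "ST-45": 3, "ST-50": 2})
conventional = Counter({"ST-21": 4, "ST-45": 4, "ST-48": 2})
print(round(proportional_similarity(free_range, conventional), 3))
```

Because the index works on relative frequencies, it compares host groups of very different sizes (for example 577 conventional versus 25 dog isolates) on the same scale, although small groups give noisier estimates.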