Research in environmental health is becoming increasingly reliant upon data science and computational methods that can more efficiently extract information from complex datasets. Data science and computational methods can be leveraged to better identify relationships between exposures to environmental stressors and human disease outcomes, which represents critical information needed to protect and improve global public health. Still, a critical gap remains in training researchers in these in silico methods. We aimed to address this gap by developing the inTelligence And Machine lEarning (TAME) Toolkit, which promotes trainee-driven data generation, management, and analysis methods to "TAME" data in environmental health studies. Training modules were developed to provide application-driven examples of data organization and analysis methods that can be used to address environmental health questions. Target audiences for these modules include students, post-baccalaureate and postdoctoral trainees, and professionals who are interested in expanding their skillset to include recent advances in data analysis methods relevant to environmental health, toxicology, exposure science, epidemiology, and bioinformatics/cheminformatics. Modules were developed by study coauthors using annotated scripts and were organized into three chapters within a GitHub Bookdown site. The first chapter of modules focuses on introductory data science and covers the following topics: setting up R/RStudio and coding in the R environment; data organization basics; finding and visualizing data trends; high-dimensional data visualizations; and Findability, Accessibility, Interoperability, and Reusability (FAIR) data management practices.
The second chapter of modules incorporates chemical-biological analyses and predictive modeling, spanning the following methods: dose-response modeling; machine learning and predictive modeling; mixtures analyses; -omics analyses; toxicokinetic modeling; and read-across toxicity predictions. The final chapter of modules provides examples of environmental health database mining and integration, including chemical exposure, health outcome, and environmental justice indicators. Training modules and associated data are publicly available online (https://uncsrp.github.io/Data-Analysis-Training-Modules/). Together, this resource provides unique opportunities to obtain introductory-level training on current data analysis methods applicable to 21st-century science and environmental health.