{Reference Type}: Journal Article
{Title}: Multi-task pretrained language model with novel application domains enables more comprehensive health and ecological toxicity prediction.
{Author}: Tan Z; Zhao Y; Lin K; Zhou T
{Journal}: J Hazard Mater
{Volume}: 477
{Issue}: 0
{Year}: 2024 Sep 15
{Factor}: 14.224
{DOI}: 10.1016/j.jhazmat.2024.135265
{Abstract}: In silico models for screening substances of health and ecological concern are essential for effective chemical management. However, current data-driven toxicity prediction models face formidable challenges related to expressive capacity, data scarcity, and reliability. This study therefore introduces TOX-BERT, a SMILES-based pretrained model for screening health and ecological toxicity. Results show that masked atom recovery pretraining and multi-task learning offer promising ways to enhance model capacity and mitigate data scarcity. Two novel application domain (AD) parameters, termed PCA-AD and LDS, were proposed to improve the reliability of TOX-BERT predictions, yielding accuracy above 90% and a mean absolute error (MAE) below 0.52. TOX-BERT was applied to 18,905 chemicals listed in the Inventory of Existing Chemical Substances in China (IECSC), revealing distinct relationships between toxicity endpoints, such as between cardiotoxicity and acute ecotoxicity, that align with experimental studies. Beyond previous PBT (persistent, bioaccumulative, and toxic) screening, 156 potential high-risk chemicals for specific endpoints were identified, covering 7 categories. Furthermore, a SMILES-based toxicity site detection approach was developed for structural toxicity analysis. These advancements carry profound implications for addressing the challenges faced by current data-driven toxicity prediction models. TOX-BERT emerges as a valuable tool for more comprehensive, reliable, and applicable prediction of health and ecological toxicity in chemical risk assessment and management.