%0 Journal Article
%T Classification machine learning to detect de facto reuse and cyanobacteria at a drinking water intake.
%A Clements E
%A Thompson KA
%A Hannoun D
%A Dickenson ERV
%J Sci Total Environ
%V 948
%N 0
%D 2024 Oct 20
%M 38992351
%F 10.753
%R 10.1016/j.scitotenv.2024.174690
%X Harmful algal blooms (HABs) or higher levels of de facto water reuse (DFR) can increase the levels of certain contaminants at drinking water intakes. Therefore, the goal of this study was to use multi-class supervised machine learning (SML) classification with data collected from six online instruments measuring fourteen total water quality parameters to detect cyanobacteria (corresponding to approximately 950 cells/mL, 2900 cells/mL, and 8600 cells/mL) or DFR (0.5, 1 and 2 % of wastewater effluent) events in the raw water entering an intake. Among 56 screened models from the caret package in R, four (mda, LogitBoost, bagFDAGCV, and xgbTree) were selected for optimization. mda had the greatest testing set accuracy, 98.09 %, after optimization with 7 false alerts. Some of the most important water parameters for the different models were phycocyanin-like fluorescence, UVA254, and pH. SML could detect algae blending events (estimated <9000 cells/mL) due in part to the phycocyanin-like fluorescence sensor. UVA254 helped identify higher concentrations of DFR. These results show that multi-class SML classification could be used at drinking water intakes in conjunction with online instrumentation to detect and differentiate HABs and DFR events. This could be used to create alert systems for the water utilities at the intake, rather than the finished water, so any adjustment to the treatment process could be implemented.