Keywords: Machine learning; Parameter selection; Statistical analysis; Uncertainty; Water quality index

MeSH: Water Quality; Machine Learning; Environmental Monitoring / methods; Models, Theoretical

Source: DOI:10.1016/j.watres.2024.121777

Abstract:
The determination of water quality heavily depends on the selection of parameters recorded from water samples for the water quality index (WQI). Data-driven methods, including machine learning models and statistical approaches, are frequently used to refine the parameter set for four main reasons: reducing cost, reducing uncertainty, addressing the eclipsing problem, and enhancing the performance of models predicting the WQI. Despite their widespread use, there is a noticeable lack of comprehensive reviews that systematically examine previous studies in this area. Such reviews are essential to assess the validity of these objectives and to demonstrate the effectiveness of data-driven methods in achieving them. This paper has two primary aims: first, to review the existing literature on methods for selecting parameters; second, to delineate and evaluate the four principal motivations for parameter selection identified in the literature. It categorizes existing studies into two methodological groups for refining parameters: one focuses on preserving information within the dataset, and the other ensures consistent prediction using the full set of parameters. It characterizes each group and evaluates how effectively each approach meets the four predefined objectives. The study finds that the minimal WQI approach, common to both categories, is the only approach that has successfully reduced recording costs. Nonetheless, it notes that simply reducing the number of parameters does not guarantee cost savings. Furthermore, the group of studies classified as preserving information within the dataset has demonstrated potential to mitigate the eclipsing problem, whereas studies in the consistent prediction group have not been able to do so. Additionally, since data-driven approaches still rely on the initial parameters chosen by experts, they do not eliminate the need for expert judgment. The study further points out that the WQI formula is a straightforward and expedient tool for assessing water quality. Consequently, the paper argues that employing machine learning solely to reduce the number of parameters in order to enhance WQI prediction is not a standalone solution. Rather, this objective should be integrated with a more comprehensive set of research goals. The critical analysis of research objectives and the characterization of previous studies lay the groundwork for future research, enabling subsequent studies to evaluate how effectively their proposed methods achieve these objectives.