Keywords: Bayesian optimization; Chemometrics; Meta learner; Model stacking; Near-Infrared Spectroscopy (NIRS); Partial Least Squares Regression (PLSR)

Source: DOI: 10.1016/j.saa.2024.124492

Abstract:
Fourier transform near-infrared (FT-NIR) spectroscopy is a versatile, non-destructive analytical tool widely used in industries such as food, pharmaceuticals, and agriculture. While traditional FT-NIR instruments are limited by cost and complexity, portable and affordable systems such as NeoSpectra scanners have broadened accessibility. Partial Least Squares Regression (PLSR) is an industry-standard chemometric method for analyzing chemical composition. This work addresses the optimization of PLSR models in FT-NIR spectroscopy, with a focus on accuracy and adaptability in material analysis. Unlike traditional PLSR workflows, which often grid-search a small number of parameters such as the number of latent variables, the presented approach expands the parameter space. A novel framework combining Bayesian search and stacking is introduced to allow more customization while keeping model development automated and efficient in both time and performance. Bayesian search explores the hyperparameter space efficiently, enabling faster convergence to optimal model settings without exhaustive exploration. The proposed stacked model combines the knowledge learned by the top-performing Bayesian-optimized PLSR models into a single unified model. Bayesian-stacked models are compared with PLSR models tuned by grid search over a limited parameter set. The findings show a marked improvement in model performance: a 51.5% reduction in Root Mean Square Error (RMSE) on the training dataset and a 26.1% reduction on the testing dataset, alongside a 10.9% increase in the squared correlation coefficient (R²) on the training dataset and a 10.4% increase on the testing dataset. Notably, Bayesian search reduces model optimization time by approximately 90% compared with grid search. Furthermore, when instrumental variations are addressed, the models show an additional improvement, evident in an average reduction of 24.1% in the mean range of prediction. Overall, the results demonstrate that the presented approach not only increases prediction accuracy but also offers a more efficient, automated, and robust solution for diverse spectroscopic applications.
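The percentages reported in the abstract are relative changes against the grid-search baseline. The sketch below shows one way such figures could be computed. It assumes the usual definition of RMSE and reads the abstract's "correlation coefficient square" as the squared Pearson correlation. The abstract does not give the formulas, so both are assumptions. The function names and all numeric values are hypothetical and chosen only for illustration. They are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def relative_change(baseline, candidate):
    """Signed change of candidate relative to baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


# Hypothetical reference values and predictions (illustration only).
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
grid_pred = [11.0, 11.0, 15.0, 15.0, 19.0]      # stands in for grid-search PLSR
stacked_pred = [10.5, 11.5, 14.5, 15.5, 18.5]   # stands in for the stacked model

rmse_change = relative_change(rmse(y_true, grid_pred), rmse(y_true, stacked_pred))
r2_change = relative_change(
    squared_correlation(y_true, grid_pred),
    squared_correlation(y_true, stacked_pred),
)

print(f"RMSE change: {rmse_change:.1f}%")  # negative means a reduction
print(f"R2 change:   {r2_change:.1f}%")    # positive means an increase
```

With these made-up numbers the stacked predictions halve the RMSE, so the RMSE change is -50%. A figure such as the abstract's 51.5% reduction would be read the same way.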