Keywords: FIC-SS-ELM; NIRS; black tea; catechin content prediction; wavelength selection

MeSH: Catechin / analysis; Catechin / chemistry; Catechin / analogs & derivatives; Spectroscopy, Near-Infrared / methods; Tea / chemistry; Algorithms; Least-Squares Analysis; Machine Learning

Source: DOI: 10.3390/s24113362

Abstract:
As a non-destructive, fast, and cost-effective technique, near-infrared (NIR) spectroscopy has been widely used to determine the content of bioactive components in tea. However, because the various catechins in black tea have similar chemical structures, the NIR spectra of black tea overlap severely in certain bands, which introduces nonlinear relationships and reduces analytical accuracy. In addition, the number of NIR spectral wavelengths far exceeds the number of modeled samples, a typical small-sample learning problem. These issues make it challenging to determine black tea catechins simultaneously by NIRS. To address these problems, this study proposes a wavelength selection algorithm based on feature interval combination sensitivity segmentation (FIC-SS). The algorithm extracts wavelengths at both coarse-grained and fine-grained levels, achieving higher accuracy and stability in feature wavelength extraction. On this basis, the study built four simultaneous prediction models for catechins based on extreme learning machines (ELMs), exploiting their strong nonlinear learning ability and simple model structure to predict the catechins simultaneously and accurately. The experimental results showed that, on the full spectrum, the ELM model outperformed the partial least squares (PLS) model for epicatechin (EC), epicatechin gallate (ECG), epigallocatechin (EGC), and epigallocatechin gallate (EGCG). On the feature wavelengths, the proposed FIC-SS-ELM model outperformed ELM models based on other wavelength selection algorithms; it simultaneously and accurately predicted the content of EC (Rp² = 0.91, RMSEP = 0.019), ECG (Rp² = 0.96, RMSEP = 0.11), EGC (Rp² = 0.97, RMSEP = 0.15), and EGCG (Rp² = 0.97, RMSEP = 0.35) in black tea. These results provide a new method for the quantitative determination of bioactive components in black tea.
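The abstract credits the ELM's simple structure: hidden-layer weights are random and fixed, so only the output weights are fitted, in closed form. The following is a minimal, generic ELM regressor sketch written under stated assumptions, not the authors' implementation. The assumptions are: sigmoid activation, uniform random weights in [-1, 1], ridge-regularized output weights, one multi-output model predicting all targets at once (the paper's "four simultaneous prediction models" may be organized differently), and Rp² computed as 1 - SS_res/SS_tot (some chemometrics work instead reports a squared correlation). It does not implement FIC-SS, whose details the abstract does not give. All names (`ELM`, `solve`, `r2`, `rmse`, `n_hidden`, `reg`, the `target_*` labels) are hypothetical, and the demo data are synthetic. It is pure Python to stay dependency-free; with NumPy, `numpy.linalg.solve` would replace the hand-written solver.

```python
import math
import random

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def solve(A, B):
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

class ELM:
    """Hypothetical single-hidden-layer ELM regressor.

    Random, untrained hidden layer; output weights fitted in closed form as
    beta = (H^T H + reg * I)^(-1) H^T Y. Y may have several columns.
    """

    def __init__(self, n_hidden=40, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = random.Random(seed)

    def _hidden(self, X):
        # H[i][j] = sigmoid(w_j . x_i + b_j)
        return [[sigmoid(sum(x * w for x, w in zip(row, wj)) + bj)
                 for wj, bj in zip(self.W, self.b)] for row in X]

    def fit(self, X, Y):
        d = len(X[0])
        # Random input weights and biases, drawn once and never updated.
        self.W = [[self.rng.uniform(-1.0, 1.0) for _ in range(d)]
                  for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1.0, 1.0) for _ in range(self.n_hidden)]
        H = self._hidden(X)
        Ht = transpose(H)
        A = matmul(Ht, H)
        for i in range(self.n_hidden):
            A[i][i] += self.reg  # ridge term keeps A well conditioned
        self.beta = solve(A, matmul(Ht, Y))
        return self

    def predict(self, X):
        return matmul(self._hidden(X), self.beta)

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error; on a held-out prediction set this is RMSEP."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

if __name__ == "__main__":
    # Synthetic stand-in for spectra: 80 samples x 30 "wavelengths" and
    # 4 made-up nonlinear targets. These are NOT the paper's data.
    rng = random.Random(42)
    X = [[rng.random() for _ in range(30)] for _ in range(80)]
    Y = [[math.sin(3 * x[0]) + x[1], x[2] * x[3], math.exp(-x[4]), x[5] ** 2 + x[6]]
         for x in X]
    model = ELM(n_hidden=40, reg=1e-3, seed=0).fit(X[:60], Y[:60])
    P = model.predict(X[60:])
    for k, name in enumerate(["target_1", "target_2", "target_3", "target_4"]):
        t = [y[k] for y in Y[60:]]
        p = [y[k] for y in P]
        print(f"{name}: Rp2 = {r2(t, p):.3f}, RMSEP = {rmse(t, p):.3f}")
```

On real spectra, the rows of `X` would be preprocessed absorbances at the selected wavelengths, and `n_hidden` and `reg` would typically be tuned by cross-validation. The ridge term is a design choice for small-sample problems like the one the abstract describes: a plain pseudo-inverse of `H` can be numerically unstable when samples are few.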