Keywords: Taraxacum kok-saghyz; CNN; LightGBM; Natural rubber; Near-infrared spectroscopy; PLS; RF; Rapid detection

Source: DOI:10.1186/s13007-024-01183-6

Abstract:
BACKGROUND: Taraxacum kok-saghyz Rodin (TKS) is a highly promising source of natural rubber (NR) because of its wide range of suitable planting areas, strong adaptability, and suitability for mechanized planting and harvesting. However, current methods for detecting NR content are relatively cumbersome, so a rapid detection model is needed. This study used near-infrared spectroscopy to establish a rapid detection model for NR content in TKS root segments and powder samples. Near-infrared spectra were collected from the K445 strain at different growth stages within one year and from 129 samples of TKS hybridized with dandelion. The rubber content in the roots of the samples was measured using the alkaline boiling method. The Monte Carlo sampling (MCS) method was used to filter abnormal data from the root segment and powder sample sets separately. The SPXY algorithm was used to divide the samples into a training set and a validation set in a 3:1 ratio. The original spectra were preprocessed using the moving window smoothing (MWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (FD) algorithms. Bands were screened using the competitive adaptive reweighted sampling (CARS) algorithm and the chemical characteristic bands corresponding to NR. Partial least squares (PLS), random forest (RF), light gradient boosting machine (LightGBM), and convolutional neural network (CNN) algorithms were used to build models with the optimal spectral preprocessing method on three band sets: the full band, the CARS-selected bands, and the chemical characteristic bands corresponding to NR. The model with the best predictive performance in the high rubber content interval (rubber content > 15%) was also identified.
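The preprocessing steps named above (MWS, SNV, FD) can be illustrated with a minimal sketch. The function names, the default window width of 5, and the simple finite-difference derivative are assumptions made for illustration only; the abstract does not state the settings the authors used, and MSC is not shown because it additionally needs a reference spectrum.

```python
from statistics import mean, stdev


def moving_window_smooth(spectrum, window=5):
    """Moving-window (boxcar) smoothing; the window shrinks at the edges.

    The window width is a hypothetical default, not a value from the study.
    """
    half = window // 2
    n = len(spectrum)
    return [
        mean(spectrum[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own
    mean and sample standard deviation."""
    mu = mean(spectrum)
    sd = stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


def first_derivative(spectrum, step=1.0):
    """First derivative by adjacent finite differences (output has n - 1 points).

    Savitzky-Golay derivatives are a common alternative in practice.
    """
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


def mws_fd(spectrum, window=5, step=1.0):
    """Chain smoothing and first derivative, mirroring the 'MWS-FD' label."""
    return first_derivative(moving_window_smooth(spectrum, window), step)
```

Each function takes one spectrum as a list of absorbance values, so a full data set would be processed by applying `mws_fd` to every sample before band selection and modeling.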
RESULTS: The optimal rubber content prediction models for TKS root segments and powder samples were MWS-FD CARS-RF and MWS-FD chemical characteristic band RF, respectively. Their R_P^2 values were 0.951 and 0.979, their RMSEP values were 1.814 and 1.133, and their RPD_P values were 4.498 and 6.845, respectively. In the high rubber content range, the model based on the LightGBM algorithm had the best prediction performance, with RMSEP values of 0.752 for root segments and 0.918 for powder samples.
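For readers unfamiliar with the three reported metrics, the sketch below computes them from reference and predicted values under definitions commonly used in chemometrics: RMSEP as the root mean squared prediction error, R_P^2 as one minus the ratio of residual to total sum of squares, and RPD as the standard deviation of the reference values divided by RMSEP. These definitions, the function name, and the choice of sample standard deviation are assumptions; the abstract does not spell out the exact formulas used.

```python
import math


def prediction_metrics(y_true, y_pred):
    """Return R_P^2, RMSEP and RPD for a validation (prediction) set.

    Definitions are the common chemometric ones and are assumed here,
    not taken from the paper.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)

    rmsep = math.sqrt(ss_res / n)            # root mean squared error of prediction
    r2_p = 1.0 - ss_res / ss_tot             # coefficient of determination
    sd_ref = math.sqrt(ss_tot / (n - 1))     # sample SD of reference values
    rpd = sd_ref / rmsep                     # ratio of performance to deviation
    return {"r2_p": r2_p, "rmsep": rmsep, "rpd": rpd}


# Toy numbers for illustration only; they are not data from the study.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 16.0, 19.0]
metrics = prediction_metrics(reference, predicted)
```

Restricting `y_true` and `y_pred` to samples whose reference rubber content exceeds 15% would give an RMSEP for the high-content interval in the same way.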
CONCLUSIONS: Dried TKS root powder samples are better suited than root segment samples for constructing a rubber content prediction model, and models built on powder samples have better predictive capability. In the high rubber content range in particular, the model built with the LightGBM algorithm has superior predictive performance, which could offer a theoretical basis for rapid detection of NR content in TKS in the future.