Keywords: BILSTM; Quantitative analysis; SPI; Secondary structure; Two-dimensional correlation infrared spectroscopy; pH perturbation

MeSH: Soybean Proteins / chemistry; Protein Structure, Secondary; Glycine max / chemistry; Hydrogen-Ion Concentration; Spectrophotometry, Infrared

Source: DOI:10.1016/j.foodchem.2024.139074

Abstract:
The infrared (IR) spectral signal of a protein is easily masked by impurity signals, so secondary structure content calculated from spectral data tends to be inaccurate. To address this, a rapid, high-precision quantitative model for protein secondary structure was proposed. First, a two-dimensional correlation calculation was performed on 60 groups of soybean protein isolate (SPI) infrared spectra, yielding two-dimensional correlation infrared spectra (2DCOS-IR). Next, the optimal characteristic bands of the four secondary structures were extracted from the 2DCOS-IR. Finally, partial least squares (PLS), long short-term memory (LSTM), and bidirectional long short-term memory (BILSTM) algorithms were used to model the extracted characteristic bands and predict the secondary structure content of SPI. The BILSTM model combined with 2DCOS-IR (2DCOS-BILSTM) showed the best predictive performance. On the prediction set, the values reported for α-helix, β-sheet, β-turn, and random coil were 0.9257, 0.9077, 0.9476, and 0.8443, respectively, with corresponding root mean square error of prediction (RMSEP) values of 0.26, 0.48, 0.20, and 0.15. This strategy improves the precision of IR analysis and enables rapid identification of secondary structure components in SPI, which is important for the advancement of industrial protein production.
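
The abstract does not say which two-dimensional correlation formulation was used. As a rough illustration only, the sketch below assumes Noda's generalized 2D correlation analysis, with spectra ordered along the perturbation axis (pH, according to the keywords). The function names `two_d_correlation` and `rmsep` are hypothetical and are not taken from the paper; `rmsep` simply shows how a root mean square error of prediction is conventionally computed.

```python
import numpy as np


def two_d_correlation(spectra):
    """Synchronous and asynchronous 2D correlation maps (Noda's formulation).

    spectra: array of shape (m, n), m spectra ordered along the perturbation
    (assumed here to be pH) and n spectral points (wavenumbers).
    """
    spectra = np.asarray(spectra, dtype=float)
    m = spectra.shape[0]
    if m < 2:
        raise ValueError("at least two spectra are required")

    # Dynamic spectra: subtract the mean spectrum over the perturbation axis.
    dynamic = spectra - spectra.mean(axis=0)

    # Synchronous map: covariance between intensities at each wavenumber pair.
    synchronous = dynamic.T @ dynamic / (m - 1)

    # Hilbert-Noda matrix: 0 on the diagonal, 1 / (pi * (k - j)) elsewhere.
    idx = np.arange(m)
    diff = idx[None, :] - idx[:, None]
    noda = np.zeros((m, m))
    mask = diff != 0
    noda[mask] = 1.0 / (np.pi * diff[mask])

    # Asynchronous map: out-of-phase (sequential) intensity changes.
    asynchronous = dynamic.T @ noda @ dynamic / (m - 1)
    return synchronous, asynchronous


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out prediction set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

In this formulation the synchronous map is symmetric and the asynchronous map is antisymmetric. How the characteristic bands were selected from the maps and how the BILSTM regressor was configured are not described in the abstract, so they are not sketched here.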