Keywords: Abiotic stress; Attribute weighting algorithms; Cucumber; Gene expression

Source: DOI:10.1186/s40529-024-00433-z

Abstract:
As climate change intensifies, the frequency and severity of waterlogging are expected to increase, necessitating a deeper understanding of the cucumber response to this stress. In this study, three public RNA-seq datasets (PRJNA799460, PRJNA844418, and PRJNA678740) comprising 36 samples were analyzed. Several feature selection algorithms with different characteristics, namely Uncertainty, Relief, SVM (Support Vector Machine), Correlation, and logistic least absolute shrinkage and selection operator (LASSO), were applied to reduce the complexity of the data and identify the most significant genes related to the waterlogging stress response. Uncertainty, Relief, SVM, Correlation, and LASSO identified 4, 4, 10, 21, and 13 genes, respectively. Differential gene correlation analysis (DGCA) of the 36 selected genes identified changes in correlation patterns between waterlogged and control conditions, providing deeper insight into the regulatory networks and interactions among these genes. DGCA revealed significant changes in the correlation of 13 genes between control and waterlogging conditions. Finally, we validated the 13 genes using a Random Forest (RF) classifier, which achieved 100% accuracy and an Area Under the Curve (AUC) score of 1.0. The SHapley Additive exPlanations (SHAP) values clearly showed the significant impact of LOC101209599, LOC101217277, and LOC101216320 on the model's predictive power. In addition, we employed Boruta as a wrapper feature selection method to further validate our gene selection strategy. Eight of the 13 genes were common across the four feature weighting algorithms, LASSO, DGCA, and Boruta, underscoring the robustness and reliability of our gene selection strategy.
Notably, the genes LOC101209599, LOC101217277, and LOC101216320 were among the genes identified by multiple feature selection methods from different categories (filter, wrapper, and embedded). Pathways associated with these specific genes play a pivotal role in regulating stress tolerance, root development, nutrient absorption, sugar metabolism, gene expression, protein degradation, and calcium signaling. These intricate regulatory mechanisms are crucial for cucumbers to adapt effectively to waterlogging conditions. These findings provide valuable insights for identifying targets for breeding new cucumber varieties with enhanced stress tolerance.
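
To make the differential correlation step more concrete, the sketch below illustrates the kind of statistic that DGCA-style analyses rest on: the Pearson correlation of a gene pair is computed separately in each condition, each coefficient is Fisher z-transformed, and the difference is scaled by its standard error. This is a minimal illustration on invented toy values, not the authors' pipeline; the function names (`pearson`, `fisher_z`, `differential_correlation`) and the expression vectors are hypothetical, and the abstract does not state which correlation measure or settings were used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-12):
    """Fisher z-transform; r is clipped so atanh stays finite at |r| = 1."""
    r = max(min(r, 1.0 - eps), -1.0 + eps)
    return math.atanh(r)


def differential_correlation(x_a, y_a, x_b, y_b):
    """Compare the correlation of one gene pair between conditions A and B.

    Returns (r_a, r_b, z_diff, p), where z_diff is the difference of the
    Fisher-transformed correlations divided by its standard error and p is
    a two-sided p-value from the standard normal distribution.
    Each condition needs more than 3 samples.
    """
    n_a, n_b = len(x_a), len(x_b)
    r_a, r_b = pearson(x_a, y_a), pearson(x_b, y_b)
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z_diff = (fisher_z(r_a) - fisher_z(r_b)) / se
    p = math.erfc(abs(z_diff) / math.sqrt(2.0))
    return r_a, r_b, z_diff, p


# Invented toy expression values for two hypothetical genes (not real data).
control_g1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
control_g2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]  # tracks gene 1 in control
stress_g1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stress_g2 = [5.9, 5.0, 4.1, 3.0, 2.2, 0.9]   # relationship reversed under stress

r_ctrl, r_stress, z_diff, p_value = differential_correlation(
    control_g1, control_g2, stress_g1, stress_g2
)
print(f"r(control)={r_ctrl:.3f}  r(stress)={r_stress:.3f}  "
      f"z_diff={z_diff:.2f}  p={p_value:.3g}")
```

In a real analysis this test would be run over every pair among the selected genes, with multiple-testing correction applied to the resulting p-values.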