Keywords: NUCKS1; acetylation functional score; acetylome; functional acetylation sites; lysine acetylation; machine learning; molecular environment

MeSH: Humans; Lysine / metabolism; Acetylation; Proteomics; Mass Spectrometry; Protein Processing, Post-Translational; Proteome / metabolism

Source: DOI:10.1016/j.mcpro.2023.100700

Abstract:
Protein lysine acetylation is a critical post-translational modification (PTM) involved in a wide range of biological processes. To date, about 20,000 acetylation sites in Homo sapiens have been identified by mass spectrometry-based proteomics, but more than 95% of them lack clear functional annotation because no prioritization strategy exists for assessing the functional importance of acetylation sites on a large scale. We therefore established a lysine acetylation functional evaluating model (LAFEM) that considers eight critical features surrounding a lysine acetylation site to estimate, in a high-throughput manner, the functional importance of given acetylation sites. LAFEM was obtained by selecting, from a set of random forest models, the one with the best performance in 10-fold cross-validation on an undersampled training dataset. Global analysis showed that acetylation sites with high acetylation functional scores (AFSs) mainly reside in a molecular environment characterized by a larger solvent-accessible surface area (SASA), stronger hydrogen bond-donating ability, proximity to motifs and domains, higher homology, and a higher degree of disorder. Importantly, LAFEM performed well on the validation dataset and on acetylome data, accurately screening out acetylation sites directly relevant to fitness and helping to explain, from the perspective of the acetylome, the core reason for the differences between biological models. We further used cellular experiments to confirm that, in nuclear casein kinase and cyclin-dependent kinase substrate 1 (NUCKS1), acetyl-K35 (higher AFS) was more important for A549 cell proliferation than acetyl-K9 (lower AFS). LAFEM provides a prioritization strategy for the large-scale discovery of acetylation sites directly relevant to fitness, which constitutes an unprecedented resource for better understanding the functional acetylome.
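The model-selection step described in the abstract (undersample the training data, run 10-fold cross-validation, keep the best-performing model) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: each site is assumed to be a list of eight numeric features with a binary label (1 = functional), all function names are hypothetical, and a nearest-centroid classifier is swapped in as a dependency-free stand-in for the random forest actually used in LAFEM.

```python
import random
from statistics import mean


def undersample(sites, rng):
    """Randomly drop majority-class sites so both classes have equal counts."""
    pos = [s for s in sites if s[1] == 1]
    neg = [s for s in sites if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def k_fold(data, k):
    """Yield (train, test) splits for k-fold cross-validation (needs len(data) >= k)."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def fit_centroids(train):
    """Stand-in classifier (NOT a random forest): one mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [features for features, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in centroids.items()}
    return min(dist, key=dist.get)


def cv_accuracy(data, k=10):
    """Mean per-fold accuracy over k-fold cross-validation."""
    accs = []
    for train, test in k_fold(data, k):
        model = fit_centroids(train)
        correct = sum(predict(model, x) == y for x, y in test)
        accs.append(correct / len(test))
    return mean(accs)


def select_best(sites, n_models=5, k=10, seed=0):
    """Train n_models on independent undersamples; keep the one with the best CV accuracy."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_models):
        balanced = undersample(sites, rng)
        acc = cv_accuracy(balanced, k)
        if best is None or acc > best[0]:
            best = (acc, fit_centroids(balanced))
    return best
```

Replacing `fit_centroids`/`predict` with a real random forest (for example scikit-learn's `RandomForestClassifier`) would bring the sketch closer to the described method; the balanced-undersampling and best-of-several selection logic would stay the same.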