Keywords: Amino acid property; Dissociation constant; Fast Fourier transform; Higuchi's fractal dimension; Multi-sequence alignment; Phylogenetic analysis; Protein sequence similarity

MeSH terms: Algorithms; Amino Acids; Fourier Analysis; Fractals; Phylogeny; Proteins / chemistry

Source: DOI:10.1186/s12859-022-04889-3

Abstract:
BACKGROUND: Amino acid property-aware phylogenetic analysis (APPA) is phylogenetic analysis based on encoding amino acids by their properties; it is used to understand and infer evolutionary relationships between species from a molecular perspective. Fast Fourier transform (FFT) and Higuchi's fractal dimension (HFD) perform well at describing the structural and complexity information of sequences for APPA. With the exponential growth of protein sequence data, a reliable APPA method for protein sequence analysis is increasingly important.
RESULTS: We propose a new method named FFP, which combines FFT and HFD. First, FFP encodes protein sequences using an important physicochemical property of amino acids, the dissociation constant, which determines the acidity and basicity of protein molecules. Second, FFT and HFD are used to generate feature vectors from the encoded sequences, and a distance matrix describing the degree of similarity between species is then computed with the cosine function; the smaller the distance between two species, the more similar they are. Finally, the phylogenetic tree is constructed. When tested for phylogenetic analysis on four groups of protein sequences, FFP clearly outperforms the methods it is compared with, with a highest accuracy above 97%.
CONCLUSIONS: FFP achieves higher accuracy in APPA and multi-sequence alignment, and it measures protein sequence similarity effectively. We hope it will be useful in APPA-related research.
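
The abstract describes the pipeline only in outline, so the following Python sketch is an illustration of the general idea and not the authors' implementation. Everything specific in it is an assumption: the pKa table (approximate textbook alpha-carboxyl values), the mean-centring and zero-padding before the FFT, the number of spectrum coefficients kept, the `k_max` used for HFD, the use of `1 - cosine similarity` as the distance, and all function names.

```python
import cmath
import math

# Approximate textbook alpha-carboxyl pKa values. Illustrative assumption:
# the abstract does not say which dissociation-constant table was used.
PKA = {
    "A": 2.34, "R": 2.17, "N": 2.02, "D": 1.88, "C": 1.96,
    "Q": 2.17, "E": 2.19, "G": 2.34, "H": 1.82, "I": 2.36,
    "L": 2.36, "K": 2.18, "M": 2.28, "F": 1.83, "P": 1.99,
    "S": 2.21, "T": 2.11, "W": 2.38, "Y": 2.20, "V": 2.32,
}

def encode(seq):
    """Map each residue to its pKa value; unknown symbols are skipped."""
    return [PKA[aa] for aa in seq.upper() if aa in PKA]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_spectrum(signal, n_fft, n_coeffs):
    """Mean-centre, zero-pad to n_fft, return power at frequencies 1..n_coeffs."""
    if len(signal) > n_fft:
        raise ValueError("n_fft must be at least the sequence length")
    mean = sum(signal) / len(signal)
    padded = [v - mean for v in signal] + [0.0] * (n_fft - len(signal))
    spectrum = fft(padded)
    return [abs(c) ** 2 / len(signal) for c in spectrum[1:n_coeffs + 1]]

def higuchi_fd(x, k_max=8):
    """Higuchi's fractal dimension: slope of log L(k) against log(1/k).

    Assumes len(x) > k_max and a non-constant signal.
    """
    n = len(x)
    xs, ys = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # m is the 0-based start offset
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, steps + 1))
            lengths.append(dist * (n - 1) / (steps * k) / k)
        xs.append(math.log(1.0 / k))
        ys.append(math.log(sum(lengths) / len(lengths)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Ordinary least-squares slope of ys on xs.
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

def feature_vector(seq, n_fft=256, n_coeffs=16, k_max=8):
    """Concatenate the truncated power spectrum with the HFD value."""
    signal = encode(seq)
    return power_spectrum(signal, n_fft, n_coeffs) + [higuchi_fd(signal, k_max)]

def cosine_distance(u, v):
    """1 - cosine similarity (assumed form of the 'cosine function' distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def distance_matrix(seqs, **kwargs):
    """Pairwise cosine distances between the feature vectors of all sequences."""
    feats = [feature_vector(s, **kwargs) for s in seqs]
    return [[cosine_distance(a, b) for b in feats] for a in feats]

# Made-up toy sequences, used only to show the call pattern.
toy = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MKVLAAGIVGLLLAQPSFAADKELKFGHWPQ"]
for row in distance_matrix(toy, n_fft=64):
    print(["%.4f" % d for d in row])
```

The resulting matrix could be passed to any distance-based tree builder, such as UPGMA or neighbor joining; the abstract does not state which one FFP uses. Note that in this sketch the spectrum values and the single HFD value are on different scales, so a real implementation would likely need some normalisation that is not specified here.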