Keywords: amino acid substitution models; maximum likelihood estimation method; multi-matrix mixture models; site rate heterogeneity; time-reversible models

MeSH: Algorithms; Software; Amino Acid Substitution; Phylogeny; Models, Genetic; Computational Biology / methods; Likelihood Functions

Source: DOI:10.1089/cmb.2023.0403

Abstract:
Single-matrix amino acid (AA) substitution models are widely used in phylogenetic analyses; however, they cannot properly model the heterogeneity of AA substitution rates among sites. Multi-matrix mixture models can handle this site rate heterogeneity and outperform single-matrix models. Estimating multi-matrix mixture models is a complex process, and no computer program is available for this task. In this study, we implemented a computer program called QMix, based on the algorithm of LG4X and LG4M with several enhancements, to automatically estimate multi-matrix mixture models from large datasets. QMix employs the QMaker algorithm instead of the XRATE algorithm to estimate model parameters accurately and rapidly. It can estimate mixture models with different numbers of matrices and supports multi-threaded computing to efficiently estimate models from thousands of genes. We re-estimated the mixture models LG4X and LG4M from 1471 HSSP alignments. The re-estimated models (HP4X and HP4M) are slightly better than LG4X and LG4M in building maximum likelihood trees from HSSP and TreeBASE datasets. The QMix program required about 10 hours on a computer with 18 cores to estimate a mixture model with four matrices from 200 HSSP alignments. It is easy to use and freely available to researchers.
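To make the idea of a multi-matrix mixture concrete, the sketch below shows how a site likelihood is typically combined across matrices: each matrix contributes its own likelihood for the site, and the contributions are averaged using mixture weights. This is an illustrative sketch only, not the QMix implementation. The function names and the input layout are hypothetical, and the per-matrix site log-likelihoods are assumed to have been computed elsewhere (for example, by a tree-likelihood routine on a fixed tree).

```python
import math


def mixture_site_log_likelihood(component_log_likelihoods, weights):
    """Log-likelihood of one alignment site under a K-matrix mixture.

    component_log_likelihoods[k] is log L(site | matrix k), assumed to be
    supplied by an external tree-likelihood routine (hypothetical input).
    weights[k] is the mixture weight of matrix k; the weights must sum to 1.
    """
    if len(component_log_likelihoods) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")

    # log(w_k * L_k) for every matrix with non-zero weight
    terms = [
        ll + math.log(w)
        for ll, w in zip(component_log_likelihoods, weights)
        if w > 0
    ]
    # Log-sum-exp: factor out the largest term so exp() does not underflow.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def alignment_log_likelihood(per_site_component_lls, weights):
    """Sum of mixture site log-likelihoods, assuming independent sites."""
    return sum(
        mixture_site_log_likelihood(site_lls, weights)
        for site_lls in per_site_component_lls
    )
```

With four matrices, `weights` would hold four values, and each entry of `per_site_component_lls` would hold the four per-matrix log-likelihoods for one site. A single-matrix model is the special case where one weight is 1 and the rest are 0.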