Keywords: metagenomic data; profile hidden Markov models; virus detection

MeSH: Metagenomics / methods; Markov Chains; Viruses / genetics, classification; Databases, Genetic; Humans; Computational Biology / methods; Algorithms

Source: DOI:10.1093/bib/bbae292

Abstract:
Profile hidden Markov models (pHMMs) achieve high sensitivity in remote homology search, making them a popular choice for detecting novel or highly diverged viruses in metagenomic data. However, existing pHMM databases differ in their design focus, which makes it difficult for users to choose the appropriate one. In this review, we provide a thorough evaluation and comparison of multiple commonly used pHMM databases for viral sequence discovery in metagenomic data. We characterize the databases by comparing their sizes, their taxonomic coverage, and the properties of their models using quantitative metrics. We then assess their performance in virus identification across multiple application scenarios, using both simulated and real metagenomic data. We aim to offer researchers a critical assessment of the strengths and limitations of the different databases. Furthermore, based on the experimental results obtained from the simulated and real metagenomic data, we provide practical suggestions to help users optimize their use of pHMM databases, thus enhancing the quality and reliability of their findings in viral metagenomics.