Keywords: Annotation; IMGT; Immunoglobulin

MeSH: Animals; Mice; Humans; Immunoglobulins / genetics; Software; Databases, Factual

Source: DOI:10.1186/s12859-023-05624-2

Abstract:
BACKGROUND: The advent and continual improvement of high-throughput sequencing technologies have made immunoglobulin repertoire sequencing accessible and informative regardless of study species. However, to fully map dynamic changes in polyclonal responses, precise annotation of the framework and complementarity-determining regions of rearranging genes is pivotal. Most sequence annotation tools are designed primarily for human and mouse antibody sequences; they rely on databases with fixed species lists and apply very specific assumptions that select against unique structural characteristics. For this reason, data-agnostic tools that can learn from the presented data can be very useful with new species or novel datasets.
RESULTS: We have developed IgMAT, which uses a reduced amino acid alphabet and incorporates multiple HMM alignments into a single consensus to automatically annotate immunoglobulin sequences from most organisms. Additionally, the software allows the incorporation of user-defined databases to better represent the species and/or antibody class of interest. To demonstrate the accuracy and utility of IgMAT, we present analyses of sequences extracted from structural data and of immunoglobulin sequence datasets from several different species.
CONCLUSIONS: IgMAT is fully open source and freely available for download on GitHub (https://github.com/TPI-Immunogenetics/igmat) under the GPLv3 license. It can be used as a CLI application or as a Python module integrated into custom scripts.
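
The two ideas named in the RESULTS paragraph, a reduced amino acid alphabet and a consensus over several alignments, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the grouping in `REDUCED_GROUPS`, the helper names `reduce_sequence` and `consensus_labels`, and the per-position majority vote are hypothetical and are not taken from the IgMAT code base, whose actual alphabet and consensus procedure may differ.

```python
from collections import Counter

# Hypothetical grouping by broad physicochemical class.
# This is NOT IgMAT's actual reduced alphabet; it only illustrates the idea.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "G",       # glycine kept on its own
    "P": "P",       # proline kept on its own
}

# Invert the grouping: each residue maps to its group representative.
TO_REDUCED = {aa: rep for rep, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet ('X' for unknown residues)."""
    return "".join(TO_REDUCED.get(aa, "X") for aa in seq.upper())


def consensus_labels(labelings: list[str]) -> str:
    """Per-position majority vote over equal-length region-label strings.

    Each string stands for the annotation implied by one alignment,
    e.g. 'F' for a framework position and 'C' for a CDR position.
    """
    if len({len(labels) for labels in labelings}) != 1:
        raise ValueError("all labelings must have the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*labelings))
```

Collapsing residues into classes lets sequences from an unseen species score well against a profile wherever a substitution is conservative, and voting across several alignments reduces the influence of any single poorly fitting profile. That is the intuition only; the repository linked above is the reference for what IgMAT actually does.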