Keywords: Microsatellite; SATIN; Simple sequence repeats

MeSH: Microsatellite Repeats / genetics; Software; Polymorphism, Genetic; Data Mining / methods; Algorithms; Open Reading Frames / genetics; DNA, Satellite / genetics

Source: DOI:10.1186/s12859-024-05842-2

Abstract:
BACKGROUND: Tandem repeats (TRs) are sequences in genomic DNA that are repeated in tandem and are present in all organisms. Among the subcategories of TRs are satellite repeats, which are divided into macrosatellites, minisatellites, and microsatellites. The last two are of particular interest because their instability makes them useful for identifying polymorphisms between organisms. Currently, most mining tools focus on Simple Sequence Repeat (SSR) mining, and only a few can identify SSRs in coding regions.
RESULTS: We developed a microsatellite mining software called SATIN (Micro and Mini SATellite IdentificatioN tool), based on a new sliding window algorithm and written in C and Python. It represents a new approach to SSR mining that addresses the limitations of existing tools, particularly in coding-region SSR mining. SATIN is available at https://github.com/labgm/SATIN.git. It was shown to be the second fastest tool for perfect and compound SSR mining. It can identify SSRs in coding regions as well as SSRs with motif sizes larger than 6. Besides SSR mining, SATIN can also analyze SSR polymorphism in coding regions across predetermined groups and identify SSRs that are differentially abundant among those groups on a per-gene basis. For validation, we analyzed SSRs from two groups of Escherichia coli (K12 and O157) and compared the results with 5 known SSRs from coding regions. SATIN identified all 5 of these SSRs among 237 genes that contained at least one SSR.
CONCLUSIONS: SATIN is a novel microsatellite search software that uses an innovative sliding window technique, based on a numerical list, to search for repeat regions and identify perfect and compound SSRs while generating comprehensible and analyzable outputs. It accepts files in FASTA or GenBank format as input for microsatellite mining, and it can also identify SSRs in coding regions when given GenBank files. We expect SATIN to help identify potential SSRs for use as genetic markers.
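
To make the notion of perfect SSR mining concrete, the following is a minimal Python sketch of a naive window scan over a sequence. It is not SATIN's algorithm: the abstract does not specify the numerical-list technique, so this sketch compares plain substrings instead. The function name `find_perfect_ssrs` and its parameters (`min_motif`, `max_motif`, `min_repeats`) are hypothetical and chosen only for illustration.

```python
def find_perfect_ssrs(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Return (start, motif, repeats) tuples for perfect tandem repeats.

    Illustrative only: a naive scan, not the algorithm used by SATIN.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None  # longest repeat tract starting at position i
        for k in range(min_motif, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for a motif of size k
            # Count consecutive copies of the motif starting at i.
            repeats = 1
            j = i + k
            while seq[j:j + k] == motif:
                repeats += 1
                j += k
            if repeats >= min_repeats:
                span = repeats * k
                # On ties, keep the smaller motif (e.g. A x6 over AA x3).
                if best is None or span > best[2] * len(best[1]):
                    best = (i, motif, repeats)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # jump past the reported tract
        else:
            i += 1
    return hits


print(find_perfect_ssrs("GGATATATATCC"))
```

Raising `max_motif` above 6 lets the same scan report longer motifs, which corresponds to the larger motif sizes mentioned in the abstract. Compound SSRs and coding-region annotation are outside the scope of this sketch.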