alignment-free

无对齐
  • 文章类型: Journal Article
    BACKGROUND: Biological macromolecules namely, DNA, RNA, and protein have their building blocks organized in a particular sequence and the sequential arrangement encodes evolutionary history of the organism (species). Hence, biological sequences have been used for studying evolutionary relationships among the species. This is usually carried out by multiple sequence algorithms (MSA). Due to certain limitations of MSA, alignment-free sequence comparison methods were developed. The present review is on alignment-free sequence comparison methods carried out using numerical characterization of DNA sequences.

    Discussion: The graphical representation of DNA sequences by chaos game representation and other 2-dimesnional and 3-dimensional methods are discussed. The evolution of numerical characterization from the various graphical representations and the application of the DNA invariants thus computed in phylogenetic analysis is presented. The extension of computing molecular descriptors in chemometrics to the calculation of new set of DNA invariants and their use in alignment-free sequence comparison in a N-dimensional space and construction of phylogenetic tress is also reviewed.

    Conclusion: The phylogenetic tress constructed by the alignment-free sequence comparison methods using DNA invariants were found to be better than those constructed using alignment-based tools such as PHLYIP and ClustalW. One of the graphical representation methods is now extended to study viral sequences of infectious diseases for the identification of conserved regions to design peptide-based vaccine by combining numerical characterization and graphical representation.

    导出

    更多引用

    收藏

    翻译标题摘要

    我要上传

    求助全文

  • 文章类型: Journal Article
    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, \'Chaos Game Representation\'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
    导出

    更多引用

    收藏

    翻译标题摘要

    我要上传

    求助全文

  • 文章类型: Comparative Study
    现代测序和基因组组装技术提供了丰富的数据,这将很快需要通过比较进行分析才能发现。序列比对,生物信息学研究的一项基本任务,可以使用,但有一些注意事项。由于处理大量序列数据时固有的计算费用,因此来自动态编程的开创性技术和方法对这项工作无效。由于基因重组,这些方法容易提供误导性信息,遗传改组和其他固有的生物事件。信息论的新方法,频率分析和数据压缩是可用的,并提供了强大的替代动态规划。这些新方法通常是首选,因为他们的算法更简单,并且不受sinteny相关问题的影响。在这次审查中,我们提供了计算工具的详细讨论,它源于基于词频统计分析的无对齐方法。我们提供了几个清晰的例子来演示应用和解释在几个不同的领域的无对齐分析,如基本的相关性,特征频率曲线,组成向量,改进的字符串组合和D2统计度量。此外,我们提供了详细的讨论,并通过Lempel-Ziv技术从数据压缩中进行了分析。
    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression.
    导出

    更多引用

    收藏

    翻译标题摘要

    我要上传

       PDF(Sci-hub)

       PDF(Pubmed)

公众号