basic local alignment search tool

  • Article type: Journal Article
    BACKGROUND: The Basic Local Alignment Search Tool (BLAST) from NCBI is the preferred utility for sequence alignment and identification in bioinformatics and genomics research. Among researchers using NCBI's BLAST software, it is well known that analyzing the results of a large BLAST search can be tedious and time-consuming. Furthermore, given recent discussions of how parameters such as '-max_target_seqs' affect the BLAST heuristic search process, the use of these search options is questionable. This leaves a stand-alone parser as one of the few options for condensing these large datasets, and because few such parsers are available for download, researchers are often left to write specialized software whenever they need to analyze BLAST results. The need for a streamlined, fast script that solves these issues and can be easily integrated into a variety of bioinformatics and genomics workflows was the initial motivation for developing this software.
    RESULTS: In this study, we demonstrate the effectiveness of BLAST-QC for analyzing BLAST results and its advantages over the other available options. Using genetic sequence data from our bioinformatic workflows, we establish BLAST-QC's superior runtime compared with existing parsers built on the commonly used BioPerl and BioPython modules, as well as C and Java implementations of the BLAST-QC program. We discuss the '-max_target_seqs' parameter, its usage and the controversy surrounding it, and offer a solution by demonstrating that our software provides the functionality this parameter was assumed to produce, along with a variety of other parsing options. Executions of the script on example datasets are given, demonstrating the implemented functionality and providing test cases for the program. BLAST-QC is designed to be integrated into existing software, and we establish its effectiveness as a module of workflows or other processes.
    CONCLUSIONS: BLAST-QC provides the community with a simple, lightweight and portable Python script that allows easy quality control of BLAST results while avoiding the drawbacks of other options. These drawbacks include the uncertain results of applying the -max_target_seqs parameter and reliance on the cumbersome dependencies of alternatives such as BioPerl or Java, which add complexity and runtime when processing large sequence datasets. BLAST-QC is ideal for the high-throughput workflows and pipelines common in bioinformatic and genomic research, and the script has been designed for portability and easy integration into whatever processes the user may be running.
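    The abstract does not reproduce BLAST-QC's code. As a hedged illustration of the kind of post-search filtering it describes (choosing the best hits per query from the complete result set instead of relying on -max_target_seqs), here is a minimal Python sketch. It assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns; the function names parse_outfmt6 and top_hits and the threshold defaults are hypothetical and are not BLAST-QC's interface.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6(lines: Iterable[str]) -> List[dict]:
    """Parse tab-separated BLAST hits into dicts, converting the fields used for filtering."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and '#' comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def top_hits(hits: List[dict], n: int = 1, max_evalue: float = 1e-5,
             min_pident: float = 0.0) -> Dict[str, List[dict]]:
    """Return the n best hits per query that pass the (hypothetical) thresholds."""
    by_query: Dict[str, List[dict]] = defaultdict(list)
    for hit in hits:
        if hit["evalue"] <= max_evalue and hit["pident"] >= min_pident:
            by_query[hit["qseqid"]].append(hit)
    best = {}
    for query, query_hits in by_query.items():
        # Lowest e-value first; ties broken by the higher bit score
        query_hits.sort(key=lambda h: (h["evalue"], -h["bitscore"]))
        best[query] = query_hits[:n]
    return best


if __name__ == "__main__":
    sample = [
        "q1\tsA\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550",
        "q1\tsB\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-100\t400",
        "q1\tsC\t80.0\t100\t20\t0\t1\t100\t1\t100\t0.01\t50",
        "q2\tsD\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-90\t370",
    ]
    for query, kept in top_hits(parse_outfmt6(sample), n=2).items():
        print(query, [h["sseqid"] for h in kept])
```

    Because filtering happens after the search, the full set of hits stays available and the ranking is explicit. Sorting by e-value and then bit score is one reasonable choice; a real tool might expose the ranking key as an option.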







  • Article type: Journal Article
    Numerical representation of biological sequences plays an important role in bioinformatics and has many practical applications. One of the most popular approaches is the chaos game representation. In this paper, the authors propose a novel view of chaos game construction: an analytical description of the procedure. This description makes it possible to build more general number sequences using different weight functions. The authors suggest three conditions that these functions should satisfy. Additionally, they present criteria for comparing the functions and checking whether they provide a unique representation. One of the most important advantages of our approach is that it allows a description that is less sensitive to mutations and, as a result, gives more reliable values for alignment-free phylogenetic tree construction. Finally, the authors applied the DFT method using four types of functions and compared the obtained results using the BLAST tool.
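    The abstract does not give the authors' weight functions. For orientation, the sketch below shows the classic chaos game representation for DNA and its unrolled (analytical) form, in which the k-th base contributes with weight (1/2)^(n-k+1) to the n-th point; a generalized construction of the kind described would replace that weight. The corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)) and the start point (0.5, 0.5) are assumptions, since conventions vary.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# One common corner assignment for DNA CGR (an assumption; conventions vary)
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0),
}


def cgr_iterative(seq: str, start: Point = (0.5, 0.5)) -> List[Point]:
    """Classic chaos game: move halfway from the current point toward each base's corner."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_closed_form(seq: str, start: Point = (0.5, 0.5)) -> List[Point]:
    """Unrolled recurrence: p_n = (1/2)^n * p_0 + sum_{k=1..n} (1/2)^(n-k+1) * c_k."""
    points = []
    for n in range(1, len(seq) + 1):
        x = 0.5 ** n * start[0]
        y = 0.5 ** n * start[1]
        for k in range(1, n + 1):
            cx, cy = CORNERS[seq[k - 1].upper()]
            weight = 0.5 ** (n - k + 1)  # contribution of the k-th base to the n-th point
            x += weight * cx
            y += weight * cy
        points.append((x, y))
    return points
```

    Both functions produce the same points up to floating-point rounding; the closed form makes explicit that recent bases dominate a point's coordinates while earlier bases decay geometrically.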






  • Article type: Journal Article
    BLAST, the Basic Local Alignment Search Tool, is used more frequently than any other biosequence database search program. We show how to run searches on the Web, and demonstrate how to increase performance by fine-tuning arguments for a specific research project. We offer guidance for interpreting results, statistical significance and biological relevance issues, and suggest complementary analyses. This unit covers both protein-to-protein (blastp) searches and translated searches (blastx, tblastn, tfastx). blastx conceptually translates the query sequence and tblastn translates all nucleotide sequences in a database, while tblastx translates both the query and the database sequences into amino acid sequences. © 2017 by John Wiley & Sons, Inc.
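    To make the idea of a translated search concrete, the sketch below performs the conceptual six-frame translation that blastx applies to a nucleotide query (and tblastn to database sequences). It is a minimal illustration assuming the standard genetic code (NCBI translation table 1) and unambiguous A/C/G/T input; it is not how BLAST is implemented internally, and BLAST itself supports other genetic codes.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons containing non-ACGT characters become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

    A translated search then compares each of these peptide frames with protein sequences; tblastx does this for both the query and the database, which is why it is the most expensive of the translated programs.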






  • Article type: Journal Article
    Calpastatin has been proposed as a potential candidate gene for growth and meat quality traits. In this study, genetic variability in exon 6 of the ovine CAST gene and its intron boundaries was investigated by PCR-SSCP analysis and DNA sequencing. Protein sequence and structural analyses were also performed to predict the possible impact of amino acid substitutions on the physicochemical properties and structure of the CAST protein. A total of 487 animals belonging to four ancient Iranian sheep breeds with different fat metabolisms, Lori-Bakhtiari and Chall (fat-tailed), Zel-Atabay cross-bred (medium fat-tailed) and Zel (thin-tailed), were analyzed. Eight unique SSCP patterns, representing eight different sequences or haplotypes (CAST-1, CAST-2 and CAST-6 to CAST-11), were identified. Haplotypes CAST-1 and CAST-2 were the most common, with frequencies of 0.365 and 0.295, respectively. The novel haplotype CAST-8 had a considerable frequency in Iranian sheep breeds (0.129). All consensus sequences showed 98-99%, 94-98%, 92-93% and 82-83% similarity to the published ovine, caprine, bovine and porcine CAST locus sequences, respectively. Sequence analysis revealed four SNPs in intron 5 (C24T, G62A, G65T and T69-) and three SNPs in exon 6 (c.197A>T, c.282G>T and c.296C>G). All three exon 6 SNPs were missense mutations, resulting in p.Gln66Leu, p.Glu94Asp and p.Pro99Arg substitutions, respectively, in the CAST protein. All three amino acid substitutions affected the physicochemical properties of ovine CAST protein, including hydrophobicity, amphiphilicity and net charge, and might subsequently influence its structure and its effect on the activity of Ca2+ channels; hence, they might regulate calpain activity and, in turn, meat tenderness and growth rate. The Lori-Bakhtiari population showed the highest heterozygosity at the ovine CAST locus (0.802). The difference in the frequencies of haplotypes CAST-10 and CAST-8 between the Lori-Bakhtiari (fat-tailed) and Zel (thin-tailed) breeds was highly significant (P<0.001), indicating that these two haplotypes might be breed-specific haplotypes that distinguish fat-tailed from thin-tailed sheep breeds.
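    The reported coding (c.) and protein (p.) positions can be cross-checked with simple arithmetic. Assuming HGVS-style numbering, in which c.1 is the A of the start codon, position c lies in codon (c - 1) // 3 + 1, at position (c - 1) % 3 + 1 within that codon. The helper name codon_of is hypothetical.

```python
from typing import Tuple


def codon_of(c_pos: int) -> Tuple[int, int]:
    """Map a coding-DNA position to (codon number, position within the codon).

    Assumes c.1 is the first base (A) of the ATG start codon.
    """
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


if __name__ == "__main__":
    for variant, c_pos in (("c.197A>T", 197), ("c.282G>T", 282), ("c.296C>G", 296)):
        codon, position = codon_of(c_pos)
        print(f"{variant}: codon {codon}, position {position}")
```

    The codon numbers (66, 94 and 99) match the reported p.Gln66Leu, p.Glu94Asp and p.Pro99Arg, and the within-codon positions fit the amino acid changes: A>T at the second position turns Gln (CAA/CAG) into Leu (CTA/CTG), G>T at the third position turns Glu (GAG) into Asp (GAT), and C>G at the second position turns Pro (CCN) into Arg (CGN).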






  • Article type: Journal Article
    Freshwater fishes in India are poorly known and plagued by many unresolved cryptic species complexes that mask some latent and endemic species. Limitations of traditional taxonomy have resulted in this crypticism. Hence, molecular approaches such as DNA barcoding are needed to diagnose these latent species. We analyzed 1383 barcode sequences of 175 Indian freshwater fish species available in the databases, of which 172 sequences of 70 species were generated in this study. Congeneric and conspecific genetic divergences were calculated using Kimura's two-parameter (K2P) distance model, followed by construction of a Neighbor-Joining tree in MEGA 5.1. Applied as a first-pass approach, DNA barcoding led to the straightforward identification of 82% of the studied species, with 2.9% (S.E. = 0.2) divergence between nearest congeners. However, after validating some cases of synonymy and mislabeled sequences, a further 5% of species were found to be valid. Sequences submitted to the database under different names were found to represent single species. Conversely, some sequences of species such as Barilius barna, Barilius bendelisis and Labeo bata were submitted to the database under a single name but were found to represent either unexplored or latent species. Overall, 87% of the available Indian freshwater fish barcodes were diagnosed as true species, in agreement with the existing checklist, and can serve as reference barcodes for the respective taxa. For the remaining 13% (21 species), the correct species name was difficult to assign because they reflected erroneous identifications or cryptic species complexes. These barcodes will therefore need further assessment, including barcodes from more specimens of the same and sister species.
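    The Kimura two-parameter (K2P) distance used for the divergence estimates treats transitions (A<->G, C<->T) separately from transversions. With P and Q the proportions of transitional and transversional differences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below computes it for one pair of pre-aligned sequences, assuming that sites with gaps or ambiguous bases are simply skipped; MEGA offers its own options for handling such sites, so results may differ slightly.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent for K2P")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

    Multiplying the distance by 100 gives the percentage divergence commonly reported in barcoding studies.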






  • Article type: Journal Article
    The present study aims to investigate small RNA interactions with putative disease response genes in the model grass species Brachypodium distachyon. The fungal pathogen Fusarium culmorum (Fusarium herein) and treatment with the phytohormone salicylic acid were used to induce the disease response in Brachypodium. Initially, 121 putative disease response genes were identified using bioinformatic and homology-based approaches. Computational prediction was used to identify 33 candidate new miRNA coding sequences, of which 9 were verified by analysis of small RNA sequence libraries. Putative Brachypodium miRNA target sites were identified in the disease response genes, and a subset of these genes was screened for expression and possible miRNA interactions in 5 different Brachypodium lines infected with Fusarium. An NBS-LRR family gene, 1g34430, was polymorphic among the lines, forming two major genotypes, one of which has its miRNA target sites deleted, resulting in altered gene expression during infection. siRNAs were putatively involved in regulating this gene, indicating a role for small RNAs in the B. distachyon disease response.
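    As a simplified illustration of how putative plant miRNA target sites can be located, the sketch below slides a window along a transcript and reports positions whose sequence nearly matches the reverse complement of the miRNA. This is an assumption-laden stand-in, not the prediction method used in the study: dedicated plant target predictors use position-dependent scoring, whereas here every mismatch, including G:U wobble pairs, counts equally, and the max_mismatches cutoff is arbitrary. The function names are hypothetical.

```python
from typing import List, Tuple

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    return to_rna(seq).translate(RNA_COMPLEMENT)[::-1]


def find_target_sites(mirna: str, target: str, max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for target windows that pair with the miRNA.

    A window pairs perfectly when it equals the reverse complement of the miRNA;
    G:U wobble pairs are counted as mismatches (a simplification).
    """
    site = reverse_complement_rna(mirna)
    target = to_rna(target)
    width = len(site)
    hits = []
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

    Under this scheme, deleting a target site, as in one 1g34430 genotype, simply removes the window that would otherwise be reported.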






  • Article type: Journal Article
    The WUSCHEL (WUS)-related homeobox (WOX) gene family plays an important role in coordinating gene transcription in the early phases of embryogenesis. In this study, we isolated and characterized WOX5 from common wheat and its relatives Triticum monococcum, Triticum urartu, Aegilops speltoides, Aegilops searsii, Aegilops sharonensis, Aegilops longissima, Aegilops bicornis, Aegilops tauschii and Triticum turgidum. The characterized WOX5 alleles ranged in size from 1029 to 1038 bp and encompassed the complete open reading frame (ORF) as well as 5' upstream and 3' downstream sequences. Domain prediction analysis showed that the putative primary structure of wheat WOX5 protein includes the highly conserved homeodomain in addition to the WUS-box domain and the EAR-like domain, which are present in some members of the WOX protein family. The full-length ORF was subcloned into the prokaryotic expression vector pET30a, and a protein of approximately 26 kDa was successfully expressed in Escherichia coli BL21 (DE3) cells upon IPTG induction. The WOX5 genes from wheat-related species exhibit a structure similar to, and high sequence similarity with, the WOX5 genes from common wheat. The degree of divergence and phylogenetic tree analysis among WOX5 alleles suggested the existence of three homoeologous copies in the A, B and D genomes of common wheat. Quantitative PCR results showed that TaWOX5 was expressed primarily in roots and in calli induced by auxin and cytokinin, indicating that TaWOX5 may play a role in root formation or development and is associated with hormone regulation in somatic embryogenesis.






  • Article type: Journal Article
    Abnormal glycosylation of dystroglycan (DG), a transmembrane glycoprotein, results in a group of diseases known as dystroglycanopathies. A severe dystroglycanopathy, the limb-girdle disease MDDGC9 [OMIM: 613818], results from hypoglycosylation of the alpha subunit of DG. This has been traced to a point mutation (T192M) in DG that weakens the interactions of the DG protein with laminin, with subsequent loss of signal flow through the DG protein. In this work, we analyzed the molecular details of the interactions between DG and laminin1 in order to propose a mechanism for the onset of MDDGC9. We observed noticeable differences between the modeled structures of wild-type and mutant DG proteins. We also employed molecular docking to study and compare the binding interactions of laminin1 with both the wild-type and mutant DG proteins. The docking simulations revealed that mutant DG interacts more weakly with laminin1 than wild-type DG does. To date, there have been no previous reports elucidating the interactions of DG with laminin1 at the molecular level. Our study is therefore the first of its kind to analyze the differences in binding patterns of laminin1 with both wild-type and mutant DG proteins. Our work should therefore facilitate analysis of the molecular mechanism of MDDGC9, and future work based on our results may be useful for developing suitable drugs against this disease.






  • Article type: Journal Article
    The couch potato (CPO) protein is a key biomolecule involved in regulating diapause through RNA binding in the peripheral and central nervous systems of insects, and it has recently been discovered in a few crustacean species. In this context, ectoparasitic copepods are interesting model species, as they show no evidence of developmental arrest. The present study is the first to report the cloning of a putative CPO gene from the salmon louse Caligus rogercresseyi (CrCPO), identified by high-throughput transcriptome sequencing. In addition, its transcript expression in larvae and adults was evaluated using quantitative real-time PCR. The CrCPO cDNA sequence was 3261 base pairs (bp) long, consisting of a 713 bp 5' UTR, a 1741 bp 3' UTR and an 807 bp open reading frame encoding 268 amino acids. The highly conserved RNA-binding regions RNP2 (LFVSGL) and RNP1 (SPVGFVTF), as well as the dimerization site (LEF), were also found. Furthermore, eight single nucleotide polymorphisms located in the untranslated regions and one located in the coding region were detected. Gene transcription analysis revealed that CrCPO is ubiquitously expressed across larval stages and in adults, with the highest expression from the nauplius to the copepodid stages. The present study suggests a putative biological function of CrCPO associated with nervous system development in salmon lice and contributes molecular evidence for candidate genes related to host-parasite interactions.
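    The reported CrCPO lengths are internally consistent, assuming the 807 bp ORF includes the stop codon:

```latex
\begin{aligned}
713\ (5'\,\text{UTR}) + 807\ (\text{ORF}) + 1741\ (3'\,\text{UTR}) &= 3261\ \text{bp} \\
807 / 3 &= 269\ \text{codons} = 268\ \text{amino acids} + 1\ \text{stop codon}
\end{aligned}
```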






  • Article type: Journal Article
    Imprinted genes play important roles in mammalian growth, development and behavior. The Rasgrf1 (Ras protein-specific guanine nucleotide exchange factor 1) gene has been identified as an imprinted gene in mouse and rat. In the present study, we determined its sequence, imprinting status and expression pattern in domestic pigs. A 228 bp partial sequence located in exon 14 and a 193 bp partial sequence located in exon 1 of the porcine Rasgrf1 gene were obtained. A G/A transition was identified in Rasgrf1 exon 14, and the reciprocal Berkshire × Wannan black F1 hybrid model and the RT-PCR-RFLP method were then used to detect the imprinting status of the porcine Rasgrf1 gene in 1-day-old piglets. The expression profile showed that porcine Rasgrf1 mRNA was highly expressed in brain, pituitary and pancreas, followed by kidney, stomach, lung, testis, small intestine, ovary, spleen and liver, with low expression in longissimus dorsi, heart and backfat. Rasgrf1 expression levels in brain, pituitary and pancreas differed significantly between the two reciprocal F1 hybrids. Imprinting analysis showed that the porcine Rasgrf1 gene was maternally expressed in liver and small intestine, paternally expressed in lung, and biallelically expressed in brain, heart, spleen, kidney, stomach, pancreas, backfat, testis, ovary, longissimus dorsi and pituitary tissues.





