basic local alignment search tool

  • Article type: Journal Article
    BACKGROUND: The Basic Local Alignment Search Tool (BLAST) from NCBI is the preferred utility for sequence alignment and identification in bioinformatics and genomics research. Among researchers using NCBI's BLAST software, it is well known that analyzing the results of a large BLAST search can be tedious and time-consuming. Furthermore, given recent discussions of how parameters such as '-max_target_seqs' affect the BLAST heuristic search process, the use of these search options is questionable. This leaves a stand-alone parser as one of the few options for condensing these large datasets, and because few such parsers are available for download, researchers are often left to write specialized software whenever they need to analyze BLAST results. The need for a streamlined, fast script that solves these issues and can be easily integrated into a variety of bioinformatics and genomics workflows was the initial motivation for developing this software.
    RESULTS: In this study, we demonstrate the effectiveness of BLAST-QC for analyzing BLAST results and its advantages over the other available options. Using genetic sequence data from our bioinformatic workflows, we establish BLAST-QC's superior runtime compared with existing parsers built on the commonly used BioPerl and BioPython modules, as well as with C and Java implementations of the BLAST-QC program. We discuss the 'max_target_seqs' parameter, its usage and the controversy surrounding it, and offer a solution by demonstrating that our software provides the functionality this parameter was assumed to produce, along with a variety of other parsing options. Example runs of the script on sample datasets demonstrate the implemented functionality and serve as test cases for the program. BLAST-QC is designed to be integrated into existing software, and we establish its effectiveness as a module of workflows or other processes.
    CONCLUSIONS: BLAST-QC provides the community with a simple, lightweight and portable Python script for easy quality control of BLAST results that avoids the drawbacks of other options. These drawbacks include the uncertain results of applying the -max_target_seqs parameter and the cumbersome dependencies of other tools, such as BioPerl or Java, which add complexity and runtime when processing large sequence datasets. BLAST-QC is well suited to the high-throughput workflows and pipelines common in bioinformatic and genomic research, and the script has been designed for portability and easy integration into whatever processes the user may be running.
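
The abstract does not show BLAST-QC's code or command-line options. As an illustration only, here is a hypothetical minimal sketch of the kind of post-hoc filtering it describes: keeping the N best subjects per query from BLAST+ tabular output after the search has finished, instead of relying on -max_target_seqs during it. It assumes the 12 default columns of -outfmt 6; the names `top_hits`, `n` and `max_evalue` are illustrative and are not BLAST-QC's interface.

```python
import csv
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6 with no custom fields).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines, n=1, max_evalue=1e-5):
    """Return the n best distinct subjects per query from -outfmt 6 lines.

    Hits are ranked by ascending e-value, then descending bit score; only the
    best HSP of each subject is kept. The e-value cutoff is illustrative.
    """
    by_query = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        hit = dict(zip(OUTFMT6_FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            by_query[hit["qseqid"]].append(hit)

    best = {}
    for query, hits in by_query.items():
        hits.sort(key=lambda h: (h["evalue"], -h["bitscore"]))
        seen, unique = set(), []
        for h in hits:
            if h["sseqid"] not in seen:  # after sorting, first HSP seen is the best
                seen.add(h["sseqid"])
                unique.append(h)
        best[query] = unique[:n]
    return best
```

For example, `top_hits(open("results.tsv"), n=5)` (file name hypothetical) returns a dict mapping each query ID to its five best subjects. The filter can only choose among hits that BLAST actually reported, so the search itself should not be truncated more aggressively than the filter needs.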

  • Article type: Journal Article
    Numerical representation of biological sequences plays an important role in bioinformatics and has many practical applications. One of the most popular approaches is the chaos game representation (CGR). In this paper, the authors propose a novel view of chaos game construction: an analytical description of the procedure. This description makes it possible to build more general number sequences using different weight functions. The authors suggest three conditions that these functions should satisfy. Additionally, they present criteria for comparing the functions and for checking whether they provide a unique representation. One of the most important advantages of the approach is the possibility of constructing a description that is less sensitive to mutations and, as a result, gives more reliable values for alignment-free phylogenetic tree construction. Finally, the authors applied the DFT method using four types of functions and compared the obtained results using the BLAST tool.
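
To make the idea of an "analytical description" concrete, the sketch below compares the classic iterative chaos game with an equivalent closed-form weighted sum, x_n = x_0/2^n + sum over k of c_k/2^(n-k+1), where c_k is the corner assigned to the k-th base. The corner assignment is an assumption (one common convention), and the fixed weights 2^-(n-k+1) are those of standard CGR only; the paper's generalized weight functions and its three conditions are not reproduced here.

```python
# Corner assignment is an assumption (one common convention); any fixed
# assignment of A, C, G, T to the unit-square corners works the same way.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_iterative(seq, start=(0.5, 0.5)):
    """Classic chaos game: move halfway from the current point to each base's corner."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_closed_form(seq, start=(0.5, 0.5)):
    """Same points from the weighted sum x_n = x_0 / 2**n + sum_k c_k / 2**(n - k + 1)."""
    corners = [CORNERS[b] for b in seq.upper()]
    points = []
    for n in range(1, len(corners) + 1):
        x, y = start[0] / 2 ** n, start[1] / 2 ** n
        for k in range(1, n + 1):
            w = 0.5 ** (n - k + 1)  # weight of the k-th base in the n-th point
            x += w * corners[k - 1][0]
            y += w * corners[k - 1][1]
        points.append((x, y))
    return points
```

Both functions return the same points. The closed form shows that each base's contribution decays geometrically with distance from the current position, which is the term a generalized weight function would replace.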

  • Article type: Journal Article
    BLAST, the Basic Local Alignment Search Tool, is used more frequently than any other biosequence database search program. We show how to run searches on the Web and demonstrate how to improve performance by fine-tuning arguments for a specific research project. We offer guidance on interpreting results, including statistical significance and biological relevance, and suggest complementary analyses. This unit covers both protein-to-protein (blastp) searches and translated searches (blastx, tblastn, tblastx). blastx conceptually translates the query sequence, tblastn translates all nucleotide sequences in a database, and tblastx translates both the query and the database sequences into amino acid sequences. © 2017 by John Wiley & Sons, Inc.
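
The "conceptual translation" performed by blastx means translating the nucleotide query in all six reading frames (three forward, three on the reverse complement) before comparing it with proteins. The sketch below shows that step under the assumption of the standard genetic code (NCBI translation table 1); BLAST itself lets the user pick other codes (for blastx, via -query_gencode). The function names and the +1..+3/-1..-3 frame labels are illustrative.

```python
from itertools import product

# NCBI translation table 1 (standard code), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_frame(seq):
    """Translate complete codons from position 0; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return {+1, +2, +3, -1, -2, -3} -> protein string for a DNA or RNA sequence."""
    seq = seq.upper().replace("U", "T")
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate_frame(seq[offset:])
        frames[-(offset + 1)] = translate_frame(rev_comp[offset:])
    return frames
```

For instance, `six_frames("ATGGCC")` yields "MA" in frame +1 and "GH" in frame -1. tblastn applies the same idea to every database sequence instead of the query, and tblastx to both.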

  • Article type: Journal Article
    Calpastatin has been proposed as a potential candidate gene for growth and meat quality traits. In this study, genetic variability in exon 6 of the ovine CAST gene and its intron boundaries was investigated by PCR-SSCP analysis and DNA sequencing. Protein sequence and structural analyses were also performed to predict the possible impact of amino acid substitutions on the physicochemical properties and structure of the CAST protein. A total of 487 animals belonging to four ancient Iranian sheep breeds with different fat metabolism were analyzed: Lori-Bakhtiari and Chall (fat-tailed), Zel-Atabay cross-bred (medium fat-tailed) and Zel (thin-tailed). Eight unique SSCP patterns, representing eight different sequences or haplotypes (CAST-1, CAST-2 and CAST-6 to CAST-11), were identified. Haplotypes CAST-1 and CAST-2 were the most common, with frequencies of 0.365 and 0.295, respectively. The novel haplotype CAST-8 had a considerable frequency in Iranian sheep breeds (0.129). The consensus sequences showed 98-99%, 94-98%, 92-93% and 82-83% similarity to the published ovine, caprine, bovine and porcine CAST locus sequences, respectively. Sequence analysis revealed four SNPs in intron 5 (C24T, G62A, G65T and T69-) and three SNPs in exon 6 (c.197A>T, c.282G>T and c.296C>G). All three SNPs in exon 6 were missense mutations that would result in p.Gln66Leu, p.Glu94Asp and p.Pro99Arg substitutions, respectively, in the CAST protein. All three amino acid substitutions affected the physicochemical properties of the ovine CAST protein, including hydrophobicity, amphiphilicity and net charge, and might subsequently influence its structure and its effect on the activity of Ca2+ channels; hence, they might regulate calpain activity and, in turn, meat tenderness and growth rate. The Lori-Bakhtiari population showed the highest heterozygosity at the ovine CAST locus (0.802). The frequency differences of haplotypes CAST-10 and CAST-8 between the Lori-Bakhtiari (fat-tailed) and Zel (thin-tailed) breeds were highly significant (P<0.001), indicating that these two haplotypes might be breed-specific haplotypes that distinguish fat-tailed from thin-tailed sheep breeds.
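
The three exon 6 changes can be cross-checked against the reported protein changes, assuming HGVS-style numbering in which c.1 is the A of the ATG start codon, so that the affected codon is ceil(c/3) and the position within it is ((c - 1) mod 3) + 1. The actual reference codons are not given in the abstract, so the hypothetical helper `missense_outcomes` enumerates every standard-code codon of the stated reference amino acid that carries the stated reference base at that position and reports which amino acids the substitution can produce.

```python
from itertools import product

# NCBI translation table 1 (standard code), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def codon_index(c_pos):
    """Map a coding-DNA position (c.1 = A of ATG) to (codon number, position 1-3)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def missense_outcomes(c_pos, ref, alt, ref_aa):
    """Return (codon number, amino acids reachable by ref>alt from codons of ref_aa)."""
    codon_no, pos = codon_index(c_pos)
    outcomes = set()
    for codon, aa in CODON_TABLE.items():
        # Only codons of the reference amino acid with `ref` at the implied position.
        if aa == THREE_TO_ONE[ref_aa] and codon[pos - 1] == ref:
            mutant = codon[:pos - 1] + alt + codon[pos:]
            outcomes.add(CODON_TABLE[mutant])
    return codon_no, outcomes
```

Under these assumptions, c.197A>T maps to codon 66 and can only turn Gln into Leu, c.282G>T maps to codon 94 and turns Glu into Asp, and c.296C>G maps to codon 99 and turns Pro into Arg, which agrees with the reported p.Gln66Leu, p.Glu94Asp and p.Pro99Arg.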

  • Article type: Journal Article
    Freshwater fishes in India are poorly known and plagued by many unresolved cryptic species complexes that mask some latent and endemic species. Limitations of traditional taxonomy have resulted in this crypticism. Hence, molecular approaches such as DNA barcoding are needed to diagnose these latent species. We analyzed 1383 barcode sequences of 175 Indian freshwater fish species available in the databases, of which 172 sequences of 70 species were generated in this study. Congeneric and conspecific genetic divergences were calculated using Kimura's two-parameter (K2P) distance model, followed by construction of a neighbor-joining tree in MEGA 5.1. As a first-pass approach, the DNA barcoding principle led to the straightforward identification of 82% of the studied species, with 2.9% (S.E. = 0.2) divergence between nearest congeners. However, after validating some cases of synonymy and mislabeled sequences, a further 5% of species were found to be valid. Some sequences submitted to the database under different names were found to represent a single species. Conversely, some sequences of species such as Barilius barna, Barilius bendelisis and Labeo bata were submitted to the database under a single name but were found to represent either unexplored or latent species. Overall, 87% of the available Indian freshwater fish barcodes were diagnosed as true species, consistent with the existing checklist, and can serve as reference barcodes for the respective taxa. For the remaining 13% (21 species), the correct species name was difficult to assign because of erroneous identifications and cryptic species complexes. These barcodes will therefore need further assessment, including barcodes from more specimens of the same and sister species.
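
Kimura's two-parameter distance separates transitions (A<->G, C<->T) from transversions: with P and Q the proportions of compared sites differing by a transition and a transversion, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below computes it for one pair of pre-aligned barcode sequences. Skipping gapped or ambiguous sites pairwise is an assumption of this sketch and not necessarily how MEGA 5.1 was configured in the study; the function name is illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Divergences reported as percentages, such as the 2.9% between nearest congeners, correspond to 100 times this value averaged over the relevant sequence pairs.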

  • Article type: Journal Article
    The present study aims to investigate small RNA interactions with putative disease response genes in the model grass species Brachypodium distachyon. The fungal pathogen Fusarium culmorum (hereafter Fusarium) and treatment with the phytohormone salicylic acid were used to induce the disease response in Brachypodium. Initially, 121 putative disease response genes were identified using bioinformatic and homology-based approaches. Computational prediction identified 33 candidate new miRNA coding sequences, of which 9 were verified by analysis of small RNA sequence libraries. Putative Brachypodium miRNA target sites were identified in the disease response genes, and a subset of these genes was screened for expression and possible miRNA interactions in five different Brachypodium lines infected with Fusarium. An NBS-LRR family gene, 1g34430, was polymorphic among the lines, forming two major genotypes, one of which has its miRNA target sites deleted, resulting in altered gene expression during infection. siRNAs were putatively involved in regulating this gene, indicating a role for small RNAs in the B. distachyon disease response.
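
The abstract does not say which tool was used to predict the target sites. Plant miRNAs typically pair with their targets with near-perfect complementarity, so a deliberately simplified, hypothetical sketch of the idea is an ungapped scan for mRNA windows that match the miRNA's reverse complement within a mismatch budget. Real plant target predictors also handle G:U wobbles and position-dependent penalties, which this sketch does not.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    return "".join(COMPLEMENT[b] for b in reversed(dna))


def find_target_sites(mirna, mrna, max_mismatches=3):
    """Return (start, mismatches) for every mRNA window complementary to the miRNA.

    Simplified: ungapped, Watson-Crick pairing only (a G:U wobble counts as a
    mismatch), and every mismatch is penalized equally regardless of position.
    """
    site = reverse_complement(mirna.upper().replace("U", "T"))
    target = mrna.upper().replace("U", "T")
    k = len(site)
    hits = []
    for start in range(len(target) - k + 1):
        mismatches = sum(a != b for a, b in zip(site, target[start:start + k]))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

A deletion of the target site, as described for one 1g34430 genotype, would simply remove such a hit from the scanned sequence.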

  • Article type: Journal Article
    The WUSCHEL (WUS)-related homeobox (WOX) gene family plays an important role in coordinating gene transcription in the early phases of embryogenesis. In this study, we isolated and characterized WOX5 from common wheat and its relatives Triticum monococcum, Triticum urartu, Aegilops speltoides, Aegilops searsii, Aegilops sharonensis, Aegilops longissima, Aegilops bicornis, Aegilops tauschii, and Triticum turgidum. The characterized WOX5 alleles ranged in size from 1029 to 1038 bp and encompassed the complete open reading frame (ORF) as well as 5' upstream and 3' downstream sequences. Domain prediction analysis showed that the putative primary structure of the wheat WOX5 protein includes the highly conserved homeodomain in addition to the WUS-box domain and the EAR-like domain, which are present in some members of the WOX protein family. The full-length ORF was subcloned into the prokaryotic expression vector pET30a, and a protein of approximately 26 kDa was successfully expressed in Escherichia coli BL21 (DE3) cells upon IPTG induction. The WOX5 genes from wheat-related species exhibit a structure similar to, and high sequence similarity with, the WOX5 genes of common wheat. The degree of divergence and phylogenetic tree analysis among WOX5 alleles suggested the existence of three homoeologous copies, in the A, B and D genomes of common wheat. Quantitative PCR results showed that TaWOX5 was primarily expressed in roots and in calli induced by auxin and cytokinin, indicating that TaWOX5 may play a role in root formation or development and is associated with hormone regulation in somatic embryogenesis.
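
Whether a band of roughly 26 kDa fits a deduced ORF can be checked by computing the expected average mass of the encoded polypeptide, i.e. the sum of average residue masses plus one water. The sketch below assumes an unmodified chain; vector-encoded fusion tags (pET30a can add N-terminal tags) are not included, and apparent size on SDS-PAGE can differ from the computed mass. The function name is illustrative.

```python
# Average residue masses in daltons (free amino-acid mass minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def protein_mass_kda(protein):
    """Average molecular mass (kDa) of an unmodified polypeptide sequence."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    return (sum(RESIDUE_MASS[aa] for aa in protein) + WATER) / 1000
```

Feeding in the translated WOX5 ORF (plus any tag sequence actually encoded by the construct) gives the mass to compare against the observed band.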

  • Article type: Journal Article
    Abnormal glycosylation of dystroglycan (DG), a transmembrane glycoprotein, results in a group of diseases known as dystroglycanopathies. A severe dystroglycanopathy, the limb-girdle disease MDDGC9 [OMIM: 613818], results from hypoglycosylation of the alpha subunit of DG. This has been traced to a point mutation (T192M) in DG that weakens the interactions of the DG protein with laminin and causes a subsequent loss of signal flow through the DG protein. In this work, we analyze the molecular details of the interactions between DG and laminin1 in order to propose a mechanism for the onset of MDDGC9. We observed noticeable differences between the modeled structures of the wild-type and mutant DG proteins. We also employed molecular docking to study and compare the binding interactions of laminin1 with the wild-type and mutant DG proteins. The docking simulations revealed that mutant DG interacts more weakly with laminin1 than wild-type DG does. To date, there have been no reports elucidating the interactions of DG with laminin1 at the molecular level. Our study is therefore the first of its kind to analyze the differences in the binding patterns of laminin1 with the wild-type and mutant DG proteins, and it should facilitate analysis of the molecular mechanism of MDDGC9. Future work based on our results may be useful for the development of suitable drugs against this disease.

  • Article type: Journal Article
    The couch potato (CPO) protein is a key biomolecule involved in regulating diapause through RNA binding in the peripheral and central nervous systems of insects, and it has recently been discovered in a few crustacean species. Ectoparasitic copepods, which show no evidence of developmental arrest, are therefore interesting model species. The present study is the first to report the cloning of a putative CPO gene from the salmon louse Caligus rogercresseyi (CrCPO), identified by high-throughput transcriptome sequencing. In addition, transcript expression in larvae and adults was evaluated using quantitative real-time PCR. The CrCPO cDNA sequence is 3261 base pairs (bp) long, consisting of a 713-bp 5' UTR, a 1741-bp 3' UTR, and an 807-bp open reading frame encoding 268 amino acids. The highly conserved RNA-binding regions RNP2 (LFVSGL) and RNP1 (SPVGFVTF), as well as the dimerization site (LEF), were also found. Furthermore, eight single nucleotide polymorphisms located in the untranslated regions and one located in the coding region were detected. Gene transcription analysis revealed that CrCPO is ubiquitously expressed across larval stages and in adult individuals, with the highest expression from the nauplius to the copepodid stages. The present study suggests a putative biological function of CrCPO associated with the development of the nervous system in salmon lice and contributes molecular evidence for candidate genes related to host-parasite interactions.
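
The reported CrCPO lengths are internally consistent, assuming the 807-bp ORF includes the stop codon:

```latex
\begin{aligned}
\underbrace{713}_{5'\,\text{UTR}} + \underbrace{807}_{\text{ORF}} + \underbrace{1741}_{3'\,\text{UTR}} &= 3261\ \text{bp} \\
\frac{807 - 3\ (\text{stop codon})}{3} &= 268\ \text{codons (amino acids)}
\end{aligned}
```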

  • Article type: Journal Article
    Imprinted genes play important roles in mammalian growth, development and behavior. The Rasgrf1 (Ras protein-specific guanine nucleotide exchange factor 1) gene has been identified as an imprinted gene in mouse and rat. In the present study, we determined its sequence, imprinting status and expression pattern in domestic pigs. A 228-bp partial sequence located in exon 14 and a 193-bp partial sequence located in exon 1 of the porcine Rasgrf1 gene were obtained. A G/A transition was identified in Rasgrf1 exon 14; the reciprocal Berkshire × Wannan Black F1 hybrid model and the RT-PCR-RFLP method were then used to detect the imprinting status of the porcine Rasgrf1 gene in 1-day-old pigs. The expression profile indicated that porcine Rasgrf1 mRNA was highly expressed in brain, pituitary and pancreas, followed by kidney, stomach, lung, testis, small intestine, ovary, spleen and liver, with low expression in longissimus dorsi, heart and backfat. The expression levels of Rasgrf1 in brain, pituitary and pancreas differed significantly between the two reciprocal F1 hybrids. Imprinting analysis showed that porcine Rasgrf1 was maternally expressed in the liver and small intestine and paternally expressed in the lung, but biallelically expressed in brain, heart, spleen, kidney, stomach, pancreas, backfat, testis, ovary, longissimus dorsi and pituitary.
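
The reciprocal-cross design lets a parent-of-origin effect be separated from an allele (breed) effect: under imprinting, the same parental allele dominates in both crosses, whereas a breed-linked allele effect makes the paternal share high in one cross and low in the other. The sketch below classifies a tissue from the fraction of RFLP signal attributable to the paternal allele in each cross. The 0.9 threshold and the function name are hypothetical, and real calls would also need replicate animals and a statistical test.

```python
def classify_imprinting(paternal_fraction_cross1, paternal_fraction_cross2, threshold=0.9):
    """Classify allelic expression from the paternal-allele share in two reciprocal F1 crosses."""
    fractions = (paternal_fraction_cross1, paternal_fraction_cross2)
    high = [f >= threshold for f in fractions]      # paternal allele dominates
    low = [f <= 1 - threshold for f in fractions]   # maternal allele dominates
    if all(high):
        return "paternal"
    if all(low):
        return "maternal"
    if not any(high) and not any(low):
        return "biallelic"
    if (high[0] and low[1]) or (low[0] and high[1]):
        return "allele-specific"  # same breed allele favoured in both crosses
    return "unclear"
```

With this rule, a tissue such as the lung would be called "paternal" only if the paternal allele dominated in both the Berkshire-sired and the Wannan Black-sired hybrids.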