Tandem repeats

  • Article type: Journal Article
    Bottle gourd is an annual herbaceous plant that not only has high nutritional value and many medicinal applications but is also used as a rootstock for grafting cucurbit crops such as watermelon, cucumber and melon. Organellar genomes provide valuable resources for genetic breeding.
    A hybrid strategy combining Illumina and Oxford Nanopore Technology sequencing data was used to assemble the bottle gourd mitochondrial and chloroplast genomes.
    The bottle gourd mitochondrial genome was 357,547 bp long, and the chloroplast genome was 157,121 bp long. The two genomes shared 27 homologous fragments, accounting for 6.50% of the total length of the bottle gourd mitochondrial genome. In the mitochondrial genome, 101 simple sequence repeats (SSRs) and 10 tandem repeats were identified. Moreover, one pair of repeats was shown to mediate homologous recombination, producing one major and one minor conformation, and the existence of both conformations was verified by PCR amplification and Sanger sequencing. Evolutionary analysis revealed that the bottle gourd mitochondrial genome sequence is highly conserved. Furthermore, collinearity analysis revealed many rearrangements between the homologous fragments of Cucurbita and its relatives. The Ka/Ks values of most genes ranged from 0.3 to 0.9, indicating that most genes in the bottle gourd mitochondrial genome are under purifying selection. We also identified a total of 589 potential RNA editing sites in 38 mitochondrial protein-coding genes (PCGs) on the basis of long noncoding RNA (lncRNA)-seq data. The RNA editing sites nad1-2, nad4L-2, atp6-718, atp9-223 and rps10-391 were successfully verified by PCR amplification and Sanger sequencing.
    In conclusion, we assembled and annotated the bottle gourd mitochondrial and chloroplast genomes, providing a theoretical basis for similar organelle genome studies.
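    SSR counts such as the 101 reported above depend on detection thresholds that the abstract does not state. As a hedged illustration only, the sketch below scans a sequence for perfect mono- to hexanucleotide repeats. The minimum copy numbers (10, 6, 5, 5, 5, 5) are MISA-style defaults and an assumption here, not necessarily the authors' settings; `find_ssrs` and the toy sequence are hypothetical, and compound or imperfect SSRs are ignored.

```python
import re

# Assumed minimum number of tandem copies per motif length (MISA-style defaults;
# the study's actual thresholds are not given in the abstract).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect SSRs, checking shorter motifs first.

    Hypothetical helper: a region already claimed by a shorter motif is not reported again.
    """
    seq = seq.upper()
    claimed = [False] * len(seq)
    hits = []
    for k in sorted(min_repeats):
        n = min_repeats[k]
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            start, end = m.span()
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT" for k=4).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            if any(claimed[start:end]):
                continue
            for i in range(start, end):
                claimed[i] = True
            hits.append((start, motif, (end - start) // k))
    return sorted(hits)


seq = "GC" + "A" * 12 + "GC" + "AG" * 7 + "GG" + "ATC" * 5 + "GG"
print(find_ssrs(seq))  # [(2, 'A', 12), (16, 'AG', 7), (32, 'ATC', 5)]
```

    Checking shorter motifs first and claiming their positions keeps a poly-A run from being reported again as an "AA" or "AAA" repeat.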

  • Article type: Journal Article
    Duplication is a foundation of molecular evolution and a driver of genomic and complex diseases. Here, we develop a genome editing tool named Amplification Editing (AE) that enables programmable, precise DNA duplication at chromosomal scale. AE can duplicate segments of the human genome ranging from 20 bp to 100 Mb, a size comparable to that of human chromosomes. AE is active in various cell types, including diploid, haploid, and primary cells, with efficiencies of up to 73.0% for 1-Mb duplications and up to 3.4% for 100-Mb duplications. Whole-genome sequencing and deep sequencing of the junctions of edited sequences confirm the precision of duplication. AE can create chromosomal microduplications within disease-relevant regions in embryonic stem cells, indicating its potential for generating cellular and animal models. AE is a precise and efficient tool for chromosomal engineering and DNA duplication, broadening precision genome editing from individual genetic loci to the chromosomal scale.
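    The abstract states that sequencing across the junctions of edited sequences confirmed precise duplication. The toy sketch below assumes a simple head-to-tail tandem duplication, which the abstract does not specify. Under that assumption, the junction is diagnostic because joining the end of the duplicated segment to its start creates a sequence absent from the unedited genome. `tandem_duplicate`, `junction` and the short sequences are hypothetical.

```python
def tandem_duplicate(seq: str, start: int, end: int) -> str:
    """Return seq with seq[start:end] duplicated head-to-tail (toy model)."""
    return seq[:end] + seq[start:end] + seq[end:]


def junction(seq: str, start: int, end: int, flank: int) -> str:
    """Sequence spanning the novel end-to-start junction; assumes flank <= segment length."""
    segment = seq[start:end]
    return segment[-flank:] + segment[:flank]


ref = "TTTTACGTGCAGTTTT"
edited = tandem_duplicate(ref, 4, 12)
j = junction(ref, 4, 12, 3)
print(edited)                 # TTTTACGTGCAGACGTGCAGTTTT
print(j in edited, j in ref)  # True False
```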

  • Article type: Journal Article
    C-type lectins (CTLs) are an important class of pattern recognition receptors (PRRs) that exhibit structural and functional diversity in invertebrates. Repetitive DNA sequences are ubiquitous in eukaryotic genomes, reflect distinct modes of genome evolution, and promote the generation of new genes. In this study, we identified a new CTL in Exopalaemon carinicauda, designated EcTR-CTL, which is composed of two long tandem repeats and one carbohydrate recognition domain (CRD) and is rich in threonine. The full-length cDNA of EcTR-CTL was 1242 bp long and contained a 999-bp open reading frame (ORF) encoding a protein of 332 amino acids. The EcTR-CTL gene contains 4 exons and 3 introns. Each repeat unit in EcTR-CTL is 198 bp long, unlike the short tandem repeats previously reported in prawns and crayfish. EcTR-CTL was abundantly expressed in the intestine and hemocytes. After challenge with Vibrio parahaemolyticus and white spot syndrome virus (WSSV), EcTR-CTL expression in the intestine was upregulated. Knockdown of EcTR-CTL downregulated the expression of anti-lipopolysaccharide factor, crustin, and lysozyme during Vibrio infection. The recombinant CRD of EcTR-CTL (rCRD) could bind to bacteria, lipopolysaccharide, and peptidoglycan, and could also bind directly to WSSV. These findings indicate that 1) CTLs with tandem repeats may be ubiquitous in crustaceans, 2) EcTR-CTL may act as a PRR in innate immune defense against bacteria through nonself recognition and regulation of antimicrobial peptides, and 3) EcTR-CTL may play a positive or negative role in WSSV infection by capturing virions.
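    The reported lengths are internally consistent. A 999-bp ORF is 333 codons, one of which is the stop codon, leaving 332 amino acids. Each 198-bp repeat unit corresponds to 66 codons. The short check below assumes, as is standard but not stated in the abstract, that the ORF length includes the stop codon. `protein_length` is a hypothetical helper.

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon terminates translation and adds no residue.
    return codons - 1 if includes_stop else codons


print(protein_length(999))                       # 332
print(protein_length(198, includes_stop=False))  # 66 residues per repeat unit
```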

  • Article type: Journal Article
    Rare diseases encompass a diverse group of genetic disorders that affect a small proportion of the population. Identifying the underlying genetic causes of these conditions is challenging because of their genetic heterogeneity and complexity. Conventional short-read sequencing (SRS) techniques have been widely used to diagnose and investigate rare diseases, but they are limited by their short read lengths. In recent years, long-read sequencing (LRS) technologies have emerged as a valuable tool for overcoming these limitations. This minireview provides a concise overview of the applications of LRS in rare disease research and diagnosis, including the identification of disease-causing tandem repeat expansions and structural variations, and the comprehensive analysis of pathogenic variants with LRS.
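    Long reads help with repeat expansions partly because a single read can span the whole repeat tract, so the number of repeat units can be counted directly. The minimal sketch below assumes an error-free read and a known motif. Real long reads contain sequencing errors and interruptions that dedicated genotyping tools must tolerate. `longest_run` and the example read are hypothetical.

```python
import re


def longest_run(read: str, motif: str) -> int:
    """Largest number of uninterrupted, in-phase copies of motif in read."""
    best = 0
    # Scan maximal runs of the motif; re.escape guards against regex metacharacters.
    for m in re.finditer(f"(?:{re.escape(motif)})+", read):
        best = max(best, len(m.group(0)) // len(motif))
    return best


read = "GGT" + "CAG" * 40 + "CAA" + "CAG" * 5 + "TTG"
print(longest_run(read, "CAG"))  # 40
```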

  • Article type: Journal Article
    The rapid and accurate identification of parasites is crucial for prompt therapeutic intervention in parasitosis and for effective epidemiological surveillance. Accurate and effective clinical diagnosis requires a nucleic-acid-based diagnostic tool that combines the sensitivity and specificity of nucleic acid amplification tests (NAATs) with the speed, cost-effectiveness, and convenience of isothermal amplification methods. A new nucleic acid detection method based on the clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) nuclease holds promise for point-of-care testing (POCT). CRISPR/Cas12a is currently used to detect Plasmodium falciparum, Toxoplasma gondii, Schistosoma haematobium, and other parasites in blood, urine, or feces. Compared with traditional assays, CRISPR assays have demonstrated notable advantages, including comparable sensitivity and specificity, simple readout of reaction results, easy and stable transport conditions, and low equipment dependence. However, a common problem is that amplification and cis-cleavage compete in one-pot assays, prolonging the reaction time. Using suboptimal crRNAs, light-activated crRNAs, or spatial separation may weaken or entirely eliminate this competition, which could enhance sensitivity and reduce reaction times in one-pot assays. Nevertheless, higher costs and complex pre-test genome extraction have hindered the adoption of CRISPR/Cas12a in POCT.
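    Cas12a assays start with crRNA design. Commonly used Cas12a orthologs such as LbCas12a and AsCas12a recognize a T-rich TTTV protospacer-adjacent motif (PAM) immediately 5' of the target on the non-target strand. The hedged sketch below covers only that first design step. It lists candidate protospacers that follow a TTTV PAM on the given strand. The 20-nt spacer length, the single-strand scan and the name `cas12a_candidates` are assumptions. A real design would also scan the reverse complement and screen for off-targets.

```python
import re

# V = A, C or G (IUPAC). The lookahead allows overlapping PAM matches.
PAM = re.compile(r"(?=TTT[ACG])")


def cas12a_candidates(seq: str, spacer_len: int = 20) -> list[tuple[int, str]]:
    """Return (protospacer_start, protospacer) pairs that follow a TTTV PAM on this strand."""
    seq = seq.upper()
    out = []
    for m in PAM.finditer(seq):
        start = m.start() + 4  # the protospacer begins right after the 4-nt PAM
        if start + spacer_len <= len(seq):
            out.append((start, seq[start:start + spacer_len]))
    return out


print(cas12a_candidates("TTTTGACGTA", spacer_len=5))  # [(5, 'ACGTA')]
```

    Note that "TTTT" at the start is not a PAM because V excludes T, so the only hit begins one base later.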

  • Article type: Journal Article
    Artemisia is a large genus of about 400 species, many of which have considerable medicinal and ecological value. However, complex morphological characters and variation in ploidy level and nuclear DNA content have made evolutionary studies of this genus challenging. Consequently, taxonomic inconsistencies within the genus persist, hindering the use of this large plant resource. Because satellite DNAs show marked sequence and copy-number variation between species and among close relatives, researchers have used them to aid chromosome identification, species classification, and evolutionary studies. In the present study, the RepeatExplorer2 pipeline was used to identify 10 satellite DNAs from three species (Artemisia annua, Artemisia vulgaris, and Artemisia viridisquama), and fluorescence in situ hybridization (FISH) confirmed their chromosomal distribution in 24 species: 19 Artemisia species and 5 outgroup species from Ajania and Chrysanthemum. Satellite DNA signals differed substantially between species. One genus-specific satellite was identified among these sequences. Additionally, molecular cytogenetic maps were constructed for Artemisia vulgaris, Artemisia leucophylla, and Artemisia viridisquama. One species (Artemisia verbenacea) showed a FISH distribution pattern suggestive of an allotriploid origin. A high level of heteromorphic FISH signals between homologous chromosomes was observed in Artemisia plants. In addition, relationships between species were discussed by comparing ideograms. These results provide new insights into the accurate identification and taxonomy of the genus Artemisia using molecular cytological methods.
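    Satellite DNAs are long arrays of head-to-tail repeated monomers. RepeatExplorer2 finds them by graph-based clustering of sequencing reads, which is beyond a short example. The hypothetical sketch below is a much simpler illustration of the periodic signal involved, not the method used in the study. It estimates a monomer length from an assembled array by finding the shift at which the sequence best matches itself. `estimate_period` and its search bounds are assumptions.

```python
def estimate_period(seq: str, min_period: int = 2, max_period: int = 200) -> int:
    """Return the shift k that maximizes the fraction of positions with seq[i] == seq[i + k]."""
    seq = seq.upper()
    best_k, best_score = min_period, -1.0
    for k in range(min_period, min(max_period, len(seq) // 2) + 1):
        matches = sum(a == b for a, b in zip(seq, seq[k:]))
        score = matches / (len(seq) - k)
        # Strict '>' keeps the smallest period when multiples of it score equally well.
        if score > best_score:
            best_k, best_score = k, score
    return best_k


print(estimate_period("ACGTTGCAAGTC" * 10))  # 12
```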

  • Article type: Journal Article
    Glycosylphosphatidylinositol (GPI) anchoring of proteins is a conserved posttranslational modification in eukaryotes. GPI-anchored proteins are widely distributed in fungal plant pathogens, but their specific roles in the pathogenicity of Sclerotinia sclerotiorum, a devastating necrotrophic plant pathogen with a worldwide distribution, remain largely unknown. This research addresses SsGSR1, which encodes an S. sclerotiorum glycine- and serine-rich protein, SsGsr1, with an N-terminal secretory signal and a C-terminal GPI-anchor signal. SsGsr1 is located in the hyphal cell wall, and deletion of SsGSR1 leads to abnormal cell wall architecture and impaired hyphal cell wall integrity. SsGSR1 transcript levels peaked in the initial stage of infection, and SsGSR1-deletion strains showed impaired virulence on multiple hosts, indicating that SsGSR1 is critical for pathogenicity. Interestingly, SsGsr1 targeted the apoplast of host plants to induce cell death, which depends on its tandemly arranged, glycine-rich 11-amino-acid repeats. The SsGsr1 homologs in Sclerotinia, Botrytis, and Monilinia species contain fewer repeat units and have lost cell death-inducing activity. Moreover, allelic variants of SsGSR1 exist in field isolates of S. sclerotiorum from rapeseed, and one variant lacking a single repeat unit encodes a protein that shows loss of function in both cell death induction and S. sclerotiorum virulence. Taken together, our results demonstrate that variation in tandem repeats provides functional diversity in a GPI-anchored cell wall protein that, in S. sclerotiorum and other necrotrophic pathogens, allows successful colonization of host plants.
    IMPORTANCE: Sclerotinia sclerotiorum is an economically important necrotrophic plant pathogen that mainly uses cell wall-degrading enzymes and oxalic acid to kill plant cells before colonization. In this research, we characterized a glycosylphosphatidylinositol (GPI)-anchored cell wall protein, SsGsr1, which is critical for the cell wall architecture and pathogenicity of S. sclerotiorum. Additionally, SsGsr1 induces rapid host cell death that depends on glycine-rich tandem repeats. Interestingly, the number of repeat units varies among SsGsr1 homologs and alleles, and this variation alters both the cell death-inducing activity and the protein's role in pathogenicity. This work advances our understanding of how tandem repeat variation accelerates the evolution of a GPI-anchored cell wall protein associated with the pathogenicity of necrotrophic fungal pathogens, and it paves the way toward a fuller understanding of the interaction between S. sclerotiorum and host plants.

  • Article type: Journal Article
    BACKGROUND: Isothermal amplification is considered one of the most promising tools for point-of-care molecular diagnosis. However, its clinical application is severely hindered by nonspecific amplification. It is therefore important to investigate the exact mechanism of nonspecific amplification and to develop a highly specific isothermal amplification assay.
    METHODS: Four sets of primer pairs were incubated with Bst DNA polymerase to produce nonspecific amplification. Gel electrophoresis, DNA sequencing, and sequence function analysis were used to investigate how the nonspecific products were generated, and the mechanism was found to be nonspecific tailing and replication slippage-mediated tandem repeat generation (NT&RS). Building on this knowledge, a novel isothermal amplification technology, bridging primer assisted slippage isothermal amplification (BASIS), was developed.
    RESULTS: During NT&RS, Bst DNA polymerase performs nonspecific tailing at the 3'-ends of DNAs, producing sticky-end DNAs over time. Hybridization and extension between these sticky-end DNAs generate repetitive DNAs, which can self-extend via replication slippage, leading to nonspecific tandem repeat (TR) generation and nonspecific amplification. Based on NT&RS, we developed the BASIS assay, which uses a well-designed bridging primer that hybridizes with primer-based amplicons, thereby generating specific repetitive DNA and triggering specific amplification. BASIS can detect 10 copies of target DNA, resists disruption by interfering DNA, and provides genotyping ability, thereby achieving 100% accuracy in detecting human papillomavirus type 16.
    CONCLUSIONS: We discovered the mechanism of Bst-mediated nonspecific TR generation and developed a novel isothermal amplification assay (BASIS) that detects nucleic acids with high sensitivity and specificity.

  • Article type: Journal Article
    BACKGROUND: CircRNAs are essential for regulating post-transcriptional gene expression, including by acting as miRNA sponges, and play an important role in disease development. Several computational tools have recently been proposed to predict circRNAs, but because each uses only one classifier, there is still considerable room to improve performance.
    RESULTS: We propose StackCirRNAPred, a stacking-based method for computationally distinguishing long circRNAs from other lncRNAs. Because a single feature may not distinguish circRNAs well from other lncRNAs, we first extracted features from different sources, including nucleic acid composition, sequence spatial features, physicochemical properties, Alu repeats, and tandem repeats. We then applied a stacking strategy to integrate the better-performing classifiers, random forest (RF), LightGBM, and XGBoost, which allows the model to combine these features more flexibly. StackCirRNAPred performed significantly better than other tools, with precision, accuracy, F1, recall, and MCC of 0.843, 0.833, 0.831, 0.819, and 0.666, respectively. When tested directly on a mouse dataset, StackCirRNAPred again performed significantly better than other methods, with precision, accuracy, F1, recall, and MCC of 0.837, 0.839, 0.839, 0.841, and 0.677, respectively.
    CONCLUSIONS: We proposed StackCirRNAPred, based on a stacking strategy, to distinguish long circRNAs from other lncRNAs. The test results demonstrate the validity and robustness of StackCirRNAPred, and we hope it will complement existing circRNA prediction methods and aid downstream research.
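    The reported metrics can be cross-checked because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short illustrative check below reproduces both stated F1 values from the stated precision and recall. MCC cannot be recovered this way because it also depends on true negatives. `f1_score` here is a local helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the main and mouse test sets.
print(round(f1_score(0.843, 0.819), 3))  # 0.831
print(round(f1_score(0.837, 0.841), 3))  # 0.839
```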

  • Article type: Journal Article
    Insect mitochondrial genomes (mitogenomes) are of great interest for exploring molecular evolution, phylogenetics, and biogeography. So far, only 12 mitogenomes of the leafhopper tribe Idiocerini have been released in GenBank, although the tribe comprises 488 known species, including some agricultural, forestry, and horticultural pests. To compare the mitochondrial genome structure of Idiocerini and to analyze the selective pressure on the 13 protein-coding genes (PCGs) across the family Cicadellidae, the complete mitogenomes of five species, Nabicerus dentimus, Sahlbergotettix salicicola, Podulmorinus opacus, Podulmorinus consimilis, and a new species of a new genus, were determined by next-generation sequencing. The newly determined mitogenomes ranged in size from 14,733 bp to 15,044 bp and comprised the standard set of 13 PCGs, 22 transfer RNA genes, two ribosomal RNA genes, and a long non-coding control region (CR). The extent of purifying selection differed between the tribe and the family. Genes under less pronounced purifying selection (0.5 < dN/dS < 1) were nad5 and nad4l in Idiocerini, whereas in the family Cicadellidae (including the Idiocerini sequences), nad1-nad6 and cox1 were under less pronounced purifying selection. The codon encoding leucine was the most common in all species, and the codon encoding serine 1 was also among the most common in all species except P. opacus. Interestingly, in P. opacus, the codon encoding serine 2 was also among the most common. Of the 17 Idiocerini species examined, 14 contained tandem repeats, and 11 of these contained the motif "TTATA". These findings will promote research on the structure and evolution of the mitochondrial genome and highlight the need for more mitogenomes in Cicadellidae.
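    Codon-usage comparisons like those above depend on how codons are grouped. Under the invertebrate mitochondrial code used for insects, serine is encoded by both UCN and AGN codons (AGA and AGG included). Insect mitogenome studies conventionally label these families Ser2 (UCN) and Ser1 (AGN), and split leucine into Leu1 (CUN) and Leu2 (UUR). The minimal sketch below assumes that convention, an in-frame coding sequence, and DNA letters. `family_counts` and the toy sequence are hypothetical.

```python
from collections import Counter

# Codon families assumed from the invertebrate mitochondrial code (NCBI table 5)
# and the Leu1/Leu2, Ser1/Ser2 labels common in insect mitogenome papers.
FAMILIES = {
    "Leu1": {"CTA", "CTC", "CTG", "CTT"},
    "Leu2": {"TTA", "TTG"},
    "Ser1": {"AGA", "AGC", "AGG", "AGT"},
    "Ser2": {"TCA", "TCC", "TCG", "TCT"},
}


def family_counts(cds: str) -> Counter:
    """Count in-frame codons that belong to each Leu/Ser family; a trailing partial codon is ignored."""
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        for name, codons in FAMILIES.items():
            if codon in codons:
                counts[name] += 1
    return counts


counts = family_counts("ATG" + "TTA" * 3 + "AGA" + "TCT" * 2 + "CTG" + "TAA")
print(counts["Leu2"], counts["Ser1"], counts["Ser2"])  # 3 1 2
```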