Conserved sequence

  • Article type: Journal Article
    Bluetongue (BT) poses a significant threat to the livestock industry, affecting various animal species and causing substantial economic losses. The existence of numerous BT virus (BTV) serotypes has hindered control efforts, highlighting the need for broad-spectrum vaccines.
    In this study, we evaluated the conserved amino acid sequences within key non-structural (NS) proteins of BTV and identified numerous highly conserved murine- and bovine-specific MHC class I-restricted (MHC-I) CD8+ and MHC-II-restricted CD4+ epitopes. We then screened these conserved epitopes for antigenicity, allergenicity, toxicity, and solubility. Using these epitopes, we developed in silico broad-spectrum multiepitope vaccines with Toll-like receptor 4 (TLR-4) agonists. The predicted proinflammatory cytokine response was assessed in silico using the C-IMMSIM server. Structural modeling and refinement were performed using the Robetta and GalaxyWEB servers. Finally, we assessed the stability of the docking complexes through 100-nanosecond molecular dynamics simulations before taking the vaccines forward to codon optimization and in silico cloning.
    We found many epitopes that meet these criteria within the NS1 and NS2 proteins and developed in silico broad-spectrum vaccines. The immune simulation studies revealed that these vaccines induce high levels of IFN-γ and IL-2 in the vaccinated groups. Protein-protein docking analysis identified promising epitopes with strong binding affinities to TLR-4. The docked complexes were stable, with minimal root mean square deviation and root mean square fluctuation values. Finally, the in silico-cloned plasmids have a high GC content and a codon adaptation index above 0.8, suggesting they are suitable for expressing the protein vaccines in a prokaryotic system.
    These next-generation vaccine designs are promising and warrant further investigation in wet lab experiments to assess their immunogenicity, safety, and efficacy for practical application in livestock. Our findings offer a robust framework for developing a comprehensive, broad-spectrum vaccine, potentially revolutionizing BT control and prevention strategies in the livestock industry.
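
The abstract above reports the GC content of the in silico-cloned plasmids as one indicator of expression suitability. As a minimal sketch of how that quantity is computed, the Python function below returns the fraction of G and C bases in a DNA string. The function name `gc_content` and the example sequence are illustrative assumptions and do not come from the study; the codon adaptation index is not computed here because it needs a host-specific codon usage table.

```python
def gc_content(dna: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = dna.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Hypothetical 12-nt sequence, used only to exercise the function.
    example = "ATGGCGCCGTAA"
    print(f"GC content: {gc_content(example):.1%}")
```

A real workflow would run this over the full optimized coding sequence rather than a short toy string.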

  • Article type: Journal Article
    The insulin-degrading enzyme (IDE) is a zinc-dependent metalloendopeptidase that belongs to the M16A metalloprotease family. IDE is markedly expressed in the brain, where it is particularly relevant due to its in vitro amyloid beta (Aβ)-degrading activity. The subcellular localization of IDE, a key aspect of understanding how this enzyme can perform its proteolytic functions in vivo, remains highly controversial. In this work, we addressed IDE subcellular localization from an evolutionary perspective. Phylogenetic analyses based on protein sequence and on gene and protein structure were performed. An in silico analysis of the IDE signal peptide suggests an evolutionary shift in IDE exportation at the prokaryote/eukaryote divide. Subcellular localization experiments in microglia revealed that IDE is mostly cytosolic. Furthermore, IDE associates with membranes on their cytoplasmic side and further partitions between raft and non-raft domains. When stimulated, microglia switch to a secretory active state and produce numerous multivesicular bodies, and IDE associates with their membranes. The subsequent inward budding of such membranes internalizes IDE in intraluminal vesicles, which later allows IDE to be exported outside the cells in small extracellular vesicles. We further demonstrate that this IDE exportation mechanism is regulated by stimuli relevant to microglia in physiological conditions and upon aging and neurodegeneration.

  • Article type: Journal Article
    ATP-sensitive potassium (K-ATP) channels are ubiquitously expressed on the plasma membrane of cells in several organs, including the heart, pancreas, and brain, and they govern a wide range of physiological processes. In pancreatic β-cells, K-ATP channels composed of Kir6.2 and SUR1 play a key role in coupling blood glucose and insulin secretion. A tryptophan residue located at the cytosolic end of the transmembrane helix is highly conserved in eukaryotic and prokaryotic Kir channels. Any mutation of this amino acid causes a gain of function and neonatal diabetes mellitus. In this study, we investigated the effect of mutating this highly conserved residue in a KirBac channel (a prokaryotic homolog of mammalian Kir6.2). We provide the crystal structure of the mutant KirBac3.1 W46R (equivalent to W68R in Kir6.2) and its conformational flexibility properties determined using HDX-MS. In addition, a detailed dynamical view of the mutant during gating was obtained using in silico methods. Finally, functional assays were performed. A comparison of important structural determinants of the gating mechanism between the wild-type KirBac and the W46R mutant suggests interesting structural and dynamical clues and a mechanism of action by which the mutation leads to the gain of function.

  • Article type: Journal Article
    BACKGROUND: Long noncoding RNAs represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed to not encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been revealed to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to perform a more in-depth study of their evolution. In contrast to protein-encoding transcripts, however, they do not show the strong sequence conservation that usually results from purifying selection; therefore, software that is typically used to resolve the evolutionary relationships of protein-encoding genes and transcripts is not applicable to the study of lncRNAs.
    RESULTS: To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) a conservation study, that is, a genome-wide comparison of lncRNA transcriptomes between two species of interest, including a search for orthologs. Importantly, one can choose to apply lncEvo solely for transcriptome assembly or lncRNA prediction, without calling the conservation-related part.
    CONCLUSIONS: lncEvo is an all-in-one tool built with the Nextflow framework, utilizing state-of-the-art software and algorithms with customizable trade-offs between speed and sensitivity, ease of use, and built-in reporting functionalities. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf.

  • Article type: Journal Article
    Protein interactions play a crucial role among the different functions of a cell and are central to our understanding of cellular processes both in health and disease. Here we present Galaxy InteractoMIX (http://galaxy.interactomix.com), a platform composed of 13 different computational tools, each addressing specific aspects of the study of protein-protein interactions, ranging from large-scale cross-species protein-wide interactomes to the atomic resolution level of protein complexes. Galaxy InteractoMIX provides an intuitive interface where users can retrieve consolidated interactomics data distributed across several databases or uncover links between diseases and genes by analyzing the interactomes underlying these diseases. The platform makes possible the large-scale prediction and curation of protein interactions using the conservation of motifs, interology, or the presence or absence of key sequence signatures. The range of structure-based tools includes modeling and analysis of protein complexes, delineation of interfaces, and the modeling of peptides acting as inhibitors of protein-protein interactions. Galaxy InteractoMIX includes a range of ready-to-use workflows to run complex analyses requiring minimal intervention by users. The potential range of applications of the platform covers different aspects of life science, biomedicine, biotechnology, and drug discovery where protein associations are studied.

  • Article type: Journal Article
    Knowledge management tools that assist in the systematic review and exploration of scientific knowledge are of obvious potential importance to evidence-based medicine in general, and also to the design of therapeutics based on the protein subsequences and fold motifs of virus proteins as considered here. Rapid access to bundles (clusters) of related elements of knowledge gathered from diverse sources on the Internet and from growing knowledge repositories seems particularly helpful when exploring less obvious therapeutic targets in viruses (for which knowledge new to the researcher is important), and when using the following concept. Subsequences of amino acid residue sequences of proteins that are conserved across strains and species are (a) more likely to be important targets and (b) less likely to exhibit escape mutations that would make them resistant to vaccines and therapeutic agents. However, the terms "conserved" and even "highly conserved" used by authors are matters of degree, depending on how distant from SARS-CoV-2 they wished to go in comparing other sequences. The binding site for the human ACE2 protein as virus receptor and the human antibody CR3022 binding site on the spike glycoprotein are rather variable by the criteria used in the present and preceding studies. To look for more strongly conserved targets, open reading frames of SARS-CoV-2 were examined for extremely highly conserved regions, meaning regions recognizable across many viruses and organisms. Most prominent is a motif found in SARS-CoV-2 non-structural protein 3 (Nsp3). It relates to a fold type called the macro domain and has a remarkably wide distribution across organisms, including humans, with significant homologies involving three especially conserved subsequences: (a) VVVNAANVYLKHGGGVAGALNK, (b) LHVVGPNVNKG, and (c) PLLSAGIFG. Careful study of the variations of these and of the more variable sequences between and around them might provide a finer "scalpel" to ensure inhibition of a vital function of the virus without impairing the functions of related host macro domains.
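
The three conserved subsequences named in the abstract above can be looked up in any protein sequence with plain string matching. The Python sketch below is an illustration under stated assumptions, not the authors' method: the names `MACRO_DOMAIN_MOTIFS`, `find_motifs`, and `percent_identity` are hypothetical, and the query sequence in the usage section is made up. The second helper reflects the abstract's point that conservation is a matter of degree by scoring position-wise identity between two equal-length segments.

```python
# Conserved subsequences (a), (b), (c) as quoted in the abstract.
MACRO_DOMAIN_MOTIFS = {
    "a": "VVVNAANVYLKHGGGVAGALNK",
    "b": "LHVVGPNVNKG",
    "c": "PLLSAGIFG",
}


def find_motifs(protein: str, motifs: dict = MACRO_DOMAIN_MOTIFS) -> dict:
    """Map each motif label to the 0-based start of its first exact match, or -1."""
    seq = protein.upper()
    return {label: seq.find(motif) for label, motif in motifs.items()}


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return matches / len(a)


if __name__ == "__main__":
    # Hypothetical query: motif (a), a short spacer, then motif (c).
    query = "MK" + MACRO_DOMAIN_MOTIFS["a"] + "GG" + MACRO_DOMAIN_MOTIFS["c"]
    print(find_motifs(query))
    # Compare motif (c) with a hypothetical variant carrying one substitution.
    print(percent_identity("PLLSAGIFG", "PLLSAGVFG"))
```

Exact matching only finds unchanged copies of a motif; detecting diverged homologs would need an alignment tool, which is outside the scope of this sketch.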

  • Article type: Journal Article
    Leishmaniasis is a neglected tropical disease that poses major health concerns in many endemic countries worldwide and requires urgent attention to the identification of new drug targets as well as drug candidates. In the current study, we propose homoserine kinase (HSK) inhibition as a strategy to induce pathogen mortality by generating threonine deficiency. We introduce a homology-based molecular model of leishmanial HSK that appears to possess all the conserved structural and functional features of the GHMP kinase family. Furthermore, 200 ns of molecular dynamics data for the enzyme in the open and closed states provide mechanistic details of substrate and phosphate binding to this enzyme. We discuss the structural and functional significance of the movements of various loops (motifs 1, 2, and 3) and lips (upper and lower) in the transition of leishmanial HSK from the closed to the open state. Virtual screening of more than 40,000 compounds in the present investigation identifies a few potential HSK inhibitors that possess important features of efficient HSK inhibitors. These compounds can be considered an effective starting point for the identification of novel drug-like scaffolds. We hope the structural information offered in this report will be used in designing competent experimental and therapeutic interventions for leishmaniasis management.

  • Article type: Journal Article
    BACKGROUND: Type 2 diabetes mellitus (T2DM) is a worldwide disease that affects individuals of all ages, causing micro- and macrovascular impairments due to a hyperglycemic internal environment. In the search for a definitive treatment for T2DM, the association of diabetes with immune components provides a strong basis for developing immunotherapies and vaccines that could stimulate immune cells to minimize insulin resistance and initiate gluconeogenesis through an insulin-independent route.
    METHODS: An immunoinformatics-based approach was used to design a polyvalent vaccine for T2DM. It involved data accession, antigenicity analysis, T-cell epitope prediction, conservation and proteasomal evaluation, functional annotation, and interactomic and in silico binding affinity analysis.
    RESULTS: We determined the binding affinity of antigenic peptides to major histocompatibility complex (MHC) class I molecules for immune activation to control T2DM. We found 13 epitopes of 9 amino acid residues that bear significant binding affinity for multiple MHC class I alleles. The downstream signaling that results from T-cell activation is directly regulated by the molecular weight, amino acid properties, and affinity of these epitopes. Each epitope has a notable percentile rank and a significant ANN IC50 value. These high-scoring potential epitopes were linked using AAY and EAAAK linkers and the HBHA adjuvant to generate a T-cell polyvalent vaccine with a molecular weight of 35.6 kDa containing 322 amino acid residues. In silico analysis of the polyvalent construct showed significant binding affinity (-15.34 kcal/mol) with MHC class I. This interaction helps to explain our hypothesis: potential activation of T cells and stimulation of cytokines and GLUT1 receptors.
    CONCLUSIONS: Our system-level immunoinformatics approach is suitable for designing potential polyvalent therapeutic vaccine candidates for T2DM that act by reducing hyperglycemia and enhancing metabolic activities through the immune system.
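
The RESULTS above describe joining epitopes with AAY and EAAAK linkers plus an adjuvant. The Python sketch below shows one common way such a multi-epitope construct string is assembled. The arrangement used here (adjuvant, then EAAAK, then epitopes separated by AAY) is an assumption, since the abstract does not state the order; the function name `build_construct`, the 9-mer epitopes, and the adjuvant placeholder are all hypothetical and are not the study's sequences.

```python
def build_construct(
    adjuvant: str,
    epitopes: list,
    adjuvant_linker: str = "EAAAK",
    epitope_linker: str = "AAY",
) -> str:
    """Concatenate adjuvant + rigid linker + epitopes joined by a short linker.

    The ordering is an illustrative assumption, not taken from the abstract.
    """
    if not epitopes:
        raise ValueError("at least one epitope is required")
    return adjuvant + adjuvant_linker + epitope_linker.join(epitopes)


if __name__ == "__main__":
    # Placeholder sequences used only to demonstrate the assembly.
    hypothetical_adjuvant = "MAD"
    hypothetical_epitopes = ["KLGGSVQTW", "RMNPDEFHI"]
    construct = build_construct(hypothetical_adjuvant, hypothetical_epitopes)
    print(construct, len(construct))
```

With n epitopes of 9 residues each, the construct length is the adjuvant length plus 5 (EAAAK) plus 9n plus 3(n - 1) for the AAY linkers.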

  • Article type: Journal Article
    Regulation of protein synthesis is an essential step of gene expression. This process is under the control of cis-acting RNA elements and trans-acting factors. Gemin5 is a multifunctional RNA-binding protein organized in distinct domains. The protein bears a non-canonical RNA-binding site, designated RBS1, at the C-terminal end. Among other cellular RNAs, the RBS1 region recognizes a sequence located within the coding region of Gemin5 mRNA, termed H12. Expression of RBS1 stimulates translation of RNA reporters carrying the H12 sequence, counteracting the negative effect of Gemin5 on global protein synthesis. A computational analysis of RBS1 protein and H12 RNA variability across the evolutionary scale predicts coevolving pairs of amino acids and nucleotides. RBS1 footprint and gel-shift assays indicated a positive correlation between the identified coevolving pairs and RNA-protein interaction. The coevolving residues of RBS1 contribute to the recognition of stem-loop SL1, an RNA structural element of H12 that contains the coevolving nucleotides. Indeed, RBS1 proteins carrying substitutions on the coevolving residues P1297 or S1299S1300 showed drastically reduced SL1 binding. Unlike the wild-type RBS1 protein, expression of these mutant proteins in cells failed to enhance translation stimulation of mRNA reporters carrying the H12 sequence. Therefore, the PXSS motif within the RBS1 domain of Gemin5 and the RNA structural motif SL1 of its mRNA appear to play a key role in fine-tuning the expression level of this essential protein.
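
The PXSS motif described above (a proline, any residue, then two serines, matching P1297 and S1299S1300) can be located in a protein sequence with a regular expression. The Python sketch below is a minimal illustration: the function name `find_pxss` and the test sequence are hypothetical, and a zero-width lookahead is used so that overlapping occurrences are also reported.

```python
import re

# P, any single residue (X), then S S; the lookahead keeps matches zero-width
# so overlapping motif occurrences are not skipped.
PXSS_PATTERN = re.compile(r"(?=P.SS)")


def find_pxss(protein: str) -> list:
    """Return 0-based start positions of every PXSS occurrence in the sequence."""
    return [m.start() for m in PXSS_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the search.
    print(find_pxss("AAPQSSGG"))
```

Adding 1 to each returned index, plus the offset of the fragment within the full-length protein, converts the result to residue numbering of the kind used in the abstract.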

  • Article type: Journal Article
    Genomic data sets are increasingly central to ecological and evolutionary biology, but far fewer resources are available for invertebrates. Powerful new computational tools and the rapidly decreasing cost of Illumina sequencing are beginning to change this, enabling rapid genome assembly and reference marker extraction. We have developed and tested a practical workflow for developing genomic resources in nonmodel groups with real-world data on Collembola (springtails), one of the most dominant soil animals on Earth. We designed universal molecular marker sets, single-copy orthologues (BUSCOs) and ultraconserved elements (UCEs), using three existing and 11 newly generated genomes. Both marker types were tested in silico via marker capture success and phylogenetic performance. The new genomes were assembled with Illumina short reads and 9,585-14,743 protein-coding genes were predicted with ab initio and protein homology evidence. We identified 1,997 benchmarking universal single-copy orthologues (BUSCOs) across 14 genomes and created and assessed a custom BUSCO data set for extracting single-copy genes. We also developed a new UCE probe set containing 46,087 baits targeting 1,885 loci. We successfully captured 1,437-1,865 BUSCOs and 975-1,186 UCEs across 14 genomes. Phylogenomic reconstructions using these markers proved robust, giving new insight on deep-time collembolan relationships. Our study demonstrates the feasibility of generating thousands of universal markers from highly efficient whole-genome sequencing, providing a valuable resource for genome-scale investigations in evolutionary biology and ecology.