Sequence Analysis

  • Article type: Journal Article
    Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell provides insights for designing drugs tailored to a disease. As the gold standard for validation, conventional wet-lab approaches identify protein subcellular locations using fluorescence microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags. However, the booming era of proteomics and high-throughput sequencing has generated vast numbers of newly discovered proteins, making it impossible to localize all of them by wet-lab experiments. To tackle this problem, artificial intelligence (AI) and machine learning (ML), especially deep learning, have made significant progress in this research area over the past decades. In this article, we review the latest advances in AI-based methods across three typical types of approaches: sequence-based, knowledge-based, and image-based methods. We also discuss in detail the existing challenges and future directions of AI-based method development in this research field.
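Sequence-based predictors usually convert each protein into a fixed-length numeric vector before classification. The review does not specify a particular encoding. As an illustration only, here is a minimal Python sketch of amino acid composition (AAC), one of the simplest such features, assuming residues are given as one-letter codes. The function name and the example peptide are hypothetical.

```python
from collections import Counter

# Fixed order of the 20 standard amino acids (one-letter codes), so every
# protein maps to the same feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard letters (e.g. X, B, Z) are ignored; an empty or fully
    non-standard sequence yields an all-zero vector.
    """
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical peptide, for illustration only.
features = amino_acid_composition("MKTAYIAKQR")
print(len(features), features[AMINO_ACIDS.index("K")])
```

AAC discards residue order, so it is only a baseline. The vector would then be passed to whatever classifier a given method uses.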

  • Article type: Journal Article
    Pectate lyases and pectin lyases play essential roles in various biotechnological applications, such as the textile industry, paper making, pectic wastewater pretreatment, juice clarification and oil extraction. They efficiently cleave the α-1,4-glycosidic bonds of the pectin backbone via a β-elimination reaction to produce pectin oligosaccharides. This reaction does not generate highly toxic methanol and offers the advantages of good enzymatic selectivity, fewer by-products, mild reaction conditions and high efficiency. Although numerous studies have been carried out over several decades, there is still no comprehensive review summarizing recent advances in pectate lyases and pectin lyases. This review tries to fill this gap by covering all relevant information, including substrates, origins, biochemical properties, sequence analysis, modes of action, three-dimensional structures and catalytic mechanisms.

  • Article type: Journal Article
    Telomeres are located at the ends of chromosomes and consist of specific sequences with a distinctive structure that safeguards genes. They form capping structures that protect chromosome ends from fusion events and ensure chromosome stability. Telomeres shorten with each cycle of cell division. When their length reaches a certain threshold, genomic instability can result, implicating telomeres in various diseases, including cancer and neurodegenerative disorders. The potential of telomeres to serve as a biomarker of aging and age-related disease is being explored, but their significance is still under study. This is because post-mitotic cells, which are mature cells that no longer undergo mitosis, do not experience age-related telomere shortening; instead, other factors, such as exposure to oxidative stress, can directly damage telomeres and cause genomic instability. Nonetheless, there is general agreement that measuring telomere length offers valuable insights and forms a crucial foundation for analyzing gene expression and epigenetic data. Numerous approaches have been developed to measure telomere length accurately. In this review, we summarize these methods along with their advantages and limitations.

  • Article type: Systematic Review
    Noroviruses are among the major causative agents of human acute gastroenteritis, and norovirus outbreaks can differ considerably in nature. The number of single-nucleotide polymorphisms (SNPs) between strains is used to assess their relatedness. There is currently no universally accepted cutoff value for clustering the strains that define an outbreak or for linking the individuals involved. This study was conducted to estimate a threshold for genomic variation among related strains within norovirus outbreaks. We carried out a literature search of the PubMed and Web of Science databases. The SNP rate was defined as (number of SNPs / sequence length in bp) × 100%. The Mann-Whitney U test was used to compare the distribution of SNP rates across sequence regions, genogroups (GI and GII), transmission routes and sequencing methods. A total of 25 articles reporting on 108 norovirus outbreaks were included. In 99.1% of the outbreaks the SNP rates were below 0.50%, and in 89.8% they were below 0.20%. Outbreak strains showed higher SNP rates when the P2 domain was used for sequence analysis (Z = -2.652, p = 0.008) and when a next-generation sequencing (NGS) method was used (Z = -3.686, p < 0.001). Outbreaks caused by different norovirus genotypes showed no significant difference in SNP rates. SNP rates were lower in common-source outbreaks than in person-to-person outbreaks, but the difference was not significant once differences in sequencing methods were taken into consideration. SNP rates below 0.20% and 0.50% could be considered strict and relaxed thresholds, respectively, for strain similarity within a norovirus outbreak. More data are needed to evaluate differences within and between various norovirus outbreaks.
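The SNP-rate definition and the two proposed thresholds can be made concrete with a short calculation. This is a minimal sketch, assuming two already aligned, gap-free sequences of equal length in which every mismatching position counts as one SNP. The study's exact handling of gaps or ambiguous bases is not stated. The function names and the example sequences are hypothetical.

```python
def snp_rate(seq_a: str, seq_b: str) -> float:
    """SNP rate (%) = number of SNPs / sequence length (bp) x 100.

    Assumes both sequences are already aligned, gap-free and of equal length;
    every mismatching position is counted as one SNP.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    snps = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return snps / len(seq_a) * 100


def classify_similarity(rate_percent: float) -> str:
    """Apply the strict (<0.20%) and relaxed (<0.50%) thresholds proposed in the study."""
    if rate_percent < 0.20:
        return "within strict threshold"
    if rate_percent < 0.50:
        return "within relaxed threshold only"
    return "outside both thresholds"


# Hypothetical 1,000 bp region and a query carrying 3 substitutions.
ref = "ACGT" * 250
query = "G" + ref[1:10] + "T" + ref[11:20] + "C" + ref[21:]
rate = snp_rate(ref, query)
print(round(rate, 2), classify_similarity(rate))
```

Three differences in 1,000 bp give a rate of about 0.3%. That pair would be considered related under the relaxed threshold but not under the strict one.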

  • Article type: Systematic Review
    OBJECTIVE: This systematic review aims to elucidate the methodological practices and reporting standards associated with sequence analysis (SA) for the identification of clinical pathways in real-world scenarios, using routinely collected data.
    METHODS: We conducted a methodological systematic review, searching five medical and health databases: MEDLINE, PsycINFO, CINAHL, EMBASE and Web of Science. The search encompassed articles from the inception of these databases up to February 28, 2023. The search strategy comprised two distinctive sets of search terms, specifically focused on sequence analysis and clinical pathways.
    RESULTS: Nineteen studies met the eligibility criteria for this systematic review. Nearly 60% of the included studies were published in or after 2021, with a significant proportion originating from Canada (n = 7) and France (n = 5). Ninety percent of the studies adhered to the fundamental SA steps. The optimal matching (OM) method was the most frequently employed dissimilarity measure (63%), while agglomerative hierarchical clustering using Ward's linkage was the preferred clustering algorithm (53%). However, it should be emphasized that a majority of the studies inadequately reported key methodological decisions pertaining to SA.
    CONCLUSIONS: This review underscores the necessity for enhanced transparency in reporting both data management procedures and key methodological choices within SA processes. The development of reporting guidelines and a robust appraisal tool tailored to assess the quality of SA would be invaluable for researchers in this field.
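Optimal matching, the most common dissimilarity measure in the reviewed studies, scores two state sequences by the cheapest set of insertions/deletions (indels) and substitutions that turns one into the other. The following is a minimal sketch of that dynamic-programming idea. It assumes a single constant substitution cost. The costs (indel 1, substitution 2), the function name and the care-setting states are illustrative assumptions, not choices reported in the review.

```python
def optimal_matching(seq_a, seq_b, indel_cost=1.0, sub_cost=2.0):
    """Optimal matching (OM) distance between two state sequences.

    Edit-distance dynamic programming: the minimum total cost of
    insertions/deletions (indel_cost each) and substitutions (sub_cost each)
    needed to turn seq_a into seq_b, using one constant substitution cost.
    """
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = cheapest way to align the first i states of seq_a with the first j of seq_b
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,                       # delete a state of seq_a
                dp[i][j - 1] + indel_cost,                       # insert a state of seq_b
                dp[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitute
            )
    return dp[n][m]


# Hypothetical monthly care-setting pathways for two patients.
pathway_a = ["GP", "GP", "HOSP", "REHAB"]
pathway_b = ["GP", "HOSP", "HOSP", "REHAB"]
print(optimal_matching(pathway_a, pathway_b))
```

Computing this distance for every pair of patients yields the dissimilarity matrix that a clustering step, such as Ward's agglomerative clustering, would then partition into typical pathways. The indel and substitution costs are exactly the kind of methodological decision the review found to be under-reported.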

  • Article type: Review
    The Sequence Alignment/Map (SAM) format is a text format used to record alignment information. Alignment is the core of sequencing analysis, and downstream tasks take mapping results as input for further processing. Given the rapid development of the sequencing industry, a comprehensive understanding of the SAM format and related tools is necessary to meet the challenges of data processing and analysis. This paper surveys knowledge across the broad field of SAM. First, the SAM format is introduced to give an overview of the sequencing analysis process. Then, existing work is systematically classified by generation, compression and application, and the SAM tools involved are examined in detail. Finally, a summary and some thoughts on future directions are provided.
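To make the format concrete, here is a minimal sketch of reading one SAM alignment line. It relies on the SAM specification's layout: header lines start with `@`, and each alignment line has 11 mandatory tab-separated fields followed by optional `TAG:TYPE:VALUE` fields. The class and function names are hypothetical. Real pipelines would normally use an established library such as pysam or htslib rather than hand-written parsing.

```python
from typing import NamedTuple, Optional


class SamRecord(NamedTuple):
    # The 11 mandatory SAM fields, in specification order, plus optional tags.
    qname: str       # query (read) name
    flag: int        # bitwise FLAG
    rname: str       # reference sequence name
    pos: int         # 1-based leftmost mapping position (0 if unavailable)
    mapq: int        # mapping quality
    cigar: str       # CIGAR string
    rnext: str       # reference name of the mate/next read
    pnext: int       # position of the mate/next read
    tlen: int        # observed template length
    seq: str         # segment sequence
    qual: str        # base qualities (Phred + 33)
    tags: list[str]  # optional TAG:TYPE:VALUE fields, kept as raw strings


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one SAM alignment line; return None for header lines starting with '@'."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected >= 11 tab-separated fields, got {len(fields)}")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, fields[11:])


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)   # FLAG bit 0x4: segment unmapped


def is_reverse(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)  # FLAG bit 0x10: SEQ is reverse complemented


# Hypothetical alignment line: read1 maps to chr1 at position 100 on the reverse strand.
rec = parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
print(rec.rname, rec.pos, is_reverse(rec), is_unmapped(rec))
```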

  • Article type: Journal Article
    The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. Because genome sequences are analogous to language texts, techniques that have proven successful in natural language processing can be applied to genomic data. This review provides a comprehensive analysis of the most recent advancements in applying transformer architectures and attention mechanisms to genome and transcriptome data. Its focus is a critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. Given the swift pace of development in deep learning methodologies, it is vital to continually assess the current standing and future direction of this research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of recent advancements and elucidating state-of-the-art applications in the field. Furthermore, by critically evaluating studies from 2019 to 2023, it highlights potential areas for future investigation, thereby acting as a stepping-stone for further research.
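The genome-as-language analogy becomes concrete at the tokenization step. A transformer needs a sequence of discrete tokens, and one common way to obtain them from DNA is to split it into k-mers that play the role of words. The following is a minimal sketch of that idea. It is not taken from any specific model in the review. The function names, `k`, `stride` and the special tokens are illustrative assumptions, and other models use single nucleotides or learned subword vocabularies instead.

```python
def kmer_tokenize(sequence: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a DNA sequence into k-mer 'words' (overlapping when stride < k)."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct token to an integer id; 0 and 1 are reserved for padding/unknown."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


# Toy sequence with k = 3 so the output stays readable.
tokens = kmer_tokenize("ACGTAC", k=3)
vocab = build_vocab(tokens)
ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
print(tokens, ids)
```

The integer ids are what an embedding layer and attention blocks would consume. Overlapping k-mers keep every position covered at the cost of longer token sequences, while `stride = k` gives shorter, non-overlapping "words".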

  • Article type: Journal Article
    MicroRNAs (miRNAs) have been implicated in the pathogenesis of childhood acute lymphoblastic leukemia (ALL). We performed a systematic review and meta-analysis of miRNA single-nucleotide polymorphisms (SNPs) in children with ALL compared with healthy children, which revealed (i) that the CC genotype of rs4938723 in pri-miR-34b/c and the TT genotype of rs543412 in miR-100 confer protection against ALL occurrence in children; (ii) no significant association between rs2910164 genotypes in miR-146a and childhood ALL; and (iii) that SNPs in the DROSHA, miR-449b, miR-938, miR-3117 and miR-3689d-2 genes seem to be associated with susceptibility to B-ALL in childhood. A review of published literature on differential miRNA expression in children with ALL compared with controls revealed that significant upregulation of the miR-128 family, miR-130b, miR-155, the miR-181 family, miR-210, miR-222, miR-363 and miR-708, along with significant downregulation of miR-143 and miR-148a, seems to play a definite role in the development of childhood ALL. MicroRNA signatures among childhood ALL subtypes, along with differential miRNA expression patterns between B-ALL and T-ALL cases, were scrutinized. For pediatric T-ALL cases, we reanalyzed RNA-seq datasets with a robust and sensitive pipeline and confirmed the significant differential expression of hsa-miR-16-5p, hsa-miR-19b-3p, hsa-miR-92a-2-5p, hsa-miR-128-3p (ranked first), hsa-miR-130b-3p and -5p, hsa-miR-181a-5p, -2-3p and -3p, hsa-miR-181b-5p and -3p, hsa-miR-145-5p and hsa-miR-574-3p, as described in the literature, along with newly identified miRNAs.
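A statement such as "the CC genotype confers protection" corresponds, in a case-control setting, to an odds ratio below 1 whose confidence interval excludes 1. The following is a minimal sketch of that calculation for a single 2x2 genotype table, using the standard Woolf (log-scale) interval. The counts are hypothetical and not taken from the review. A meta-analysis would go further and pool the log odds ratios of several studies, for example by inverse-variance weighting, which is not shown here.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a = cases with the genotype, b = cases without it,
    c = controls with the genotype, d = controls without it.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (apply a continuity correction first)")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 cases vs 40 of 100 controls carry the genotype.
or_, lo, hi = odds_ratio_ci(20, 80, 40, 60)
print(f"OR = {or_:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

Here the odds ratio is 0.375 and the whole interval lies below 1, which is the pattern that would be read as a protective association.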

  • Article type: Journal Article
    Since African swine fever virus (ASFV) was first reported, several "mutants isolated in the field" have been described, which may result from the continuous adaptation and evolution of ASFV. The emergence of ASFV field mutants may lead to chronic or asymptomatic infections with "atypical clinical symptoms" in pigs and hinder the development of the pig industry. Here we analyzed the published gene sequences of ASFV "field attenuated strains" and reviewed the genetic differences between field-attenuated and virulent ASFV strains, with the aim of providing a reference for the scientific prevention and control of ASF and for the development of new vaccines. We found that deletions of EP153R and EP402R occurred in 4 field-attenuated strains, and that the differential genes of field-attenuated strains are located mainly in regions with low GC content. Evolution of the MGF110 family genes was identified by analyzing two field-attenuated ASFV strains from Portugal. We also found that certain tandem repeat sequences play an important role in the evolution of the NH/P68 and OURT 88/3 strains but not in the Estonia 2014, HuB20 and Pig/Heilongjiang/HRB1/2020 strains.
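The observation that differential genes cluster in low-GC regions presupposes some way of locating such regions along a genome. The abstract does not say how this was done. One common approach is a sliding-window GC profile. The following is a minimal sketch, assuming a plain nucleotide string. The window size, step and GC threshold are hypothetical parameters, and ambiguous bases such as N count toward the window length but not toward GC.

```python
def gc_content_windows(sequence: str, window: int = 1000, step: int = 500):
    """Yield (start, gc_fraction) for each full window along the sequence (0-based start)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, gc / window


def low_gc_windows(sequence: str, threshold: float, window: int = 1000, step: int = 500):
    """Return the windows whose GC fraction falls below `threshold`."""
    return [(s, gc) for s, gc in gc_content_windows(sequence, window, step) if gc < threshold]


# Toy 40 bp sequence: an AT-only half followed by a GC-only half.
toy = "AT" * 10 + "GC" * 10
print(list(gc_content_windows(toy, window=20, step=20)))
print(low_gc_windows(toy, threshold=0.5, window=20, step=20))
```

Gene coordinates from an annotation could then be intersected with the low-GC windows to check whether differential genes are over-represented there.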

  • Lysine succinylation is a post-translational modification (PTM) in which a succinyl group (-CO-CH2-CH2-CO2H) is added to a lysine residue of a protein, reversing the lysine's positive charge to a negative charge and leading to significant changes in protein structure and function. It occurs on a wide range of proteins and plays an important role in various cellular and biological processes in both eukaryotes and prokaryotes. Beyond experimental identification of succinylation sites, many studies have developed sequence-based predictors using machine learning approaches, because computational prediction promises to be time-saving, accurate, robust and cost-effective. Despite these benefits of computationally predicting lysine succinylation sites across species, a number of issues still need to be addressed in the design and development of succinylation site predictors. Although many studies have used various statistical and machine learning tools, only a few have examined these bioinformatics issues in depth. Therefore, this comprehensive comparative review presents the latest advances in prediction models, datasets and online resources, as well as the remaining obstacles and limitations, to provide a useful guideline for developing more suitable and effective succinylation site prediction tools.
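Sequence-based site predictors typically start by cutting a fixed-length peptide window centred on each candidate lysine. The window is then encoded (e.g., one-hot or physicochemical features) and classified. The following is a minimal sketch of that window-extraction step only. The half-width of 15 residues, the `X` padding character and the function name are illustrative assumptions, since published predictors use a variety of window sizes.

```python
def lysine_windows(protein: str, half_width: int = 15, pad: str = "X"):
    """Return (position, window) for every lysine (K), with the K at the window centre.

    Each window has length 2 * half_width + 1; positions beyond either end of
    the protein are filled with `pad` so that all windows share one length.
    Positions are reported 1-based, as in UniProt-style numbering.
    """
    seq = protein.upper()
    padded = pad * half_width + seq + pad * half_width
    windows = []
    for i, aa in enumerate(seq):
        if aa == "K":
            # seq[i] sits at padded[i + half_width], so this slice is centred on it.
            windows.append((i + 1, padded[i:i + 2 * half_width + 1]))
    return windows


# Hypothetical 4-residue peptide with a small half-width for readability.
for pos, win in lysine_windows("MKAK", half_width=2):
    print(pos, win)
```

Each window would become one training example, labelled positive if that lysine is a known succinylation site and negative otherwise.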
