Software

  • Article type: Journal Article
    Knowledge graphs support the representation, organization, retrieval, reasoning, and application of knowledge, providing a rich and robust cognitive foundation for artificial intelligence systems and applications. When we learn new things, discover that some old information was wrong, observe ongoing changes and progress, or adopt new technology standards, we need to update knowledge graphs. In some environments, however, the initial knowledge cannot be known. For example, we may not have access to the full code of a piece of software, even after purchasing it. In such circumstances, is there a way to update a knowledge graph without prior knowledge? In this paper, we investigate whether such a method exists within the framework of Dalal revision operators. We first prove that finding the optimal solution in this setting is a strongly NP-complete problem. To address this, we propose two algorithms, Flaccid_search and Tight_search, which apply under different conditions, and we prove that both algorithms find the desired results.
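    For readers unfamiliar with the Dalal operator, the following is a minimal brute-force sketch of standard Dalal revision over propositional models: the revised knowledge keeps those models of the new information phi that lie at minimal Hamming distance from some model of the old knowledge K. It assumes K's models are known, which is exactly what the paper's setting lacks, so it is not Flaccid_search or Tight_search; all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Model = Tuple[bool, ...]  # one truth assignment over a fixed variable order


def models(formula: Callable[[Dict[str, bool]], bool], variables: List[str]) -> Set[Model]:
    """Enumerate every truth assignment that satisfies `formula` (exponential, illustration only)."""
    result: Set[Model] = set()
    for values in product([False, True], repeat=len(variables)):
        if formula(dict(zip(variables, values))):
            result.add(values)
    return result


def hamming(a: Model, b: Model) -> int:
    """Number of variables on which two assignments disagree."""
    return sum(x != y for x, y in zip(a, b))


def dalal_revise(k_models: Set[Model], phi_models: Set[Model]) -> Set[Model]:
    """Dalal revision: keep the models of phi closest (in Hamming distance) to some model of K."""
    if not phi_models:
        return set()
    if not k_models:
        # Common convention (an assumption here): revising an inconsistent K yields phi itself.
        return set(phi_models)
    dist = {m: min(hamming(m, k) for k in k_models) for m in phi_models}
    best = min(dist.values())
    return {m for m, d in dist.items() if d == best}
```

    For example, revising K = a ∧ b ∧ c by phi = ¬(a ∧ b) keeps only the two phi-models that flip exactly one of a or b, since every other phi-model is farther from (True, True, True).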

  • Article type: Journal Article
    Analyzing the evolutionary features and internal logic of China's one-vote veto system over the past two decades is highly significant for its reform and standardization. For this analysis, NVivo 12 was used to examine policy texts related to the one-vote veto issued by Fujian, Hubei, and Gansu provinces. A comparative analysis of keyword frequency statistics, policy text form, and content characteristics across the three provinces found that, over 20 years of development, government departments' use of the one-vote veto system has changed fundamentally. These changes are seen primarily in a more refined description of the one-vote veto in policy texts, a gradual reduction in the withdrawal mechanisms of the one-vote veto, and an expanded field of application for it.
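    NVivo is a GUI tool, so the following is not its query engine; it is a minimal, hypothetical sketch of the kind of cross-province keyword frequency comparison described above. It assumes plain substring counting (appropriate for unsegmented Chinese text) and normalizes counts by corpus length so that provinces with different volumes of policy text can be compared. The function names and the per-10,000-character scale are assumptions.

```python
from typing import Dict, List


def keyword_frequencies(texts: List[str], keywords: List[str]) -> Dict[str, int]:
    """Count non-overlapping occurrences of each keyword across a list of texts."""
    counts = {k: 0 for k in keywords}
    for text in texts:
        for k in keywords:
            counts[k] += text.count(k)  # str.count counts non-overlapping matches
    return counts


def compare_provinces(
    corpora: Dict[str, List[str]], keywords: List[str], per: int = 10_000
) -> Dict[str, Dict[str, float]]:
    """Keyword occurrences per `per` characters for each province's corpus."""
    table: Dict[str, Dict[str, float]] = {}
    for province, texts in corpora.items():
        total_chars = sum(len(t) for t in texts) or 1  # avoid division by zero
        counts = keyword_frequencies(texts, keywords)
        table[province] = {k: counts[k] * per / total_chars for k in keywords}
    return table
```

    Normalizing by length matters because a province that simply issued more documents would otherwise appear to use the veto more often.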

  • Article type: Journal Article
    Identifying viruses in metagenomes is a common step in exploring the viral composition of the human gut. Here, we introduce VirRep, a hybrid language representation learning framework for identifying viruses from human gut metagenomes. VirRep combines a context-aware encoder and an evolution-aware encoder to improve sequence representation by incorporating k-mer patterns and sequence homologies. Benchmarking on both simulated and real datasets with varying viral proportions demonstrates that VirRep outperforms state-of-the-art methods. When applied to fecal metagenomes from a colorectal cancer cohort, VirRep identifies 39 high-quality viral species associated with the disease, many of which cannot be detected by existing methods.
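    The abstract mentions k-mer patterns. A normalized k-mer frequency vector is one common, simple way to represent sequence composition; the sketch below shows that idea only and is an assumption for illustration, not VirRep's learned context-aware encoder. The function name and the choice of k are hypothetical.

```python
from itertools import product
from typing import Dict, List


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Normalized k-mer frequency vector over ACGT (length 4**k).

    k-mers containing other characters (for example N) are skipped.
    """
    index: Dict[str, int] = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

    A learned encoder replaces this fixed vector with contextual embeddings, but the input signal (overlapping k-mers along a contig) is the same kind of pattern.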

  • Article type: Journal Article
    Computational biological models have proven to be an invaluable tool for understanding and predicting the behaviour of many biological systems. While constructing such models from scratch may not be too challenging for experienced researchers, it is not a straightforward task for early-stage researchers. Design patterns are well-known techniques widely applied in software engineering because they provide a set of typical solutions to common problems in software design. In this paper, we collect and discuss common patterns that are usually used during the construction and execution of computational biological models. We adopt Petri nets as a modelling language to provide a visual illustration of each pattern; however, the ideas presented in this paper can also be implemented using other modelling formalisms. We provide two case studies for illustration purposes and show how these models can be built up from the presented smaller modules. We hope that the ideas discussed in this paper will help many researchers build their own future models.
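    Since the patterns are expressed as Petri nets, a minimal sketch of the standard place/transition firing rule may help: a transition is enabled when every input place holds at least as many tokens as its arc weight, and firing it consumes input tokens and produces output tokens. The class and the example reaction A + B → C are hypothetical illustrations, not one of the paper's patterns.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Arcs = Dict[str, int]  # place name -> arc weight


@dataclass
class PetriNet:
    """Place/transition net; `marking` maps each place to its token count."""
    marking: Dict[str, int]
    transitions: Dict[str, Tuple[Arcs, Arcs]] = field(default_factory=dict)

    def add_transition(self, name: str, inputs: Arcs, outputs: Arcs) -> None:
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name: str) -> bool:
        """Enabled iff every input place has at least `weight` tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= w for p, w in inputs.items())

    def fire(self, name: str) -> None:
        """Consume tokens from input places and produce tokens in output places."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p, w in inputs.items():
            self.marking[p] -= w
        for p, w in outputs.items():
            self.marking[p] = self.marking.get(p, 0) + w
```

    Usage: `net = PetriNet({"A": 2, "B": 1, "C": 0})`, then `net.add_transition("bind", {"A": 1, "B": 1}, {"C": 1})` and `net.fire("bind")` leaves one A, no B, and one C, after which "bind" is no longer enabled.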

  • Article type: Journal Article
    Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of transposable elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness and require extensive manual curation. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms the state-of-the-art tool RepeatModeler2 across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are inserted directly within crucial genes, directly altering gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.

  • Article type: Journal Article
    Food comprises proteins, lipids, sugars, and various other molecules that together constitute a multicomponent biological system. It is challenging to investigate microscopic changes in food systems through conventional experiments alone. Molecular dynamics (MD) simulation serves as a crucial bridge across this research gap. The Groningen Machine for Chemical Simulations (GROMACS) is an open-source, high-performance MD simulation package that plays a significant role in food science research owing to its high flexibility and powerful functionality; it has been used to explore molecular conformations and the interaction mechanisms between food molecules at the microscopic level, and to analyze their properties and functions. This review presents the GROMACS workflow and highlights recent developments and achievements in its applications to food science research, providing important theoretical guidance and technical support for an in-depth understanding of the properties and functions of food.
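    To make "the GROMACS workflow" concrete, here is a minimal sketch of a typical preparation-and-minimization sequence using standard `gmx` subcommands, driven from Python with a dry-run default. The input file name, the OPLS-AA force field, the SPC/E water model, the 1.0 nm cubic box, and the `em.mdp` parameter file (which the user must supply) are all assumptions; ion addition (`gmx genion`) and NVT/NPT equilibration before production MD are omitted. This is not the specific protocol of any study in the review.

```python
import shlex
import subprocess
from typing import List


def gromacs_pipeline(structure: str = "protein.pdb") -> List[List[str]]:
    """Typical GROMACS setup steps; file names and force-field choices are assumptions."""
    return [
        # 1. Topology + GROMACS coordinates (force field and water model chosen non-interactively)
        ["gmx", "pdb2gmx", "-f", structure, "-o", "processed.gro", "-p", "topol.top",
         "-ff", "oplsaa", "-water", "spce"],
        # 2. Centre the solute in a cubic box with at least 1.0 nm to each edge
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro", "-c", "-d", "1.0", "-bt", "cubic"],
        # 3. Fill the box with pre-equilibrated water; updates the topology
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro", "-o", "solvated.gro", "-p", "topol.top"],
        # 4. Combine run parameters, coordinates, and topology into a run input (.tpr) file
        ["gmx", "grompp", "-f", "em.mdp", "-c", "solvated.gro", "-p", "topol.top", "-o", "em.tpr"],
        # 5. Energy minimization; all outputs share the "em" prefix
        ["gmx", "mdrun", "-v", "-deffnm", "em"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (requires GROMACS installed)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

    Keeping the steps as data makes it easy to inspect or adapt a pipeline (for example, swapping the water model for a food system) before running anything.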

  • Article type: Journal Article
    Accurate understanding of the biological functions of enzymes is vital for various tasks in both pathology and industrial biotechnology. However, existing methods are usually not fast enough and lack explanations for their predictions, which severely limits their real-world applications. Building on our previous work, DEEPre, we propose a new interpretable and fast version (ifDEEPre) that designs a novel self-guided attention mechanism and incorporates biological knowledge learned by large protein language models to accurately predict the Enzyme Commission (EC) numbers of enzymes and confirm their functions. The self-guided attention is designed to optimize the unique contributions of representations, automatically detecting key protein motifs to provide meaningful interpretations. Representations learned from raw protein sequences are strictly screened to improve the running speed of the framework, making it 50 times faster than DEEPre while requiring 12.89 times less storage space. Large language model modules are incorporated to learn physical properties from hundreds of millions of proteins, extending the biological knowledge of the whole network. Extensive experiments indicate that ifDEEPre outperforms all current methods, achieving an F1-score more than 14.22% higher on the NEW dataset. Furthermore, the trained ifDEEPre models accurately capture multi-level protein biological patterns and infer evolutionary trends of enzymes using only raw sequences without label information. ifDEEPre also predicts evolutionary relationships between different yeast subspecies that are highly consistent with the ground truth. Case studies indicate that ifDEEPre can detect key amino acid motifs, which have important implications for designing novel enzymes. A web server running ifDEEPre is available at https://proj.cse.cuhk.edu.hk/aihlab/ifdeepre/ to provide convenient services to the public, and ifDEEPre is freely available on GitHub at https://github.com/ml4bio/ifDEEPre/.
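    EC numbers are hierarchical (for example, 2.7.11.1 refines 2.7), so predictions can be scored at several levels. The sketch below computes a macro-averaged F1 for single-label EC predictions after truncating them to a chosen level. Macro averaging, single-label prediction, and the helper names are assumptions for illustration; the abstract does not state which averaging or level underlies the reported F1-score, and this code does not reproduce it.

```python
from collections import Counter
from typing import List


def truncate_ec(ec: str, level: int) -> str:
    """Keep the first `level` fields of an EC number, e.g. ('2.7.11.1', 2) -> '2.7'."""
    return ".".join(ec.split(".")[:level])


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

    Scoring the same predictions at levels 1 through 4 shows whether errors come from the main enzyme class or only from the finer subclasses.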

  • Article type: Journal Article
    BACKGROUND: RNA design has growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches based on secondary structures have emerged to address this problem. However, designing RNA sequences directly from 3D structures remains challenging due to the scarcity of data, the non-unique structure-sequence mapping, and the flexibility of RNA conformations.
    RESULTS: In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, and it iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows a trade-off between sequence recovery and diversity, so that more candidates can be explored. We split test sets by RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence-similarity splits and 16% for structure-similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
    METHODS: The source code is available at https://github.com/ml4bio/RiboDiffusion.
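    The results trade off sequence recovery against diversity and report relative improvements. The sketch below uses common definitions of these quantities: recovery as the fraction of positions matching the native sequence, diversity as the mean pairwise normalized Hamming distance among sampled designs, and relative improvement as (new - baseline) / baseline. That RiboDiffusion uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed nucleotide equals the native one."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)


def diversity(samples: List[str]) -> float:
    """Mean pairwise normalized Hamming distance among equal-length designs (0 = all identical)."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(sum(a != b for a, b in zip(x, y)) / len(x) for x, y in pairs) / len(pairs)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain, e.g. 0.555 vs 0.50 recovery is a 0.11 (11%) relative improvement."""
    return (new - baseline) / baseline
```

    The distinction matters when reading the reported 11% and 16% gains: a relative improvement of 11% over a 50% baseline corresponds to an absolute increase of only 5.5 percentage points.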

  • Article type: Journal Article
    In recent years, with the development of the Internet, attribution classification of APT malware has remained an important issue for society. Existing methods have yet to consider dynamic-link libraries (DLLs) and hidden file addresses during execution, and they fall short in capturing the local and global correlations of event behaviors. Compared with the structural features of binary code, opcode features reflect runtime instructions, yet they do not account for the repeated reuse of local operation behaviors within the same APT organization. Moreover, attribution classification based on a single feature is more easily affected by obfuscation techniques. To address these issues, (1) an event behavior graph based on API instructions and related operations is constructed, and a GNN model is used to capture execution traces on the host; (2) ImageCNTM captures the local spatial correlations and long-term continuous dependencies of opcode images; and (3) word-frequency and behavior features are concatenated and fused in a proposed multi-feature, multi-input deep learning model. We collected a publicly available dataset of APT malware to evaluate our method. The single-feature models achieved attribution classification results of 89.24% and 91.91%, and the multi-feature fusion model achieved better classification performance than the single-feature classifiers.
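    Step (3) fuses features by concatenation. The sketch below illustrates that early-fusion idea only: each feature block is L2-normalized so that no block dominates by scale, the blocks are concatenated in a fixed order, and a linear layer with softmax scores the candidate APT groups. The plain vectors stand in for the paper's GNN, ImageCNTM, and word-frequency features, the weights would normally be learned, and every name here is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (all-zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0] * len(v)


def fuse(features: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Early fusion: normalize each feature block, then concatenate in a fixed order."""
    fused: List[float] = []
    for name in order:
        fused.extend(l2_normalize(features[name]))
    return fused


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def classify(fused: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]) -> List[float]:
    """Linear layer (one weight row per APT group) followed by softmax probabilities."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, biases)]
    return softmax(logits)
```

    Fixing the concatenation order is essential: the classifier's weights are tied to positions in the fused vector, so training and inference must assemble the blocks identically.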

  • Article type: Journal Article
    BACKGROUND: In the past decade, single-cell RNA sequencing (scRNA-seq) has emerged as a pivotal method for transcriptomic profiling in biomedical research. Precise cell-type identification is crucial for subsequent analysis of single-cell data, and the integration and refinement of annotated data are essential for building comprehensive databases. However, prevailing annotation techniques often overlook the hierarchical organization of cell types, resulting in inconsistent annotations. Meanwhile, most existing integration approaches fail to integrate datasets with different annotation depths, and none of them can enhance the labels of outdated, lower-resolution data using more intricately annotated datasets or novel biological findings.
    RESULTS: Here, we introduce scPLAN, a hierarchical computational framework designed for scRNA-seq data analysis. scPLAN excels at annotating unlabeled scRNA-seq data using a reference dataset structured along a hierarchical cell-type tree. It identifies potential novel cell types in a systematic, layer-by-layer manner. Additionally, scPLAN effectively integrates annotated scRNA-seq datasets with varying levels of annotation depth, ensuring consistent refinement of cell-type labels across datasets with lower resolutions. Extensive annotation and novel cell detection experiments demonstrate its efficacy. Two case studies showcase how scPLAN integrates datasets with diverse cell-type label resolutions and refines their cell-type labels.
    AVAILABILITY: https://github.com/michaelGuo1204/scPLAN.
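    To illustrate layer-by-layer annotation along a cell-type tree, here is a hypothetical sketch of top-down descent: at each layer the best-scoring child is chosen, and if its score falls below a threshold the cell is flagged as a potential novel type under the current node. Starting the descent from the node matching a coarse existing label is one simple way to picture label refinement. The `score` callable stands in for a trained per-layer classifier, and the whole procedure is an assumption for illustration, not scPLAN's algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class CellTypeNode:
    name: str
    children: List["CellTypeNode"] = field(default_factory=list)


def find(node: CellTypeNode, name: str) -> Optional[CellTypeNode]:
    """Depth-first search for the node with the given cell-type name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def annotate(
    cell: object,
    root: CellTypeNode,
    score: Callable[[object, str], float],
    threshold: float = 0.5,
) -> Tuple[List[str], bool]:
    """Descend one layer at a time; return (label path, is_potentially_novel)."""
    path = [root.name]
    node = root
    while node.children:
        best = max(node.children, key=lambda c: score(cell, c.name))
        if score(cell, best.name) < threshold:
            return path, True  # no child fits well: possible novel subtype of `node`
        path.append(best.name)
        node = best
    return path, False
```

    Refining a cell that an older dataset labels only as "T cell" then amounts to calling `annotate(cell, find(root, "T cell"), score)`, so the coarse label is kept and only the layers beneath it are resolved.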