Software

  • Article type: Journal Article
    The role of knowledge graphs encompasses the representation, organization, retrieval, reasoning, and application of knowledge, providing a rich and robust cognitive foundation for artificial intelligence systems and applications. When we learn new things, discover that some old information was wrong, observe changes and progress, or adopt new technology standards, we need to update knowledge graphs. However, in some environments, the initial knowledge cannot be known. For example, we cannot access the full code of a piece of software, even if we purchased it. In such circumstances, is there a way to update a knowledge graph without prior knowledge? In this paper, we investigate whether there is a method for this situation within the framework of Dalal revision operators. We first prove that finding the optimal solution in this environment is a strongly NP-complete problem. We then propose two algorithms, Flaccid_search and Tight_search, which apply under different conditions, and we prove that both algorithms can find the desired results.
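
For readers unfamiliar with Dalal revision, the following is a minimal brute-force sketch of the standard operator when the initial knowledge base is fully known, which is the easy case and not the unknown-knowledge setting the abstract studies. It keeps the models of the new information that lie at the smallest Hamming distance from some model of the knowledge base. The function names and the two-atom example are illustrative assumptions, not the paper's Flaccid_search or Tight_search algorithms.

```python
from itertools import product

def models(formula, atoms):
    """Enumerate truth assignments over `atoms` (as tuples) that satisfy `formula`."""
    return [bits for bits in product((False, True), repeat=len(atoms))
            if formula(dict(zip(atoms, bits)))]

def hamming(m1, m2):
    """Number of atoms on which two assignments disagree."""
    return sum(a != b for a, b in zip(m1, m2))

def dalal_revise(kb, new_info, atoms):
    """Models of new_info that are closest (in Hamming distance) to a model of kb."""
    kb_models = models(kb, atoms)
    new_models = models(new_info, atoms)
    if not kb_models or not new_models:
        # Inconsistent knowledge base or unsatisfiable input: nothing to minimise.
        return new_models
    dist = {m: min(hamming(m, k) for k in kb_models) for m in new_models}
    best = min(dist.values())
    return [m for m in new_models if dist[m] == best]

# Hypothetical example: the knowledge base says "p and q"; we then learn "not p".
atoms = ["p", "q"]
kb = lambda v: v["p"] and v["q"]
new_info = lambda v: not v["p"]
result = dalal_revise(kb, new_info, atoms)
print(result)  # assignments are ordered as (p, q)
```

The enumeration is exponential in the number of atoms, so it is only meant to make the definition concrete.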
  • Article type: Journal Article
    Chemical information has become increasingly ubiquitous and has outstripped the pace of analysis and interpretation. We have developed an R package, uafR, that automates a grueling retrieval process for gas chromatography coupled mass spectrometry (GC-MS) data and allows anyone interested in chemical comparisons to quickly perform advanced structural similarity matches. Our streamlined cheminformatics workflows allow anyone with basic experience in R to pull out component areas for tentative compound identifications using the best published understanding of molecules across samples (pubchem.gov). Interpretations can now be done at a fraction of the time, cost, and effort it would typically take using a standard chemical ecology data analysis pipeline. The package was tested in two experimental contexts: (1) a dataset of purified internal standards, which showed our algorithms correctly identified the known compounds with R² values ranging from 0.827 to 0.999 along concentrations ranging from 1 × 10⁻⁵ to 1 × 10³ ng/μl; (2) a large, previously published dataset, where the number and types of compounds identified were comparable (or identical) to those identified with the traditional manual peak annotation process, and NMDS analysis of the compounds produced the same pattern of significance as in the original study. Both the speed and accuracy of GC-MS data processing are drastically improved with uafR because it allows users to fluidly interact with their experiment following tentative library identifications [i.e. after the m/z spectra have been matched against an installed chemical fragmentation database (e.g. NIST)]. Use of uafR will allow larger datasets to be collected and systematically interpreted quickly. Furthermore, the functions of uafR could allow backlogs of previously collected and annotated data to be processed by new personnel or students as they are being trained. This is critical as we enter the era of exposomics, metabolomics, volatilomes, and landscape level, high-throughput chemotyping. This package was developed to advance collective understanding of chemical data and is applicable to any research that benefits from GC-MS analysis. It can be downloaded for free along with sample datasets from Github at github.org/castratton/uafR or installed directly from R or RStudio using the developer tools: 'devtools::install_github("castratton/uafR")'.
  • Article type: Journal Article
    Analyzing the evolutionary features and internal logic of the one-vote veto system in China over the past two decades is highly significant when considering reform and standardization. In order to conduct this analysis, the Nvivo 12 software was used to examine policy texts related to the one-vote veto issued by Fujian, Hubei, and Gansu provinces. Through a comparative analysis of keyword frequency statistics, policy text form, and content characteristics across the three provinces, it was discovered that governmental departments have experienced fundamental changes in their utilization of the one-vote veto system after 20 years of development. These changes are primarily seen in the refinement of the description of the one-vote veto in policy texts, the gradual reduction in the withdrawal mechanism of the one-vote veto, and an expanded application field for the one-vote veto.
  • Article type: Journal Article
    Artificial intelligence has revolutionized the field of protein structure prediction. However, with more powerful and complex software being developed, it is accessibility and ease of use rather than capability that is quickly becoming a limiting factor to end users. LazyAF is a Google Colaboratory-based pipeline which integrates the existing ColabFold BATCH software to streamline the process of medium-scale protein-protein interaction prediction. LazyAF was used to predict the interactome of the 76 proteins encoded on the broad-host-range multi-drug resistance plasmid RK2, demonstrating the ease and accessibility the pipeline provides.
  • Article type: Journal Article
    Variants in cis-regulatory elements link the noncoding genome to human pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), enhances noncoding variant analysis by integrating both whole-genome sequencing (WGS) and user-provided functional data. With simplified parameter settings and an efficient multiple testing correction method, CWAS-Plus conducts the CWAS workflow 50 times faster than CWAS, making it more accessible and user-friendly for researchers. Here, we used a single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder WGS data (n = 7280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease WGS data (n = 1087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale WGS data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
  • Article type: Journal Article
    Identifying viruses from metagenomes is a common step to explore the virus composition in the human gut. Here, we introduce VirRep, a hybrid language representation learning framework, for identifying viruses from human gut metagenomes. VirRep combines a context-aware encoder and an evolution-aware encoder to improve sequence representation by incorporating k-mer patterns and sequence homologies. Benchmarking on both simulated and real datasets with varying viral proportions demonstrates that VirRep outperforms state-of-the-art methods. When applied to fecal metagenomes from a colorectal cancer cohort, VirRep identifies 39 high-quality viral species associated with the disease, many of which cannot be detected by existing methods.
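
As a small illustration of what "k-mer patterns" means here, the sketch below counts overlapping k-mers in a DNA string and turns them into relative frequencies. It is only a hand-rolled feature extractor under assumed names (`kmer_counts`, `kmer_frequencies`); VirRep's encoders are learned models and are not reproduced here.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_frequencies(sequence, k):
    """Relative frequency of each observed k-mer (empty dict if the sequence is shorter than k)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

# Toy sequence chosen for illustration only.
counts = kmer_counts("ATGATGA", 3)
freqs = kmer_frequencies("ATGATGA", 3)
print(dict(counts))
```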
  • Article type: Journal Article
    Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979.
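
To make "repeat length" concrete, here is a naive sketch that finds the longest uninterrupted run of a given motif in an error-free string. The function name and toy sequence are assumptions for illustration; real genotypers such as LongTR must additionally cope with sequencing errors, alignment, and diploid calls, none of which is attempted here.

```python
def longest_tandem_run(sequence, motif):
    """Return (start_index, copies) of the longest exact tandem run of `motif`.

    Returns (-1, 0) when the motif does not occur at all.
    """
    if not motif:
        raise ValueError("motif must be a non-empty string")
    m = len(motif)
    best_start, best_copies = -1, 0
    # Try every start position so that runs in any reading frame are found.
    for start in range(len(sequence) - m + 1):
        copies = 0
        pos = start
        while sequence[pos:pos + m] == motif:
            copies += 1
            pos += m
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies

# Toy read with three copies of CAG starting at index 2.
run = longest_tandem_run("GGCAGCAGCAGTT", "CAG")
print(run)
```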
  • Article type: Journal Article
    BACKGROUND: Data is increasingly used for improvement and research in public health, especially administrative data such as that collected in electronic health records. Patients enter and exit these typically open-cohort datasets non-uniformly; this can make simple questions about incidence and prevalence time-consuming to answer and introduce unnecessary variation between analyses. We therefore developed methods to automate analysis of incidence and prevalence in open cohort datasets, to improve transparency, productivity and reproducibility of analyses.
    METHODS: We provide both a code-free set of rules for incidence and prevalence that can be applied to any open cohort, and a Python Command Line Interface implementation of these rules requiring Python 3.9 or later.
    The Command Line Interface is used to calculate incidence and point prevalence time series from open cohort data. The ruleset can be used in developing other implementations or can be rearranged to form other analytical questions such as period prevalence.
    AVAILABILITY: The command line interface is freely available from https://github.com/THINKINGGroup/analogy_publication.
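
The abstract does not spell out its ruleset, so the following is only a rough sketch of how point prevalence and an incidence rate might be computed on an open cohort, under simple assumed definitions: a patient counts on a date if registered on that date, and time at risk stops at diagnosis or exit. The `Patient` record, field names, and example dates are all hypothetical and do not come from the published tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Patient:
    entry_date: date                  # joined the cohort
    exit_date: date                   # left the cohort
    diagnosed: Optional[date] = None  # first diagnosis, if any

def point_prevalence(patients, on):
    """Share of patients registered on `on` who were diagnosed on or before it."""
    present = [p for p in patients if p.entry_date <= on <= p.exit_date]
    if not present:
        return None
    cases = sum(1 for p in present if p.diagnosed is not None and p.diagnosed <= on)
    return cases / len(present)

def incidence_rate(patients, start, end):
    """New diagnoses per person-year at risk within [start, end)."""
    events, days_at_risk = 0, 0
    for p in patients:
        risk_start = max(p.entry_date, start)
        risk_end = min(p.exit_date, end)
        if risk_end <= risk_start:
            continue  # not observed during the window
        if p.diagnosed is not None and p.diagnosed < risk_start:
            continue  # already a prevalent case, so not at risk
        if p.diagnosed is not None and p.diagnosed < risk_end:
            events += 1
            risk_end = p.diagnosed  # time at risk stops at diagnosis
        days_at_risk += (risk_end - risk_start).days
    return events / (days_at_risk / 365.25) if days_at_risk else None

cohort = [
    Patient(date(2020, 1, 1), date(2020, 12, 31), date(2020, 7, 1)),
    Patient(date(2020, 1, 1), date(2020, 12, 31)),
    Patient(date(2020, 6, 1), date(2021, 6, 1), date(2019, 3, 1)),
    Patient(date(2021, 1, 15), date(2021, 12, 31)),
]
prevalence = point_prevalence(cohort, date(2020, 9, 1))
rate = incidence_rate(cohort, date(2020, 1, 1), date(2021, 1, 1))
print(prevalence, rate)
```

Calling `point_prevalence` over a sequence of dates would yield the kind of time series the abstract mentions.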
  • Article type: Journal Article
    The study of animal sounds in biology and ecology relies heavily upon time-frequency (TF) visualisation, most commonly using the short-time Fourier transform (STFT) spectrogram. This method, however, has inherent bias towards either temporal or spectral details that can lead to misinterpretation of complex animal sounds. An ideal TF visualisation should accurately convey the structure of the sound in terms of both frequency and time; however, the STFT often cannot meet this requirement. We evaluate the accuracy of four TF visualisation methods (superlet transform [SLT], continuous wavelet transform [CWT] and two STFTs) using a synthetic test signal. We then apply these methods to visualise sounds of the Chagos blue whale, Asian elephant, southern cassowary, eastern whipbird, mulloway fish and the American crocodile. We show that the SLT visualises the test signal with 18.48%-28.08% less error than the other methods. A comparison between our visualisations of animal sounds and their literature descriptions indicates that the STFT's bias may have caused misinterpretations in describing pygmy blue whale songs and elephant rumbles. We suggest that use of the SLT to visualise low-frequency animal sounds may prevent such misinterpretations. Finally, we employ the SLT to develop 'BASSA', an open-source, GUI software application that offers a no-code, user-friendly tool for analysing short-duration recordings of low-frequency animal sounds for the Windows platform. The SLT visualises low-frequency animal sounds with improved accuracy, in a user-friendly format, minimising the risk of misinterpretation while requiring less technical expertise than the STFT. Using this method could propel advances in acoustics-driven studies of animal communication, vocal production methods, phonation and species identification.
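
The STFT bias described above comes from its fixed window length. As a back-of-the-envelope sketch, the window duration and the spacing between frequency bins are reciprocals of each other, so sharpening one blurs the other. The sample rate and window sizes below are made-up values, and bin spacing is used as a rough stand-in for frequency resolution (the true figure also depends on the window shape).

```python
def stft_resolution(sample_rate_hz, window_samples):
    """Return (window duration in seconds, frequency bin spacing in Hz)."""
    time_res = window_samples / sample_rate_hz
    freq_res = sample_rate_hz / window_samples
    return time_res, freq_res

# Hypothetical low-frequency recording sampled at 1000 Hz.
short_window = stft_resolution(1000, 128)   # fine in time, coarse in frequency
long_window = stft_resolution(1000, 1024)   # coarse in time, fine in frequency

for name, (t, f) in (("short", short_window), ("long", long_window)):
    print(f"{name} window: {t:.3f} s per frame, {f:.3f} Hz per bin")
```

With these assumed numbers, neither window resolves both a brief onset and a closely spaced pair of low-frequency tones, which is the kind of compromise that multi-resolution methods such as the SLT aim to relax.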
  • Article type: Journal Article
    Computational biological models have proven to be an invaluable tool for understanding and predicting the behaviour of many biological systems. While it may not be too challenging for experienced researchers to construct such models from scratch, it is not a straightforward task for early stage researchers. Design patterns are well-known techniques widely applied in software engineering as they provide a set of typical solutions to common problems in software design. In this paper, we collect and discuss common patterns that are usually used during the construction and execution of computational biological models. We adopt Petri nets as a modelling language to provide a visual illustration of each pattern; however, the ideas presented in this paper can also be implemented using other modelling formalisms. We provide two case studies for illustration purposes and show how these models can be built up from the presented smaller modules. We hope that the ideas discussed in this paper will help many researchers in building their own future models.
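
To show what composing a model from small Petri net modules can look like, here is a minimal sketch of a place/transition net with token firing. The `PetriNet` class and the enzyme-substrate example are illustrative assumptions, not one of the paper's patterns or case studies.

```python
class PetriNet:
    """Minimal place/transition net: places hold tokens, transitions move them."""

    def __init__(self, marking):
        self.marking = dict(marking)   # place name -> token count
        self.transitions = {}          # transition name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        """Register a transition with weighted input and output arcs."""
        self.transitions[name] = (dict(inputs), dict(outputs))

    def enabled(self, name):
        """A transition is enabled when every input place holds enough tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= n for place, n in inputs.items())

    def fire(self, name):
        """Consume input tokens and produce output tokens."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place, n in inputs.items():
            self.marking[place] = self.marking.get(place, 0) - n
        for place, n in outputs.items():
            self.marking[place] = self.marking.get(place, 0) + n

# Hypothetical module: enzyme E binds substrate S, then releases product P.
net = PetriNet({"E": 1, "S": 2, "ES": 0, "P": 0})
net.add_transition("bind", {"E": 1, "S": 1}, {"ES": 1})
net.add_transition("catalyse", {"ES": 1}, {"E": 1, "P": 1})

for step in ("bind", "catalyse", "bind", "catalyse"):
    net.fire(step)
print(net.marking)
```

Larger models can be assembled by sharing places between several such modules, for example by letting the product place of one reaction serve as the substrate place of the next.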