parser

  • Article type: Journal Article
    There are two main approaches to how statistical patterns are extracted from sequences: the transitional probability approach proposes that statistical learning occurs through the computation of probabilities between items in a sequence, while the chunking approach, including models such as PARSER and TRACX, proposes that units are extracted as chunks. Importantly, the chunking approach suggests that the extraction of full units weakens the processing of subunits, while the transitional probability approach suggests that both units and subunits should strengthen. Previous findings using sequentially organized auditory stimuli or spatially organized visual stimuli support the chunking approach. However, one limitation of prior studies is that most assessed learning with the two-alternative forced-choice task. In contrast, this pre-registered experiment examined the two theoretical approaches in sequentially organized visual stimuli using an online self-paced task (arguably providing a more sensitive index of learning as it occurs) and a secondary offline familiarity judgment task. During the self-paced task, abstract shapes were covertly organized into eight triplets (ABC), where one in every eight was altered (BCA) from the canonical structure in a way that disrupted the full unit while preserving a subunit (BC). Results from the offline familiarity judgment task revealed that the altered triplets were perceived as highly familiar, suggesting the learned representations were relatively flexible. More importantly, results from the online self-paced task demonstrated that processing for subunits, but not unit-initial stimuli, was impeded in the altered triplet. The pattern of results is in line with the chunking approach to statistical learning and, more specifically, the TRACX model.
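The transitional-probability computation described above can be sketched in a few lines; the stream, item labels, and function below are illustrative assumptions, not the experiment's actual materials:

```python
from collections import Counter

def transitional_probabilities(stream):
    """P(next item | current item) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical stream built from two triplets, "ABC" and "DEF".
stream = list("ABCDEFABCABCDEF")
tps = transitional_probabilities(stream)
# Within-triplet transitions (A->B, B->C) reach 1.0; transitions out of a
# triplet-final item (C->D vs. C->A) split their probability mass and are lower.
```

Under the transitional-probability account, learners would track exactly these conditional statistics; under the chunking account (PARSER, TRACX), whole triplets would instead be stored as units.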

  • Article type: Journal Article
    The accurate "base pairing" in RNA molecules, which leads to the prediction of RNA secondary structures, is crucial in order to explain unknown biological operations. Recently, COVID-19, a widespread disease, has caused many deaths, affecting humanity in an unprecedented way. SARS-CoV-2, a single-stranded RNA virus, has shown the significance of analyzing these molecules and their structures. This paper aims to create a pioneering framework in the direction of predicting specific RNA structures, leveraging syntactic pattern recognition. The proposed framework, Knotify+, addresses the problem of predicting H-type pseudoknots, including bulges and internal loops, by featuring the power of context-free grammars (CFGs). We combine the grammar's advantages with maximum base pairing and minimum free energy to tackle this ambiguous task in a performant way. Specifically, our proposed methodology, Knotify+, outperforms state-of-the-art frameworks with regard to its accuracy in core stem prediction. Additionally, it performs more accurately on small sequences and presents a comparable accuracy rate on larger ones, while requiring a shorter execution time than well-known platforms. The Knotify+ source code and implementation details are available as a public repository on GitHub.
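The abstract does not reproduce the Knotify+ grammar, but the "maximum base pairing" ingredient it combines with the CFG can be illustrated with a minimal Nussinov-style dynamic program. This is a sketch only, not the Knotify+ implementation: it ignores pseudoknots and free energy, and the `min_loop` value is an assumption:

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested (pseudoknot-free) base pairs."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # widen the subsequence gradually
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                  # case 1: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in allowed:  # case 2: i pairs with k
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp[0][n - 1]

# A GGG...CCC hairpin with a 4-nucleotide loop supports three nested pairs.
```

This recurrence corresponds to a simple context-free grammar over pairings, which is why CFG-based approaches such as Knotify+ are a natural fit for (non-crossing parts of) secondary structure.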

  • Article type: Journal Article
    BACKGROUND: Biomacromolecular structural data have outgrown the legacy Protein Data Bank (PDB) format that the scientific community relied on for decades, yet the use of its successor, the PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF), is still not widespread. Perhaps among the reasons are the availability of easy-to-use tools that only support the legacy format, but also the inherent difficulty of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations, such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible.
    RESULTS: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository (http://pypi.org/project/pdbecif) and from GitHub (https://github.com/pdbeurope/pdbecif), along with rich documentation and many ready-to-use examples.
    CONCLUSIONS: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with third-party libraries as well as adopted for broad scientific analyses.
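As a toy illustration of the category/item structure that mmCIF parsers must handle, the sketch below reads flat key-value data items into nested dictionaries. It is a deliberate simplification, not the PDBeCIF API: the real package also handles loop_ categories, multi-line values, and the format's many edge cases:

```python
def parse_cif_items(text):
    """Toy parser for flat mmCIF data items of the form '_category.item value'."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_"):
            key, _, value = line.partition(" ")
            category, _, item = key.lstrip("_").partition(".")
            # Strip surrounding whitespace, then any quoting around the value.
            data.setdefault(category, {})[item] = value.strip().strip("'\"")
    return data

cif = """\
data_1ABC
_entry.id        1ABC
_struct.title    'Example structure'
"""
parsed = parse_cif_items(cif)
```

Even this toy version shows why naive line splitting is fragile (quoted values, whitespace), hinting at the edge cases the abstract mentions.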

  • Article type: Journal Article
    A search for the terms "acceptability judgment tasks" and "language" and "grammaticality judgment tasks" and "language" produces results which report findings that are based on the exact same elicitation technique. Although certain scholars have argued that acceptability and grammaticality are two separable notions that refer to different concepts, there are contexts in which the two terms are used interchangeably. The present work reaffirms that these two notions and their scales do not coincide: there are sentences that are acceptable, even though they are ungrammatical, and sentences that are unacceptable, despite being grammatical. First, we adduce a number of examples for both cases, including grammatical illusions, violations of Identity Avoidance, and sentences that involve a level of processing complexity that overloads the cognitive parser and tricks it into (un)acceptability. We then discuss whether the acceptability of grammatically ill-formed sentences entails that we assign a meaning to them. Last, it is shown that there are n ways of unacceptability, and two ways of ungrammaticality, in the absolute and the relative sense. Since the use of the terms "acceptable" and "grammatical" is often found in experiments that constitute the core of the evidential base of linguistics, disentangling their various uses is likely to help the field reach a better level of terminological clarity.

  • Article type: Journal Article
    As a consequence of the epidemiological transition towards non-communicable diseases, integrated care approaches are required, not solely focused on medical purposes, but also on a range of essential activities for the maintenance of the individuals' quality of life. In order to allow the exchange of information, these integrated approaches might be supported by digital platforms, which need to provide trustworthy environments and to guarantee the integrity of the information exchanged. Therefore, together with mechanisms such as authentication, logging, or auditing, the definition of access control policies assumes paramount importance. This article focuses on the development of a parser as a component of a platform to support the care of community-dwelling older adults, the SOCIAL platform, to allow the definition of access control policies and rules using natural languages.
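The abstract does not specify the SOCIAL platform's rule grammar, so the sketch below is a hypothetical illustration of the general idea only: mapping a constrained natural-language sentence onto an access-control tuple. The vocabulary, pattern, and function names are invented for this example:

```python
import re

# Hypothetical constrained-English rule pattern: subject, permission, action, resource.
RULE = re.compile(
    r"(?P<subject>\w+) (?P<effect>can|cannot) (?P<action>read|write) (?P<resource>[\w ]+)"
)

def parse_rule(sentence):
    """Turn one constrained natural-language sentence into an access-control tuple."""
    match = RULE.match(sentence.strip().rstrip("."))
    if not match:
        raise ValueError(f"unrecognized rule: {sentence!r}")
    g = match.groupdict()
    return (g["subject"], g["action"], g["resource"], g["effect"] == "can")

rule = parse_rule("caregivers can read medication records.")
```

A production system would map such tuples onto a policy model (e.g. role-based access control) rather than evaluating them directly.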

  • Article type: Journal Article
    Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of the EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive.
    Our objective was to automatically extract from EHRs the description of behaviors noted by the clinicians in evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time as well as enable integration with other relevant data.
    We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves are encompassed in the EHRs as very diverse expressions of the diagnostic criteria written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms.
    We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (i.e., sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers represent a lower bound on their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among the behaviors and interests criteria (A3 criteria), one (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs.
    Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
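The study's 104 patterns and 92 lexicons are not listed in the abstract; the sketch below only shows the general shape of such rule-based criterion extraction, with an invented mini-lexicon and a single qualifier pattern:

```python
import re

# Invented mini-lexicon standing in for the study's 92 lexicons (1787 terms).
LEXICON = ["eye contact", "joint attention", "social reciprocity"]
PATTERN = re.compile(
    r"\b(poor|limited|no|lack of)\s+(" + "|".join(map(re.escape, LEXICON)) + r")\b",
    re.IGNORECASE,
)

def extract_criteria(sentence):
    """Return (qualifier, behavior) mentions suggestive of a DSM A1-style criterion."""
    return [(m.group(1).lower(), m.group(2).lower()) for m in PATTERN.finditer(sentence)]

hits = extract_criteria("Patient shows poor eye contact and limited joint attention today.")
# hits -> [('poor', 'eye contact'), ('limited', 'joint attention')]
```

Because each rule fires only on explicit lexical evidence, this style of extraction naturally trades recall for the high precision the evaluation reports.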

  • Article type: Journal Article
    Despite differences in their function and domain-specific elements, syntactic processing in music and language is believed to share cognitive resources. This study aims to investigate whether the simultaneous processing of language and music shares the use of a common syntactic processor or more general attentional resources. To investigate this matter we tested musicians and non-musicians using visually presented sentences and aurally presented melodies containing local and long-distance syntactic dependencies. Accuracy rates and reaction times of participants' responses were collected. In both sentences and melodies, unexpected syntactic anomalies were introduced. This is the first study to address the processing of local and long-distance dependencies in language and music combined while reducing the effect of sensory memory. Participants were instructed to focus on language (language session), music (music session), or both (dual session). In the language session, musicians and non-musicians performed comparably in terms of accuracy rates and reaction times. As expected, group differences appeared in the music session, with musicians being more accurate in their responses than non-musicians, and only the latter showing an interaction between the accuracy rates for music and language syntax. In the dual session, musicians were overall more accurate than non-musicians. However, both groups showed comparable behavior, displaying an interaction between the accuracy rates for language and music syntax responses. In our study, accuracy rates seem to better capture the interaction between language and music syntax, and this interaction seems to indicate the use of distinct, yet interacting, mechanisms as part of a decision-making strategy. This interaction appears to be subject to increases in attentional load and domain proficiency. Our study contributes to the long-lasting debate about the commonalities between language and music by providing evidence for their interaction at a more domain-general level.

  • Article type: Journal Article
    BACKGROUND: Scientific names in biology act as universal links. They allow us to cross-reference information about organisms globally. However, variations in the spelling of scientific names greatly diminish their ability to interconnect data. Such variations may include abbreviations, annotations, misspellings, etc. Authorship is a part of a scientific name and may also differ significantly. To match all possible variations of a name we need to divide them into their elements and classify each element according to its role. We refer to this as "parsing" the name. Parsing categorizes a name's elements into those that are stable and those that are prone to change. Names are matched first by combining them according to their stable elements. Matches are then refined by examining their varying elements. This two-stage process dramatically improves the number and quality of matches. It is especially useful for automatic data exchange within the context of "Big Data" in biology.
    RESULTS: We introduce the Global Names Parser (gnparser). It is a tool for the Java Virtual Machine, written in Scala, that parses scientific names. It is based on a Parsing Expression Grammar. The parser can be applied to scientific names of any complexity. It assigns a semantic meaning (such as genus name, species epithet, rank, year of publication, authorship, annotations, etc.) to all elements of a name. It is able to work with nested structures, as in the names of hybrids. gnparser performs with ≈99% accuracy and processes 30 million name-strings/hour per CPU thread. The gnparser library is compatible with Scala, Java, R, Jython, and JRuby. The parser can be used as a command-line application, a socket server, a web app, or a RESTful HTTP service. It is released under an open-source MIT license.
    CONCLUSIONS: Global Names Parser (gnparser) is a fast, high-precision tool for biodiversity informaticians and biologists working with large numbers of scientific names. It can replace expensive and error-prone manual parsing and standardization of scientific names in many situations, and can quickly enhance the interoperability of distributed biological information.
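gnparser itself is built on a full Parsing Expression Grammar; as a drastically simplified sketch of the stable/varying split described in the background, a regular expression can separate the canonical form (genus + epithet, stable) from the authorship (varying). The pattern below is an illustrative assumption and handles only the simplest binomials:

```python
import re

# Simplest-case binomial: 'Genus epithet [Authorship]'; no hybrids, ranks, or annotations.
NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<epithet>[a-z]+)(?:\s+(?P<authorship>\(?[A-Z].*))?$"
)

def parse_name(name_string):
    """Split a name-string into stable (canonical) and varying (authorship) elements."""
    match = NAME.match(name_string.strip())
    if not match:
        return None
    parts = match.groupdict()
    # First-stage matching uses the canonical form; authorship refines matches later.
    parts["canonical"] = f"{parts['genus']} {parts['epithet']}"
    return parts

rec = parse_name("Homo sapiens Linnaeus, 1758")
# rec["canonical"] -> "Homo sapiens"; rec["authorship"] -> "Linnaeus, 1758"
```

Two name-strings with different authorship spellings would still agree on `canonical`, which is exactly why the two-stage match described above recovers many more links than exact string comparison.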

  • Article type: Journal Article
    The Biological Observation Matrix (BIOM) format is widely used to store data from high-throughput studies. It aims at increasing the interoperability of bioinformatic tools that process these data. However, due to multiple versions and implementation details, working with this format can be tricky. Currently, libraries in Python, R, and Perl are available, whilst such libraries for JavaScript are lacking. Here, we present a BioJS component for parsing BIOM data in all format versions. It supports import, modification, and export via a unified interface. This module aims to facilitate the development of web applications that use BIOM data. Finally, we demonstrate its usefulness with two applications that already use this component. Availability: https://github.com/molbiodiv/biojs-io-biom, https://dx.doi.org/10.5281/zenodo.61698.
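As an illustration of what parsing BIOM data involves, the sketch below expands a BIOM 1.0 (JSON) sparse table into a dense matrix. It is written in Python rather than JavaScript and is not the biojs-io-biom API; the field names follow the BIOM 1.0 specification:

```python
import json

def biom_v1_to_dense(biom_json):
    """Expand a BIOM 1.0 (JSON) table into a row-major dense matrix."""
    table = json.loads(biom_json)
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "sparse":
        dense = [[0] * n_cols for _ in range(n_rows)]
        for row, col, value in table["data"]:  # sparse entries: [row, col, value]
            dense[row][col] = value
        return dense
    return table["data"]  # dense tables already store the full matrix

doc = json.dumps({
    "id": "example", "format": "Biological Observation Matrix 1.0",
    "matrix_type": "sparse", "shape": [2, 3],
    "data": [[0, 1, 5], [1, 2, 2]],
})
dense = biom_v1_to_dense(doc)
# dense -> [[0, 5, 0], [0, 0, 2]]
```

Handling both `matrix_type` values behind one function mirrors the unified import/modify/export interface the component advertises.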
