parsing

  • Article type: Journal Article
    A core aspect of human speech comprehension is the ability to incrementally integrate consecutive words into a structured and coherent interpretation, aligning with the speaker's intended meaning. This rapid process is subject to multidimensional probabilistic constraints, including both linguistic knowledge and non-linguistic information within specific contexts, and it is their interpretative coherence that drives successful comprehension. To study the neural substrates of this process, we extracted word-by-word measures of sentential structure from BERT, a deep language model, which effectively approximates the coherent outcomes of the dynamic interplay among various types of constraints. Using representational similarity analysis, we tested BERT parse depths and relevant corpus-based measures against the spatiotemporally resolved brain activity recorded by electro-/magnetoencephalography when participants were listening to the same sentences. Our results provide a detailed picture of the neurobiological processes involved in the incremental construction of structured interpretations. These findings show when and where coherent interpretations emerge through the evaluation and integration of multifaceted constraints in the brain, which engages bilateral brain regions extending beyond the classical fronto-temporal language system. Furthermore, this study provides empirical evidence supporting the use of artificial neural networks as computational models for revealing the neural dynamics underpinning complex cognitive processes in the brain.
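
The representational similarity analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy `model_patterns` and `brain_patterns` values are invented, and using 1 minus Pearson correlation as the dissimilarity and Spearman correlation to compare the two dissimilarity matrices are common choices assumed here, not details taken from the abstract.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rdm(patterns):
    """Upper triangle of a representational dissimilarity matrix.

    Each entry is 1 - Pearson correlation between the patterns of two items.
    """
    return [1 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = avg
        pos = end + 1
    return result


def rsa_score(model_patterns, brain_patterns):
    """Spearman correlation between the model RDM and the brain RDM."""
    return pearson(ranks(rdm(model_patterns)), ranks(rdm(brain_patterns)))


# Hypothetical data: one row per sentence item.
model_patterns = [[1.0, 2.0, 3.0], [1.0, 2.5, 2.0], [3.0, 1.0, 0.5], [0.2, 0.1, 0.9]]
brain_patterns = [[0.9, 2.1, 3.2, 1.0], [1.1, 2.4, 2.1, 0.8],
                  [2.8, 1.2, 0.4, 2.0], [0.3, 0.2, 1.0, 0.1]]

score = rsa_score(model_patterns, brain_patterns)
self_score = rsa_score(model_patterns, model_patterns)
print(f"model vs. brain: {score:.3f}; model vs. itself: {self_score:.3f}")
```

In a time-resolved analysis, a score like this would typically be computed separately for each time window and brain region, which is how a "when and where" picture can be built up.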

  • Article type: Journal Article
    A central topic in psycholinguistics is the study of how and when the parser assigns an antecedent to referentially-dependent elements. One such referentially-dependent element is the null subject of non-finite clauses. The aim of the present study was to examine the role of verb control information in the assignment of an antecedent to such a null subject. The results so far are inconclusive. Some authors argue that verb control information has a late influence, whereas others argue that such verb-specific information has a very rapid influence. We report a self-paced reading study in Spanish in which verb type (subject vs. object control) and grammaticality (grammatical vs. ungrammatical) were manipulated. The grammaticality manipulation was carried out by introducing a person anomaly at the infinitive itself, and not at a later word (e.g., "Te prometí/aconsejé adelgazarme/adelgazarte cinco quilos en un mes." Literal translation: "I to you promised/advised to lose myself/yourself five kilos in a month"). With such a manipulation we can examine whether at the first possible point (i.e., the infinitive) verb control information was used to assign the correct antecedent (i.e., the subject in sentences with a subject-control verb, and the object in sentences with an object-control verb) to PRO. The results showed that at the infinitive there was a main effect of grammaticality, meaning that the correct antecedent had already been assigned to PRO. The present findings are consistent with models that assume that verb-specific information plays an important role in the initial stages of sentence processing.

  • Article type: Journal Article
    Music can be interpreted by attributing syntactic relationships to sequential musical events, and, computationally, such musical interpretation represents an analogous combinatorial task to syntactic processing in language. While this perspective has been primarily addressed in the domain of harmony, we focus here on rhythm in the Western tonal idiom, and we propose for the first time a framework for modeling the moment-by-moment execution of processing operations involved in the interpretation of music. Our approach is based on (1) a music-theoretically motivated grammar formalizing the competence of rhythmic interpretation in terms of three basic types of dependency (preparation, syncopation, and split; Rohrmeier, 2020), and (2) psychologically plausible predictions about the complexity of structural integration and memory storage operations, necessary for parsing hierarchical dependencies, derived from the dependency locality theory (Gibson, 2000). With a behavioral experiment, we exemplify an empirical implementation of the proposed theoretical framework. One hundred listeners were asked to reproduce the location of a visual flash presented while listening to three rhythmic excerpts, each exemplifying a different interpretation under the formal grammar. The hypothesized execution of syntactic-processing operations was found to be a significant predictor of the observed displacement between the reported and the objective location of the flashes. Overall, this study presents a theoretical approach and a first empirical proof-of-concept for modeling the cognitive process resulting in such interpretation as a form of syntactic parsing with algorithmic similarities to its linguistic counterpart. 
    Results from the present small-scale experiment should not be read as a final test of the theory, but they are consistent with the theoretical predictions after controlling for several possible confounding factors and may form the basis for further large-scale and ecological testing.

  • Article type: Journal Article
    The language comprehension system preferentially assumes that agents come first during incremental processing. While this might reflect a biologically fixed bias, shared with other domains and other species, the evidence is limited to languages that place agents first, and so the bias could also be learned from usage frequency. Here, we probe the bias with electroencephalography (EEG) in Äiwoo, a language that by default places patients first, but where sentence-initial nouns are still locally ambiguous between patient or agent roles. Comprehenders transiently interpreted nonhuman nouns as patients, eliciting a negativity when disambiguation was toward the less common agent-initial order. By contrast and against frequencies, human nouns were transiently interpreted as agents, eliciting an N400-like negativity when the disambiguation was toward patient-initial order. Consistent with the notion of a fixed property, the agent bias is robust against usage frequency for human referents. However, this bias can be reversed by frequency experience for nonhuman referents.

  • Article type: Journal Article
    To model behavioral and neural correlates of language comprehension in naturalistic environments, researchers have turned to broad-coverage tools from natural-language processing and machine learning. Where syntactic structure is explicitly modeled, prior work has relied predominantly on context-free grammars (CFGs), yet such formalisms are not sufficiently expressive for human languages. Combinatory categorial grammars (CCGs) are sufficiently expressive directly compositional models of grammar with flexible constituency that affords incremental interpretation. In this work, we evaluate whether a more expressive CCG provides a better model than a CFG for human neural signals collected with functional magnetic resonance imaging (fMRI) while participants listen to an audiobook story. We further test between variants of CCG that differ in how they handle optional adjuncts. These evaluations are carried out against a baseline that includes estimates of next-word predictability from a transformer neural network language model. Such a comparison reveals unique contributions of CCG structure-building predominantly in the left posterior temporal lobe: CCG-derived measures offer a superior fit to neural signals compared to those derived from a CFG. These effects are spatially distinct from bilateral superior temporal effects that are unique to predictability. Neural effects for structure-building are thus separable from predictability during naturalistic listening, and those effects are best characterized by a grammar whose expressive power is motivated on independent linguistic grounds.

  • Article type: Journal Article
    In online language comprehension, the parser incrementally builds hierarchical syntactic structures. The predictive nature of this structure-building process has been the subject of extensive debate. A previous study observed that when a wh-phrase indicates parallelism between the upcoming wh-clause and a preceding clause (e.g., John told some stories, but we couldn't remember which stories…), the parser predictively constructs the wh-clause. This observation demonstrates predictive structure building. However, the study also suggests that the parser does not make a prediction when the wh-phrase indicates that parallelism does not hold (e.g., John told some stories … with which stories…), a potential limit to the prediction of syntactic structures. Crucially, these findings are controversial because the study did not observe processing difficulty when disambiguating input indicated that the predicted continuation was inconsistent with the globally grammatical structure (garden-path effects). The controversial results may be due to a lack of statistical power. Therefore, the present study conducted a large-scale replication study (324 participants and 24 sets of materials). The results revealed that the parser predicts the clausal structure, irrespective of the type of wh-phrase. There was also evidence of garden-path effects, supporting the finding that the parser makes a prediction. These observations suggest that the prediction algorithm inherent in the human parser is more powerful than assumed by the previous study and that the parser attempts to construct globally grammatical structures during revision.

  • Article type: Journal Article
    Proper identification of collocations (and more generally of multiword expressions, MWEs) is an important qualitative step for several NLP applications, and particularly so for translation. Since many MWEs cannot be translated literally, failure to identify them yields at best inaccurate translation. This paper is mostly concerned with collocations. We will show how they differ from other types of MWEs and how they can be successfully parsed and translated by means of a grammar-based parser and translator.

  • Article type: Journal Article
    PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is available in both PostgreSQL (DOI 10.5281/zenodo.4008109) and Google BigQuery (https://console.cloud.google.com/bigquery?project=pmdb-bq&d=pmdb).
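
To make the idea of converting PubMed XML into relational tables concrete, here is a minimal Python sketch. It does not reproduce pmparser (an R package) or the PMDB schema: the two-table layout, the `load_pubmed_xml` helper, and the sample records are all assumptions made for illustration. Only the element names (`PubmedArticle`, `MedlineCitation`, `PMID`, `ArticleTitle`, `AuthorList`) follow the PubMed XML format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample records that mimic the nesting of PubMed XML.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>First toy title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Second toy title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Poe</LastName><ForeName>Pat</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def load_pubmed_xml(xml_text, conn):
    """Flatten nested citation records into two hypothetical tables."""
    conn.execute("CREATE TABLE article (pmid INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE author (pmid INTEGER, position INTEGER, "
        "last_name TEXT, fore_name TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    for citation in root.iter("MedlineCitation"):
        pmid = int(citation.findtext("PMID"))
        title = citation.findtext("Article/ArticleTitle")
        conn.execute("INSERT INTO article VALUES (?, ?)", (pmid, title))
        authors = citation.iterfind("Article/AuthorList/Author")
        for position, author in enumerate(authors, start=1):
            conn.execute(
                "INSERT INTO author VALUES (?, ?, ?, ?)",
                (pmid, position, author.findtext("LastName"),
                 author.findtext("ForeName")),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_pubmed_xml(SAMPLE_XML, conn)

# A query that is awkward on nested XML but simple on tables:
# the number of authors per article.
author_counts = conn.execute(
    "SELECT a.pmid, COUNT(*) FROM article a "
    "JOIN author au ON au.pmid = a.pmid "
    "GROUP BY a.pmid ORDER BY a.pmid"
).fetchall()
print(author_counts)
```

Once the records are flattened, joins and aggregations of this kind become ordinary SQL, which is the convenience for complex queries that the abstract describes.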

  • Article type: Journal Article
    BACKGROUND: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been only little study of parsing texts from specialized domains such as biomedicine.
    METHODS: We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. The CRAFT-SA task broadly follows the UD representation and recent UD task conventions, allowing us to fine-tune the UD-compatible Turku Neural Parser and UDify neural parsers to the task. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several models pre-trained specifically for biomedical text processing.
    RESULTS: We find that recently introduced neural parsing technology is capable of generating highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initialization using a deep transfer learning model pre-trained on in-domain texts is key to maximizing the performance of the parsing methods.

  • Article type: Journal Article
    For a genome-wide association study in humans, genotype imputation is an essential analysis tool for improving association mapping power. When IMPUTE software is used for imputation analysis, an imputation output (GEN format) should be converted to variant call format (VCF) with imputed genotype dosage for association analysis. However, the conversion requires multiple software packages in a pipeline with a large amount of processing time.
    We developed GEN2VCF, a fast and convenient GEN format to VCF conversion tool with dosage support.
    The performance of GEN2VCF was compared to BCFtools, QCTOOL, and Oncofunco. The test data set was a 1 Mb GEN-formatted file of 5000 samples. To determine the performance of various sample sizes, tests were performed from 1000 to 5000 samples with a step size of 1000. Runtime and memory usage were used as performance measures.
    GEN2VCF showed drastically improved performance with respect to runtime and memory usage. The runtime and memory usage of GEN2VCF were at least 1.4- and 7.4-fold lower, respectively, than those of the other methods.
    GEN2VCF provides users with efficient conversion from GEN format to VCF with the best-guessed genotype, genotype posterior probabilities, and genotype dosage, as well as great flexibility in implementation with other software packages in a pipeline.
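
The kind of conversion described in this abstract can be sketched in a few lines of Python. This is not GEN2VCF's implementation. It assumes the five-column GEN layout (SNP ID, rsID, position, allele A, allele B) followed by three genotype probabilities per sample, treats allele B as the VCF ALT allele, takes the best-guess genotype as the most probable one without any calling threshold, and receives the chromosome as a parameter. The function name `gen_line_to_vcf` and the sample line are hypothetical.

```python
def gen_line_to_vcf(line, chrom):
    """Convert one GEN record into one VCF data line with GT:GP:DS fields."""
    fields = line.split()
    snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = [float(p) for p in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three genotype probabilities per sample")

    samples = []
    for i in range(0, len(probs), 3):
        trio = probs[i:i + 3]            # P(AA), P(AB), P(BB)
        p_aa, p_ab, p_bb = trio
        dosage = p_ab + 2 * p_bb         # expected count of allele B
        # Assumption: best guess is the most probable genotype, no threshold.
        genotype = ("0/0", "0/1", "1/1")[trio.index(max(trio))]
        samples.append(f"{genotype}:{p_aa:g},{p_ab:g},{p_bb:g}:{dosage:g}")

    fixed = [chrom, pos, rsid, allele_a, allele_b, ".", ".", ".", "GT:GP:DS"]
    return "\t".join(fixed + samples)


# Hypothetical record with two samples.
gen_record = "snp1 rs123 1000 A G 0.9 0.1 0 0 0.25 0.75"
vcf_line = gen_line_to_vcf(gen_record, chrom="22")
print(vcf_line)
```

The dosage is the expected number of copies of the ALT allele, so it ranges from 0 to 2. Association tests can use it directly, which avoids discarding imputation uncertainty by working only with hard genotype calls.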