parsing

  • Article Type: Journal Article
    A core aspect of human speech comprehension is the ability to incrementally integrate consecutive words into a structured and coherent interpretation, aligning with the speaker's intended meaning. This rapid process is subject to multidimensional probabilistic constraints, including both linguistic knowledge and non-linguistic information within specific contexts, and it is their interpretative coherence that drives successful comprehension. To study the neural substrates of this process, we extracted word-by-word measures of sentential structure from BERT, a deep language model that effectively approximates the coherent outcomes of the dynamic interplay among various types of constraints. Using representational similarity analysis, we tested BERT parse depths and relevant corpus-based measures against the spatiotemporally resolved brain activity recorded by electro-/magnetoencephalography while participants listened to the same sentences. Our results provide a detailed picture of the neurobiological processes involved in the incremental construction of structured interpretations. These findings show when and where coherent interpretations emerge through the evaluation and integration of multifaceted constraints in the brain, a process that engages bilateral brain regions extending beyond the classical fronto-temporal language system. Furthermore, this study provides empirical evidence supporting the use of artificial neural networks as computational models for revealing the neural dynamics underpinning complex cognitive processes in the brain.
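
    In sketch form, representational similarity analysis of this kind reduces to three steps: build a dissimilarity matrix over items from the model-derived measures (e.g., BERT parse depths), build another from the brain responses at a given time window, and rank-correlate the two. The minimal Python sketch below uses placeholder random data and generic choices (correlation distance, Spearman correlation); it illustrates the technique, not the authors' actual pipeline.

```python
# Minimal RSA sketch: correlate a model RDM built from word-by-word parse
# depths with a brain RDM built from sensor patterns. All data here are
# random placeholders; this illustrates the method, not the authors' pipeline.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_sentences, n_words, n_sensors = 50, 10, 64

parse_depths = rng.random((n_sentences, n_words))      # e.g., BERT parse depth per word
brain_patterns = rng.random((n_sentences, n_sensors))  # e.g., EEG/MEG pattern, one time window

# Representational dissimilarity matrices (condensed form: all pairwise distances).
model_rdm = pdist(parse_depths, metric="correlation")
brain_rdm = pdist(brain_patterns, metric="correlation")

# Second-order (rank) correlation between model and brain representational geometries.
rho, p = spearmanr(model_rdm, brain_rdm)
print(f"model-brain RSA: rho = {rho:.3f}, p = {p:.3g}")
```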

  • Article Type: Journal Article
    A central topic in psycholinguistics is the study of how and when the parser assigns an antecedent to referentially dependent elements. One such referentially dependent element is the null subject of non-finite clauses. The aim of the present study was to examine the role of verb control information in the assignment of an antecedent to such a null subject. The results so far are inconclusive: some authors argue that verb control information has a late influence, whereas others argue that such verb-specific information has a very rapid influence. We report a self-paced reading study in Spanish in which verb type (subject vs. object control) and grammaticality (grammatical vs. ungrammatical) were manipulated. The grammaticality manipulation was carried out by introducing a person anomaly at the infinitive itself, rather than at a later word (e.g., "Te prometí/aconsejé adelgazarme/adelgazarte cinco quilos en un mes"; literal translation, "I to you promised/advised to lose myself/yourself five kilos in a month"). With such a manipulation we can examine whether, at the first possible point (i.e., the infinitive), verb control information was used to assign the correct antecedent to PRO (i.e., the subject in sentences with a subject-control verb, and the object in sentences with an object-control verb). The results showed a main effect of grammaticality at the infinitive, meaning that the correct antecedent had already been assigned to PRO. The present findings are consistent with models that assume that verb-specific information plays an important role in the initial stages of sentence processing.

  • Article Type: Journal Article
    In online language comprehension, the parser incrementally builds hierarchical syntactic structures. The predictive nature of this structure-building process has been the subject of extensive debate. A previous study observed that when a wh-phrase indicates parallelism between the upcoming wh-clause and a preceding clause (e.g., John told some stories, but we couldn't remember which stories…), the parser predictively constructs the wh-clause. This observation demonstrates predictive structure building. However, the study also suggests that the parser does not make a prediction when the wh-phrase indicates that parallelism does not hold (e.g., John told some stories … with which stories…), pointing to a potential limit on the prediction of syntactic structures. Crucially, these findings are controversial because the study did not observe processing difficulty (garden-path effects) when disambiguating input indicated that the predicted continuation was inconsistent with the globally grammatical structure. The controversial results may be due to a lack of statistical power. Therefore, the present study conducted a large-scale replication study (324 participants and 24 sets of materials). The results revealed that the parser predicts the clausal structure, irrespective of the type of wh-phrase. There was also evidence of garden-path effects, supporting the finding that the parser makes a prediction. These observations suggest that the prediction algorithm inherent in the human parser is more powerful than assumed by the previous study and that the parser attempts to construct globally grammatical structures during revision.

  • Article Type: Journal Article
    Proper identification of collocations (and, more generally, of multiword expressions, MWEs) is an important qualitative step for several NLP applications, and particularly so for translation. Since many MWEs cannot be translated literally, failure to identify them yields at best an inaccurate translation. This paper is mostly concerned with collocations. We show how they differ from other types of MWEs and how they can be successfully parsed and translated by means of a grammar-based parser and translator.
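
    As a toy illustration of why MWE identification matters before translation, the sketch below greedily groups known multiword expressions in a tokenized sentence so they can later be handled as single translation units. It is a generic longest-match lookup over a hand-written MWE list, not the grammar-based parser and translator described in the paper.

```python
# Toy longest-match MWE identifier: group known multiword expressions into
# single units before translation. Hand-written MWE list and greedy matching;
# not the grammar-based parser/translator described in the paper.
MWES = {("kick", "the", "bucket"), ("take", "a", "decision"), ("heavy", "rain")}
MAX_LEN = max(len(m) for m in MWES)

def chunk_mwes(tokens):
    """Return the token list with known MWEs merged into single chunks."""
    i, chunks = 0, []
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):  # try longest match first
            if tuple(tokens[i:i + n]) in MWES:
                chunks.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            chunks.append(tokens[i])
            i += 1
    return chunks

print(chunk_mwes("they take a decision despite the heavy rain".split()))
# -> ['they', 'take a decision', 'despite', 'the', 'heavy rain']
```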

  • Article Type: Journal Article
    PubMed is an invaluable resource for the biomedical community. Although PubMed is freely available, the existing API is not designed for large-scale analyses and the XML structure of the underlying data is inconvenient for complex queries. We developed an R package called pmparser to convert the data in PubMed to a relational database. Our implementation of the database, called PMDB, currently contains data on over 31 million PubMed Identifiers (PMIDs) and is updated regularly. Together, pmparser and PMDB can enable large-scale, reproducible, and transparent analyses of the biomedical literature. pmparser is licensed under GPL-2 and available at https://pmparser.hugheylab.org. PMDB is available in both PostgreSQL (DOI 10.5281/zenodo.4008109) and Google BigQuery (https://console.cloud.google.com/bigquery?project=pmdb-bq&d=pmdb).
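
    The underlying idea, flattening PubMed's XML into relational tables keyed by PMID, can be illustrated with a small Python sketch. Note that pmparser itself is an R package; the code below is not its API, only a toy version of the same conversion using standard PubMed XML element names, and the file names are hypothetical.

```python
# Toy version of the idea behind pmparser: flatten PubMed XML into a relational
# table keyed by PMID. pmparser itself is an R package, so this is not its API;
# element names (PubmedArticle, PMID, ArticleTitle) follow the PubMed XML schema.
import sqlite3
import xml.etree.ElementTree as ET

def load_pubmed_xml(xml_path, db_path="pubmed.sqlite"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS article (pmid TEXT PRIMARY KEY, title TEXT)")
    root = ET.parse(xml_path).getroot()
    for art in root.iter("PubmedArticle"):
        pmid = art.findtext(".//PMID")
        title = art.findtext(".//ArticleTitle")
        conn.execute("INSERT OR REPLACE INTO article VALUES (?, ?)", (pmid, title))
    conn.commit()
    conn.close()

# Usage (hypothetical file name):
# load_pubmed_xml("pubmed_baseline_chunk.xml")
```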

  • Article Type: Journal Article
    BACKGROUND: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been little study of parsing texts from specialized domains such as biomedicine.
    METHODS: We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. The CRAFT-SA task broadly follows the UD representation and recent UD task conventions, allowing us to fine-tune the UD-compatible Turku Neural Parser and UDify neural parsers to the task. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several models pre-trained specifically for biomedical text processing.
    RESULTS: We find that recently introduced neural parsing technology is capable of generating highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initialization using a deep transfer learning model pre-trained on in-domain texts is key to maximizing the performance of the parsing methods.
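
    Parsers used for this task emit Universal Dependencies analyses, conventionally serialized in CoNLL-U format (ten tab-separated columns per token, with blank lines separating sentences). The minimal reader below makes that representation concrete; it is a generic sketch of the format, not tied to the Turku Neural Parser or UDify output pipelines, and the example file name is hypothetical.

```python
# Minimal CoNLL-U reader: the Universal Dependencies serialization used by the
# CRAFT-SA task and by parsers such as UDify and the Turku Neural Parser.
# Generic sketch; field order follows the CoNLL-U specification.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]

def read_conllu(path):
    """Yield one sentence at a time as a list of token dicts."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                      # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
            elif not line.startswith("#"):    # skip sentence-level comment lines
                sentence.append(dict(zip(FIELDS, line.split("\t"))))
    if sentence:
        yield sentence

# Example (hypothetical file): print head/dependent relations of the first sentence.
# for tok in next(read_conllu("craft_sample.conllu")):
#     print(tok["form"], "-[", tok["deprel"], "]->", tok["head"])
```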

  • Article Type: Journal Article
    For a genome-wide association study in humans, genotype imputation is an essential analysis tool for improving association mapping power. When IMPUTE software is used for imputation analysis, the imputation output (GEN format) must be converted to variant call format (VCF) with imputed genotype dosage before association analysis. However, the conversion requires multiple software packages chained in a pipeline and a large amount of processing time.
    We developed GEN2VCF, a fast and convenient GEN-format-to-VCF conversion tool with dosage support.
    The performance of GEN2VCF was compared to BCFtools, QCTOOL, and Oncofunco. The test data set was a 1 Mb GEN-formatted file of 5000 samples. To assess performance across sample sizes, tests were run on 1000 to 5000 samples in steps of 1000. Runtime and memory usage were used as performance measures.
    GEN2VCF showed drastically better performance in both runtime and memory usage: its runtime and memory usage were at least 1.4- and 7.4-fold lower, respectively, than those of the other methods.
    GEN2VCF provides efficient conversion from GEN format to VCF with the best-guess genotype, genotype posterior probabilities, and genotype dosage, as well as great flexibility for use with other software packages in a pipeline.
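
    Conceptually, each GEN record carries three genotype probabilities per sample, the alternate-allele dosage is P(het) + 2*P(hom-alt), and the corresponding VCF sample fields are GT, GP, and DS. The Python sketch below converts a single GEN record under the usual Oxford GEN column layout; it is a simplified illustration of the conversion, not the GEN2VCF implementation.

```python
# Simplified GEN -> VCF conversion for one record. Assumes the usual Oxford GEN
# layout (snp_id, rsid, position, alleleA, alleleB, then three genotype
# probabilities per sample) and treats allele A as REF. Illustration only,
# not the GEN2VCF implementation (and no VCF header is written here).
def gen_record_to_vcf_line(gen_line, chrom):
    fields = gen_line.split()
    snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = list(map(float, fields[5:]))
    sample_fields = []
    for i in range(0, len(probs), 3):
        p_aa, p_ab, p_bb = probs[i:i + 3]
        dosage = p_ab + 2.0 * p_bb                         # expected count of allele B
        trio = [p_aa, p_ab, p_bb]
        gt = ["0/0", "0/1", "1/1"][trio.index(max(trio))]  # best-guess genotype
        sample_fields.append(f"{gt}:{p_aa:.3f},{p_ab:.3f},{p_bb:.3f}:{dosage:.3f}")
    return "\t".join([chrom, pos, rsid, allele_a, allele_b, ".", "PASS", ".",
                      "GT:GP:DS"] + sample_fields)

# Example: one biallelic site, two samples.
print(gen_record_to_vcf_line("snp1 rs123 10177 A C 0.9 0.1 0 0 0.2 0.8", "1"))
```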

  • Article Type: Journal Article
    Brain activity in numerous perisylvian brain regions is modulated by the expectedness of linguistic stimuli. We leverage recent advances in computational parsing models to test what representations guide the processes reflected by this activity. Recurrent Neural Network Grammars (RNNGs) are generative models of (tree, string) pairs that use neural networks to drive derivational choices. Parsing with them yields a variety of incremental complexity metrics that we evaluate against a publicly available fMRI dataset recorded while participants simply listen to an audiobook story. Surprisal, which captures a word's unexpectedness, correlates with a wide range of temporal and frontal regions when it is calculated based on word-sequence information using a top-performing LSTM neural network language model. The explicit encoding of hierarchy afforded by the RNNG additionally captures activity in left posterior temporal areas. A separate metric tracking the number of derivational steps taken between words correlates with activity in the left temporal lobe and inferior frontal gyrus. This pattern of results narrows down the kinds of linguistic representations at play during predictive processing across the brain's language network.
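
    The surprisal metric referred to here is simply the negative log probability a language model assigns to each word given its preceding context. The sketch below computes per-token surprisal in bits using GPT-2 from Hugging Face transformers as a convenient stand-in for the paper's LSTM and RNNG models; it illustrates the quantity, not the paper's modeling setup.

```python
# Per-token surprisal from an autoregressive language model. GPT-2 is used here
# only as a convenient stand-in for the LSTM / RNNG models in the paper.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The old man the boats", return_tensors="pt").input_ids

with torch.no_grad():
    log_probs = torch.log_softmax(model(ids).logits, dim=-1)  # (1, seq_len, vocab)

# Surprisal of token t = -log2 P(token_t | tokens_<t); the first token is skipped
# because it has no left context in this short example.
for t in range(1, ids.size(1)):
    nats = -log_probs[0, t - 1, ids[0, t]].item()
    print(f"{tokenizer.decode(int(ids[0, t]))!r}: {nats / math.log(2):.2f} bits")
```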

  • Article Type: Journal Article
    To understand human language, whether spoken or signed, the listener or viewer has to parse the continuous external signal into components. The question of what those components are (e.g., phrases, words, sounds, phonemes?) has been a subject of long-standing debate. We re-frame this question to ask: What properties of the incoming visual or auditory signal are indispensable to eliciting language comprehension? In this review, we assess the phenomenon of language parsing from a modality-independent viewpoint. We show that the interplay between dynamic changes in the entropy of the signal and neural entrainment to the signal at the syllable level (the 4-5 Hz range) is causally related to language comprehension in both speech and sign language. This modality-independent Entropy Syllable Parsing model for the linguistic signal offers insight into the mechanisms of language processing, suggesting common neurocomputational bases for syllables in speech and sign language. This article is categorized under: Linguistics > Linguistic Theory; Linguistics > Language in Mind and Brain; Linguistics > Computational Models of Language; Psychology > Language.
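
    Two of the quantities discussed in this account can be computed directly from a signal: its sliding-window Shannon entropy and its envelope in the syllable-rate (4-5 Hz) band. The sketch below does both on a toy random signal; it is only an illustration of the quantities involved, not an implementation of the Entropy Syllable Parsing model.

```python
# Two quantities from the review, computed on a toy signal: sliding-window
# Shannon entropy and the 4-5 Hz (syllable-rate) amplitude envelope.
# Placeholder random signal; illustration only, not the authors' model.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000                                               # sampling rate (Hz), toy value
signal = np.random.default_rng(0).normal(size=10 * fs)  # 10 s placeholder signal

def sliding_entropy(x, win, bins=32):
    """Shannon entropy (bits) of the amplitude distribution in successive windows."""
    out = []
    for start in range(0, x.size - win + 1, win):
        counts, _ = np.histogram(x[start:start + win], bins=bins)
        p = counts[counts > 0] / counts.sum()
        out.append(-(p * np.log2(p)).sum())
    return np.array(out)

entropy_track = sliding_entropy(signal, win=fs // 4)    # 250 ms windows

# Syllable-rate envelope: band-pass 4-5 Hz, then Hilbert amplitude envelope.
b, a = butter(4, [4, 5], btype="bandpass", fs=fs)
envelope = np.abs(hilbert(filtfilt(b, a, signal)))

print(entropy_track[:3], envelope[:3])
```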

  • Article Type: Journal Article
    Syntactic and semantic information processing can interact selectively during language comprehension. However, the nature and extent of the interactions, in particular of semantic effects on syntax, remain to some extent elusive. We revisit an influential ERP result by Kim and Osterhout (2005), later replicated by Kim and Sikos (2011), that the verb in sentences such as 'The hearty meal was devouring …' evokes a P600 effect, a signature of syntactic processing difficulty, even though all stimuli were grammatically well-formed. We view this effect as a manifestation of a conflict in the assignment of grammatical subject and object roles to the verb's arguments, as performed independently by a semantic system (predicting that meal should be the object) and by a syntactic system (labeling meal as the subject). More specifically, we develop an explicit algorithmic implementation of a parallel processing architecture that supports (i) meaning-based prediction of grammatical role labels, using either a probabilistic label guesser or a neural network, and (ii) comparison of the predicted labels with labels assigned by a state-of-the-art dependency parser. We demonstrate that the system can classify sentences from the Kim and Osterhout (2005) corpus with adequate accuracy, and can detect labeling conflicts as intended. Some implications of our results for models of prediction in language processing are discussed.
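
    The conflict-detection idea can be made concrete with a toy example: compare the subject/object labels a dependency parser assigns against the labels a meaning-based guesser would predict (an inanimate pre-verbal noun with a verb like "devour" is a better object than subject). The sketch below uses spaCy as a generic dependency parser and a hard-coded semantic guess, and the sentence continuation is hypothetical; it illustrates the comparison, not the authors' architecture.

```python
# Toy conflict detector: compare subject/object labels from a syntactic
# dependency parse with labels predicted by a crude meaning-based guesser.
# spaCy stands in for the paper's dependency parser; the semantic guess and
# the sentence continuation ("the guests") are hypothetical, for illustration.
import spacy

nlp = spacy.load("en_core_web_sm")

def syntactic_roles(sentence):
    """Map each argument noun to the grammatical role the parser assigns it."""
    return {tok.text.lower(): tok.dep_
            for tok in nlp(sentence)
            if tok.dep_ in ("nsubj", "nsubjpass", "dobj")}

# Meaning-based prediction: "meal" is a plausible object of "devour", not a
# plausible subject (hard-coded here instead of a learned label guesser).
semantic_guess = {"meal": "dobj"}

syntax = syntactic_roles("The hearty meal was devouring the guests.")
for word, predicted in semantic_guess.items():
    assigned = syntax.get(word)
    if assigned and assigned != predicted:
        print(f"Label conflict on '{word}': semantics says {predicted}, syntax says {assigned}")
```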