semantic enrichment

  • Article type: Journal Article
    Text reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, e.g., the posterior reception of influential texts or to reveal evolving publication practices of historical media. This research is often supported by interactive visualizations which highlight relations and differences between text segments. In this paper, we build on earlier work in this domain. We present impresso Text Reuse at Scale, to our knowledge the first interface which integrates text reuse data with other forms of semantic enrichment to enable a versatile and scalable exploration of intertextual relations in historical newspaper corpora. The Text Reuse at Scale interface was developed as part of the impresso project and combines powerful search and filter operations with close and distant reading perspectives. We integrate text reuse data with enrichments derived from topic modeling, named entity recognition and classification, and language and document type detection, as well as a rich set of newspaper metadata. We report on historical research objectives and common user tasks for the analysis of historical text reuse data and present the prototype interface together with the results of a user evaluation.

  • Article type: Journal Article
    In medicine and biomedical research, sex- and gender-related aspects are ubiquitous. If they are not considered adequately, a lower quality of research data can be expected, together with a lower generalizability of study results to real-world settings. From a translational perspective, a lack of sex- and gender-sensitivity in acquired data can have negative implications for diagnosis, treatment (outcome and side effects), and risk prediction. To establish improved recognition and reward settings, we set out to develop a pilot of systemic sex and gender awareness in a German medical faculty, with actions such as implementing equality in routine clinical practice and research, as well as in scientific practice (incl. science education). We believe that this change of culture will have a positive effect on research outcomes, lead to a rethinking in the scientific domain, foster sex- and gender-related clinical studies, and influence the design of good scientific practices.

  • Article type: Journal Article
    Background: Integration of data from multiple domains can greatly enhance the quality and applicability of knowledge generated in analysis workflows. However, working with health data is challenging, requiring careful preparation in order to support meaningful interpretation and robust results. Ontologies encapsulate relationships between variables that can enrich the semantic content of health datasets to enhance interpretability and inform downstream analyses.
    Results: We developed an R package for electronic health data preparation, "eHDPrep," demonstrated upon a multimodal colorectal cancer dataset (661 patients, 155 variables; Colo-661); a further demonstrator is taken from The Cancer Genome Atlas (459 patients, 94 variables; TCGA-COAD). eHDPrep offers user-friendly methods for quality control, including internal consistency checking and redundancy removal with information-theoretic variable merging. Semantic enrichment functionality is provided, enabling generation of new informative "meta-variables" according to ontological common ancestry between variables, demonstrated with SNOMED CT and the Gene Ontology in the current study. eHDPrep also facilitates numerical encoding, variable extraction from free text, completeness analysis, and user review of modifications to the dataset.
    Conclusions: eHDPrep provides effective tools to assess and enhance data quality, laying the foundation for robust performance and interpretability in downstream analyses. Application to multimodal colorectal cancer datasets resulted in improved data quality, structuring, and robust encoding, as well as enhanced semantic information. We make eHDPrep available as an R package from CRAN (https://cran.r-project.org/package=eHDPrep) and GitHub (https://github.com/overton-group/eHDPrep).
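
The idea of deriving "meta-variables" from ontological common ancestry can be illustrated with a minimal Python sketch. This is not eHDPrep's code or API (eHDPrep is an R package). The toy ontology, the variable names, the `MV_` prefix, and the "any descendant is positive" aggregation rule are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy ontology: each concept maps to its parent (None marks the root).
PARENT = {
    "disease": None,
    "cardiovascular_disease": "disease",
    "metabolic_disease": "disease",
    "hypertension": "cardiovascular_disease",
    "angina": "cardiovascular_disease",
    "type_2_diabetes": "metabolic_disease",
}


def ancestors(concept):
    """Return the proper ancestors of a concept, nearest first."""
    chain = []
    parent = PARENT[concept]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def meta_variables(records, variables):
    """Build one meta-variable per ancestor shared by at least two variables.

    Assumed aggregation rule: a meta-variable is 1 for a patient if any of
    the variables under that ancestor is 1, otherwise 0.
    """
    groups = defaultdict(list)
    for var in variables:
        for anc in ancestors(var):
            groups[anc].append(var)
    # Keep only ancestors that are common to two or more variables.
    shared = {anc: vs for anc, vs in groups.items() if len(vs) >= 2}
    return [
        {f"MV_{anc}": int(any(row[v] for v in vs)) for anc, vs in shared.items()}
        for row in records
    ]


# Two made-up patients with binary diagnosis variables.
patients = [
    {"hypertension": 1, "angina": 0, "type_2_diabetes": 0},
    {"hypertension": 0, "angina": 0, "type_2_diabetes": 1},
]
result = meta_variables(patients, ["hypertension", "angina", "type_2_diabetes"])
```

In this toy case, "cardiovascular_disease" is shared by two variables and "disease" by all three, so each patient gains two meta-variables, while "metabolic_disease" covers a single variable and yields none.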

  • Article type: Journal Article
    This paper presents ARCA, a software system that enables semantic search and exploration over a book catalog. The purpose of this work is twofold: to propose a general paradigm for a semantic enrichment workflow and to evaluate a visual approach to information retrieval based on extracted information and existing knowledge graphs. ARCA has been designed and implemented following a user-centered design approach. Two different releases of the system have been incrementally and iteratively developed and evaluated. The first release was used to evaluate the quality and usefulness of the extracted data. The second release, whose design was a refinement based on the previous evaluation results, was assessed by several users. Moreover, a comparative test with other information retrieval systems was conducted in order to study the potential added value of the system. ARCA is employed in a real editorial scenario to visually search and explore the books of a publishing house.

  • Article type: Journal Article
    Code search is a common practice for developers during software implementation. The challenges of accurate code search mainly lie in the knowledge gap between source code and natural language (i.e., queries). Because few code-query pairs but many code-description pairs are available, prior studies based on deep learning techniques focus on learning the semantic matching relation between source code and the corresponding description texts, and hypothesize that the semantic gap between descriptions and user queries is marginal. In this work, we found that code search models trained on code-description pairs may not perform well on user queries, which indicates a semantic distance between queries and code descriptions. To mitigate this semantic distance for more effective code search, we propose QueCos, a Query-enriched Code search model. QueCos uses reinforcement learning (RL) to learn to generate semantically enriched queries that capture the key semantics of given queries. With RL, the code search performance is treated as a reward for producing accurate semantically enriched queries. The enriched queries are finally employed for code search. Experiments on the benchmark datasets show that QueCos can significantly outperform the state-of-the-art code search models.

  • Article type: Journal Article
    Cultural heritage images are among the primary media for communicating and preserving the cultural values of a society. The images represent concrete and abstract content and symbolise the social, economic, political, and cultural values of the society. However, an enormous amount of such values embedded in the images is left unexploited, partly due to the absence of methodological and technical solutions to capture, represent, and exploit the latent information. With the emergence of new technologies and the availability of cultural heritage images in digital formats, the methodology followed to semantically enrich and utilise such resources becomes a vital factor in supporting users' needs. This paper presents a methodology proposed to unearth the cultural information communicated via cultural digital images by applying Artificial Intelligence (AI) technologies (such as Computer Vision (CV) and semantic web technologies). To this end, the paper presents a methodology that enables efficient analysis and enrichment of a large collection of cultural images, covering all the major phases and tasks. The proposed method is applied and tested using a case study on cultural image collections from the Europeana platform. The paper further presents the analysis of the case study, the challenges, the lessons learned, and promising future research areas on the topic.

  • Article type: Journal Article
    This study examines whether Chinese complement coercion sentences with aspectual verbs elicit processing difficulty during real-time comprehension. Complement coercion is a linguistic phenomenon in which certain verbs (e.g., start, enjoy), requiring an event-denoting complement, are combined with an entity-denoting complement (e.g., book), as in The author started a book. Previous studies have reported that the entity-denoting complement elicits processing difficulty following verbs that require an event argument compared with verbs that do not (e.g., The author wrote a book). While the processing of complement coercion has been extensively studied in Indo-European languages such as English and German, it is relatively under-researched in Sino-Tibetan languages such as Mandarin Chinese. Given that many linguistic elements behave distinctly in the different language families, for instance, verbs with respect to their semantic properties and the syntactic representations of the complement, it is meaningful to investigate whether the existing linguistic differences have any effect on the processing of complement coercion in Mandarin. With this research goal, we recorded self-paced reading times of 61 native Mandarin speakers to investigate the processing of the entity-denoting complement in sentences with three different verb types (aspectual verbs, which require an event-denoting complement; preferred verbs, which denote a preferred interpretation of the aspectual expressions; and non-preferred verbs, which denote a non-preferred but plausible interpretation of the aspectual expressions), as exemplified in // gù-kè kāi-shǐ/tián-xiě/chá-kàn zhè-fèn wèn-juàn "The customer started/filled in/checked the questionnaire." It was found that the entity noun complement (e.g., zhè-fèn wèn-juàn "the questionnaire") elicited significantly longer reading times in coercion sentences than in their non-coercion counterparts. The results are compatible with the previous findings in English that complement coercion sentences impose a processing cost during real-time comprehension. The study contributes empirical evidence to coercion studies cross-linguistically.

  • Article type: Journal Article
    In the time it takes a human life sciences researcher to read one research article, machines can process hundreds of thousands of articles. An uncoordinated army of bots, crawlers, and other software agents is active day and night on the Internet discovering, ingesting, and analyzing research content. Many of these agents are designed to help researchers rapidly filter the ever-expanding research record and surface the articles and findings most relevant to their work. For these software agents to be most effective, they need to understand the content they are reading in a manner similar to an expert human reader. (What are the main concepts being discussed and what are the main findings asserted? What is this research article telling us that is new and what is supporting or contradicting past findings?) This is where semantic enrichment comes into play: it adds structured machine-readable metadata to life science articles to assist software agents in 'reading' the content in a manner similar to a human researcher. In the present study, I'll define the mechanism of semantic enrichment of life sciences content, examine the benefits it is bringing to researchers today, and preview promising avenues for future benefits.

  • Article type: Journal Article
    Biological and biomedical ontologies and terminologies are used to organize and store various domain-specific knowledge to provide standardization of terminology usage and to improve interoperability. The growing number of such ontologies and terminologies and their increasing adoption in clinical, research, and healthcare settings call for effective and efficient quality assurance and semantic enrichment techniques for these ontologies and terminologies. In this editorial, we provide an introductory summary of nine articles included in this supplement issue on quality assurance and enrichment of biological and biomedical ontologies and terminologies. The articles cover a range of standards including SNOMED CT, National Cancer Institute Thesaurus, Unified Medical Language System, North American Association of Central Cancer Registries, and OBO Foundry Ontologies.

  • Article type: Journal Article
    The study aims to explore the processing pattern of Mandarin Chinese sentences with complement coercion. Complement coercion is a known linguistic phenomenon in which some verbs, semantically requiring an event-denoting complement, are combined with an entity-denoting complement, as in Mary began the book. The combination (i.e., event-selecting verb + entity-denoting noun) has been reported to involve a type mismatch, and thus to elicit processing difficulty. While the phenomenon has been extensively studied in Indo-European languages, such as English and German, it is debatable whether the phenomenon exists in a language that is typologically distinct from English (e.g., in the structural complexity of words), such as Mandarin. To provide empirical evidence, the study conducted a self-paced reading experiment to compare the processing patterns of coercion sentences and non-coercion controls in Mandarin. The results showed longer reading times for the coercion sentences than for their non-coercion counterparts, which supports previous findings about the processing difficulty of complement coercion.