impresso Text Reuse at Scale. An interface for the exploration of text reuse data in semantically enriched historical newspapers.

Abstract:

Text reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, for example, the later reception of influential texts or the evolving publication practices of historical media. This research is often supported by interactive visualizations which highlight relations and differences between text segments. In this paper, we build on earlier work in this domain. We present impresso Text Reuse at Scale, to our knowledge the first interface which integrates text reuse data with other forms of semantic enrichment to enable a versatile and scalable exploration of intertextual relations in historical newspaper corpora. The Text Reuse at Scale interface was developed as part of the impresso project and combines powerful search and filter operations with close and distant reading perspectives. We integrate text reuse data with enrichments derived from topic modeling, named entity recognition and classification, and language and document type detection, as well as a rich set of newspaper metadata. We report on historical research objectives and common user tasks for the analysis of historical text reuse data, and present the prototype interface together with the results of a user evaluation.
