Keywords: Biodiversity research; Forest inventory; Nomenclature; R language; Taxonomic databases

MeSH: Biodiversity; Databases, Factual; Ecology; Forests; Software; Trees

Source: DOI:10.7717/peerj.16896

Abstract:
Standardizing and translating species names from different databases is key to the successful integration of data sources in biodiversity research. There are numerous taxonomic name-resolution applications that implement increasingly powerful name-cleaning and matching approaches, allowing the user to resolve species relative to multiple backbones simultaneously. Yet there remains no principled approach for combining information across these underlying taxonomic backbones, complicating efforts to merge species lists with inconsistent and conflicting taxonomic information. Here, we present Treemendous, an open-source software package for the R programming environment that integrates taxonomic relationships across four publicly available backbones to improve the name resolution of tree species. By mapping relationships across the backbones, this package can be used to resolve datasets with conflicting and inconsistent taxonomic origins, while ensuring the resulting species are accepted and consistent with a single reference backbone. The user can chain together different functionalities ranging from simple matching to a single backbone, to graph-based iterative matching using synonym-accepted relations across all backbones in the database. In addition, the package allows users to 'translate' one tree species list into another, streamlining the assimilation of new data into preexisting datasets or models. The package provides a flexible workflow depending on the use case, and can either be used as a stand-alone name-resolution package or in conjunction with existing packages as a final step in the name-resolution pipeline. The Treemendous package is fast and easy to use, allowing users to quickly merge different data sources by standardizing their species names according to the regularly updated database. By combining taxonomic information across multiple backbones, the package increases matching rates and minimizes data loss, allowing for more efficient translation of tree species datasets to aid research into forest biodiversity and tree ecology.
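The graph-based iterative matching described in the abstract can be pictured with a small conceptual sketch. This is not the Treemendous API (the package is written for R, and none of its functions are shown here); it is a Python illustration under stated assumptions. The backbone labels `A`, `B`, `C`, the placeholder species names, and the functions `resolve` and `translate` are all hypothetical. The idea shown is a breadth-first search that follows synonym-to-accepted links from every backbone until it reaches a name accepted in the chosen reference backbone.

```python
from collections import deque

# Hypothetical toy data: placeholder names, not real taxonomy.
# For each backbone, map a synonym to the name that backbone accepts.
SYNONYM_TO_ACCEPTED = {
    "A": {"Genus alpha": "Genus beta"},
    "B": {"Genus beta": "Genus gamma"},
    "C": {},
}

# For each backbone, the set of names it treats as accepted.
ACCEPTED = {
    "A": {"Genus beta"},
    "B": {"Genus gamma"},
    "C": {"Genus gamma", "Genus delta"},
}


def resolve(name, reference, max_steps=10):
    """Follow synonym -> accepted links across all backbones (breadth-first)
    until a name accepted in the reference backbone is reached.
    Returns that name, or None if no such name is reachable."""
    queue = deque([(name, 0)])
    seen = {name}
    while queue:
        current, depth = queue.popleft()
        if current in ACCEPTED[reference]:
            return current
        if depth >= max_steps:
            continue  # assumed safety limit on the number of hops
        for relations in SYNONYM_TO_ACCEPTED.values():
            nxt = relations.get(current)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def translate(names, reference):
    """Map each input name to its resolved name in the reference backbone."""
    return {n: resolve(n, reference) for n in names}


if __name__ == "__main__":
    print(translate(["Genus alpha", "Genus delta", "Genus omega"], "C"))
```

In this toy setup, "Genus alpha" is unknown to backbone C directly, but it becomes resolvable once links from backbones A and B are chained together. That is the sense in which combining backbones can raise matching rates; names with no path to the reference backbone stay unresolved instead of being guessed.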