taxonomic databases

  • Article type: Journal Article
    Standardizing and translating species names from different databases is key to the successful integration of data sources in biodiversity research. There are numerous taxonomic name-resolution applications that implement increasingly powerful name-cleaning and matching approaches, allowing the user to resolve species relative to multiple backbones simultaneously. Yet there remains no principled approach for combining information across these underlying taxonomic backbones, complicating efforts to combine and merge species lists with inconsistent and conflicting taxonomic information. Here, we present Treemendous, an open-source software package for the R programming environment that integrates taxonomic relationships across four publicly available backbones to improve the name resolution of tree species. By mapping relationships across the backbones, this package can be used to resolve datasets with conflicting and inconsistent taxonomic origins, while ensuring the resulting species are accepted and consistent with a single reference backbone. The user can chain together different functionalities ranging from simple matching to a single backbone, to graph-based iterative matching using synonym-accepted relations across all backbones in the database. In addition, the package allows users to 'translate' one tree species list into another, streamlining the assimilation of new data into preexisting datasets or models. The package provides a flexible workflow depending on the use case, and can either be used as a stand-alone name-resolution package or in conjunction with existing packages as a final step in the name-resolution pipeline. The Treemendous package is fast and easy to use, allowing users to quickly merge different data sources by standardizing their species names according to the regularly updated database. By combining taxonomic information across multiple backbones, the package increases matching rates and minimizes data loss, allowing for more efficient translation of tree species datasets to aid research into forest biodiversity and tree ecology.
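
The graph-based iterative matching mentioned in this abstract can be pictured as a walk along synonym-to-accepted links pooled from several backbones, stopping at the first name accepted in the chosen reference backbone. The Python sketch below is only an illustration of that general idea, not Treemendous code (the package itself is written for R). The function name `resolve`, the placeholder taxon names, and the backbone labels `B1` and `B2` are all hypothetical.

```python
from collections import deque

# Hypothetical synonym -> accepted links pooled from two backbones (B1, B2).
# The taxon names are placeholders, not real taxonomic statements.
EDGES = {
    "Genus alpha": [("Genus beta", "B1")],
    "Genus beta": [("Genus gamma", "B2")],
    "Genus gamma": [],
}

# Hypothetical set of names accepted in the chosen reference backbone.
ACCEPTED_IN_REFERENCE = {"Genus gamma"}


def resolve(name, edges, accepted_in_reference):
    """Breadth-first walk along synonym -> accepted links.

    Returns (accepted_name, path) for the first name reached that is
    accepted in the reference backbone, or (None, []) if none is found.
    """
    seen = {name}
    queue = deque([(name, [name])])
    while queue:
        current, path = queue.popleft()
        if current in accepted_in_reference:
            return current, path
        for target, _backbone in edges.get(current, []):
            if target not in seen:  # guard against cyclic synonymy
                seen.add(target)
                queue.append((target, path + [target]))
    return None, []


accepted, path = resolve("Genus alpha", EDGES, ACCEPTED_IN_REFERENCE)
print(accepted, path)
```

Tracking visited names matters because pooled backbones can disagree and create cycles (A is a synonym of B in one source, B of A in another); without the `seen` set the walk might never terminate.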

  • Article type: Journal Article
    The current paper presents a nomenclatural checklist of vascular plants validated as (sub)endemic to, and present in, the flora of the Ukrainian Carpathians. This checklist is part of work aimed at an inventory of endemic plants distributed in the Ukrainian Carpathians. It is mainly based on the analysis of primary sources (i.e. original protologues and monographic works), but also uses data provided by recent online taxonomic aggregators, such as the Global Biodiversity Information Facility (GBIF), Catalogue of Life (CoL), Plants of the World Online (POWO), Euro+Med PlantBase, World Flora Online (WFO) and others. Over 7,000 specimens deposited in the leading Ukrainian herbaria were also revised and used as a supporting data source during the work on the checklist.
    The checklist provides a revised nomenclature, including corrections on publication dates, rediscovered taxonomic protologues, corrected authorships and revised taxonomic status for (sub)endemic (sub)species of vascular plants occurring in the Ukrainian Carpathians. It contains 1,101 names, of which 78 species and subspecies have been accepted as valid and 1,023 species and infraspecific taxa are provided as synonyms. It is supplemented with critical notes on the nomenclature of problematic taxa and brief annotations regarding their distribution in the Ukrainian Carpathians, indicating the endemicity range and sozological status for all analysed (sub)species. The current checklist is linked with the GBIF taxonomic backbone, provides notes on detected issues and primarily focuses on updating that backbone and correcting its nomenclatural issues and taxonomic inconsistencies, but also aims at discussing issues in other popular taxonomic databases. Sabulina pauciflora is proposed as a new combination to comply with a recent revision of the genus Sabulina.

  • Article type: Journal Article
    A major gap in the biodiversity knowledge graph is a connection between taxonomic names and the taxonomic literature. While both names and publications often have persistent identifiers (PIDs), such as Life Science Identifiers (LSIDs) or Digital Object Identifiers (DOIs), LSIDs for names are rarely linked to DOIs for publications. This article describes efforts to make those connections across three large taxonomic databases: Index Fungorum, International Plant Names Index (IPNI) and the Index of Organism Names (ION). Over a million names have been matched to DOIs or other persistent identifiers for taxonomic publications. This represents approximately 36% of names for which publication data are available. The mappings between LSIDs and publication PIDs are made available through ChecklistBank. Applications of this mapping are discussed, including a web app to locate the citation of a taxonomic name and a knowledge graph that uses data on researcher ORCID ids to connect taxonomic names and publications to authors of those names.

  • Article type: Journal Article
    We introduce a new method of estimating accepted species diversity by adapting mark-recapture methods to comparisons of taxonomic databases. A taxonomic database should become more complete over time, so the error bar on an estimate of its completeness and the known diversity of the taxon it treats will decrease. Independent databases can be correlated, so we use the time course of estimates comparing them to understand the effect of correlation. If a later estimate is significantly larger than an earlier one, the databases are positively correlated; if it is significantly smaller, they are negatively correlated; and if the estimate remains roughly constant, then the correlations have averaged out. We tested this method by estimating how complete MolluscaBase is for accepted names of terrestrial gastropods. Using random samples of names from an independent database, we determined whether each name led to a name accepted in MolluscaBase. A sample tested in August 2020 found that 16.7% of tested names were missing; one in July 2021 found 5.3% missing. MolluscaBase grew by almost 3,000 accepted species during this period, reaching 27,050 species. The estimates ranged from 28,409 ± 365 in 2021 to 29,063 ± 771 in 2020. All estimates had overlapping 95% confidence intervals, indicating that correlations between the databases did not cause significant problems. Uncertainty beyond sampling error added 475 ± 430 species, so our estimate for accepted terrestrial gastropod species at the end of 2021 is 28,895 ± 630 species. This estimate is more than 4,000 species higher than previous ones. The estimate does not account for ongoing flux of species into and out of synonymy, new discoveries, or changing taxonomic methods and concepts. The species naming curve for terrestrial gastropods is still far from reaching an asymptote, and combined with the additional uncertainties, this means that predicting how many more species might ultimately be recognized is presently not feasible. Our methods can be applied to estimate the total number of names of Recent mollusks (as opposed to names currently accepted), the known diversity of fossil mollusks, and known diversity in other phyla.
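
The mark-recapture reasoning in this abstract can be illustrated with the textbook Lincoln-Petersen estimator: if a random sample of names from an independent source is checked against the database, the fraction found estimates the database's completeness, and dividing the database size by that fraction estimates the total. The Python sketch below is a minimal version under stated assumptions. The abstract does not give the sample sizes or the exact estimator the authors used, so the sample size of 1,000 is hypothetical, the standard error uses a simple delta-method approximation, and the result is not expected to reproduce the reported 28,409 ± 365.

```python
import math


def lincoln_petersen(known, sampled, recaptured):
    """Estimate total size from a database of `known` entries.

    `sampled` names are drawn from an independent source and
    `recaptured` of them are found in the database.
    Returns (estimate, standard_error); the standard error is a
    delta-method approximation that treats `recaptured` as binomial.
    """
    if not 0 < recaptured <= sampled:
        raise ValueError("need 0 < recaptured <= sampled")
    p = recaptured / sampled              # estimated completeness
    estimate = known / p                  # N = M * C / R
    se = known * math.sqrt(p * (1 - p) / sampled) / p ** 2
    return estimate, se


# 27,050 accepted species and 5.3% missing are from the abstract;
# the sample size of 1,000 names is a hypothetical choice.
estimate, se = lincoln_petersen(known=27050, sampled=1000, recaptured=947)
print(f"estimated total: {estimate:.0f} +/- {1.96 * se:.0f} (approx. 95% CI)")
```

The estimator assumes the sample is random with respect to whether a name is in the database; correlation between the two sources biases it, which is the effect the authors probe by comparing estimates over time.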

  • Article type: Journal Article
    The taxonomic identification of organisms based on the amplification of specific genetic markers (metabarcoding) implicitly requires adequate discriminatory information and taxonomic coverage of environmental DNA sequences in taxonomic databases. These requirements were quantitatively examined by comparing the determination of cyanobacteria and microalgae obtained by metabarcoding and light microscopy. We used planktic and biofilm samples collected in 37 lakes and 22 rivers across the Alpine region. We focused on two of the most used and best represented genetic markers in the reference databases, namely the 16S rRNA and 18S rRNA genes. A sequence gap analysis using blastn showed that, in the identity range of 99-100%, approximately 30% (plankton) and 60% (biofilm) of the sequences did not find any close counterpart in the reference databases (NCBI GenBank). Similarly, a taxonomic gap analysis showed that approximately 50% of the cyanobacterial and eukaryotic microalgal species identified by light microscopy were not represented in the reference databases. In both cases, the magnitude of the gaps differed between the major taxonomic groups. Even considering the species determined under the microscope and represented in the reference databases, 22% and 26% were still not included in the results obtained by the blastn at percentage levels of identity ≥95% and ≥97%, respectively. The main causes were the absence of matching sequences due to amplification and/or sequencing failure and potential misidentification in the microscopy step. Our results quantitatively demonstrated that in metabarcoding the main obstacles in the classification of 16S rRNA and 18S rRNA sequences and interpretation of high-throughput sequencing biomonitoring data were due to the existence of important gaps in the taxonomic completeness of the reference databases and the short length of reads. The study focused on the Alpine region, but the extent of the gaps could be much greater in other less investigated geographic areas.
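
A sequence gap analysis of the kind described in this abstract amounts to asking, for each query sequence, whether its best blastn hit reaches a chosen identity threshold. The Python sketch below shows one way this could be computed from BLAST tabular output (`-outfmt 6`, where the first column is the query ID and the third is percent identity). It is not the authors' pipeline; the function name `gap_fraction` and the example rows are hypothetical.

```python
def gap_fraction(query_ids, blast_rows, min_identity=99.0):
    """Fraction of queries with no hit at or above `min_identity`.

    `blast_rows` are tab-separated lines in BLAST outfmt 6
    (qseqid, sseqid, pident, ...). Queries absent from the rows
    count as unmatched.
    """
    if not query_ids:
        raise ValueError("query_ids must not be empty")
    best = {}
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        qid, pident = fields[0], float(fields[2])
        best[qid] = max(best.get(qid, 0.0), pident)
    unmatched = [q for q in query_ids if best.get(q, 0.0) < min_identity]
    return len(unmatched) / len(query_ids)


# Hypothetical example: three queries, one of which has no hit at all.
rows = [
    "q1\tref_A\t99.5\t250\t1\t0\t1\t250\t10\t259\t1e-120\t450",
    "q2\tref_B\t96.0\t250\t10\t0\t1\t250\t5\t254\t1e-100\t400",
]
print(gap_fraction(["q1", "q2", "q3"], rows, min_identity=99.0))
print(gap_fraction(["q1", "q2", "q3"], rows, min_identity=95.0))
```

Passing the full list of query IDs separately matters because queries without any hit do not appear in the BLAST output and would otherwise be silently dropped from the denominator.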

  • Article type: Journal Article
    OBJECTIVE: The standardization of plant names is a critical step in various fields of biology, including biodiversity, biogeography, and vegetation research. The WorldFlora package is introduced here to help achieve this goal by matching lists of plant names with a static copy from World Flora Online (WFO), an ongoing global effort to complete an online flora of all known vascular plants and bryophytes by 2020.
    RESULTS: Based on direct and fuzzy matching, WorldFlora inserts matching cases from the WFO to a submitted data set containing taxonomic names. The results and success rates for selecting the expected best single matches are presented for four data sets, including two data sets used in recent comparisons of software tools for correcting taxon names.
    CONCLUSIONS: WorldFlora offers a straightforward pipeline for semi-automatic plant name checking. For the four data sets, the success rate of credible matches ranged from 94.7% to 99.9%.
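
The combination of direct and fuzzy matching described in this abstract can be sketched as a two-stage lookup: try an exact match against the backbone first, and fall back to an approximate string match only if that fails. The Python sketch below uses the standard library's `difflib.get_close_matches` as a stand-in fuzzy matcher. It is not the WorldFlora implementation (WorldFlora is an R package); the function `match_name`, the three-name backbone, and the 0.9 cutoff are illustrative assumptions.

```python
import difflib

# Tiny illustrative backbone; a real one would hold the full name list.
BACKBONE = ["Quercus robur", "Fagus sylvatica", "Pinus sylvestris"]


def match_name(name, backbone, cutoff=0.9):
    """Return (matched_name, method) using direct, then fuzzy matching.

    `cutoff` is the minimum difflib similarity ratio (0 to 1) for a
    fuzzy match; method is "direct", "fuzzy", or "unmatched".
    """
    if name in backbone:
        return name, "direct"
    close = difflib.get_close_matches(name, backbone, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "unmatched"


for submitted in ["Fagus sylvatica", "Quercus robar", "Zzz"]:
    print(submitted, "->", match_name(submitted, BACKBONE))
```

Recording which stage produced each match keeps the process semi-automatic in the sense the abstract describes: direct matches can usually be trusted, while fuzzy matches are candidates that a person may still want to review.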