Toolkit

  • Article type: Journal Article
    In neuroscience, the core of a cohort study project consists of the collection, analysis, and sharing of multi-modal data. In recent years, many efficient, high-quality toolkits have been published and used to improve the quality of multi-modal data in cohort studies. As a result, gleaning answers to relevant questions from such a large body of studies has become a time-consuming task for cohort researchers. To tackle this problem, we propose a hierarchical neuroscience knowledge base that consists of projects/organizations, multi-modal databases, and toolkits, so as to make it easier for researchers to find answers. We first classified studies conducted for the topic "Frontiers in Neuroinformatics" according to the multi-modal data life cycle, and from these studies extracted three kinds of information objects: projects/organizations, multi-modal databases, and toolkits. We then mapped these information objects into our proposed knowledge base framework. A Python-based query tool was developed in tandem for quicker access to the knowledge base (accessible at https://github.com/Romantic-Pumpkin/PDT_fninf). Finally, based on the constructed knowledge base, we discuss some key research issues and underlying trends in different stages of the multi-modal data life cycle.

  • Article type: Journal Article
    The Transformer is an attention-based architecture that has proven to be the state-of-the-art model in natural language processing (NLP). To make it easier to start using transformer-based models in medical language understanding, and to expand the capability of the scikit-learn toolkit in deep learning, we proposed an easy-to-learn Python toolkit named transformers-sklearn. By wrapping the interfaces of transformers in only three functions (i.e., fit, score, and predict), transformers-sklearn combines the advantages of the transformers and scikit-learn toolkits.
    In transformers-sklearn, three Python classes were implemented: BERTologyClassifier for the classification task, BERTologyNERClassifier for the named entity recognition (NER) task, and BERTologyRegressor for the regression task. Each class contains three methods: fit, for fine-tuning transformer-based models with the training dataset; score, for evaluating the performance of the fine-tuned model; and predict, for predicting the labels of the test dataset. transformers-sklearn is a user-friendly toolkit that (1) is customizable via a few parameters (e.g., model_name_or_path and model_type), (2) supports multilingual NLP tasks, and (3) requires less coding. The input data format is generated automatically by transformers-sklearn from the annotated corpus, so newcomers only need to prepare the dataset. The model framework and training methods are predefined in transformers-sklearn.
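The fit/score/predict pattern described in this abstract can be illustrated with a minimal sketch. This is not the transformers-sklearn API: the class `ToyTextClassifier` is hypothetical, and a toy word-count model stands in for a fine-tuned transformer so that the example runs with the standard library only. The parameter name `model_name_or_path` is borrowed from the abstract and is merely stored here.

```python
from collections import Counter, defaultdict


class ToyTextClassifier:
    """Hypothetical stand-in showing a fit/score/predict interface.

    It is NOT the transformers-sklearn implementation; a word-count
    model replaces the fine-tuned transformer.
    """

    def __init__(self, model_name_or_path="toy-bag-of-words"):
        # Name taken from the abstract; this sketch only stores the value.
        self.model_name_or_path = model_name_or_path
        self.word_label_counts = defaultdict(Counter)
        self.default_label = None

    def fit(self, X, y):
        # "Training": count how often each word co-occurs with each label.
        for text, label in zip(X, y):
            for word in text.lower().split():
                self.word_label_counts[word][label] += 1
        # Fall back to the most frequent training label for unseen words.
        self.default_label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        predictions = []
        for text in X:
            votes = Counter()
            for word in text.lower().split():
                votes.update(self.word_label_counts.get(word, {}))
            if votes:
                predictions.append(votes.most_common(1)[0][0])
            else:
                predictions.append(self.default_label)
        return predictions

    def score(self, X, y):
        # Plain accuracy; the real toolkit may report other metrics.
        predicted = self.predict(X)
        return sum(p == t for p, t in zip(predicted, y)) / len(y)


train_texts = ["fever and cough", "broken arm", "cough and sore throat", "fractured leg"]
train_labels = ["respiratory", "injury", "respiratory", "injury"]

clf = ToyTextClassifier().fit(train_texts, train_labels)
test_texts = ["persistent cough", "broken leg"]
test_labels = ["respiratory", "injury"]
predicted_labels = clf.predict(test_texts)
accuracy = clf.score(test_texts, test_labels)
```

The point of the design is that a caller needs only three calls (fit, predict, score) regardless of what model sits behind them, which is what lets a scikit-learn user adopt it without learning a new training loop.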
    We collected four open-source medical language datasets: TrialClassification for Chinese medical trial text multi-label classification, BC5CDR for English biomedical text named entity recognition, DiabetesNER for Chinese diabetes entity recognition, and BIOSSES for English biomedical sentence similarity estimation. Across the four medical NLP tasks, the average code size of our scripts is 45 lines per task, one-sixth the size of the corresponding transformers scripts. The experimental results show that transformers-sklearn based on pretrained BERT models achieved macro F1 scores of 0.8225, 0.8703, and 0.6908 on the TrialClassification, BC5CDR, and DiabetesNER tasks, respectively, and a Pearson correlation of 0.8260 on the BIOSSES task, which is consistent with the results of transformers.
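For readers unfamiliar with the two metrics reported above, the sketch below computes macro F1 and the Pearson correlation from their standard textbook definitions. The abstract does not say how the authors computed them, so the function names and the toy inputs are assumptions made for illustration only.

```python
import math


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (standard definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Toy data, not taken from the paper.
f1_example = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
r_example = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Macro F1 weights every label equally, so rare classes count as much as common ones; Pearson correlation suits the BIOSSES task because similarity estimation produces continuous scores rather than labels.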
    The proposed toolkit could help newcomers address medical language understanding tasks easily using the scikit-learn coding style. The code and tutorials of transformers-sklearn are available at https://doi.org/10.5281/zenodo.4453803. In the future, more medical language understanding tasks will be supported to broaden the applications of transformers-sklearn.

  • Article type: Journal Article
    As the impacts of climate disruption become more evident, there has been an increase in the uptake of climate change adaptation "toolkits" to help local governments build community resilience and adapt to the impacts of climate change. There are growing calls for practitioners to adopt proactive and participatory approaches in the adaptive response planning process. One such toolkit is the International Council for Local Environmental Initiatives (ICLEI) Asian Cities Climate Change Resilience Network (ACCRN) Process (IAP), a simple but rigorous toolkit developed to help local governments in Asian cities build resilience to the impacts of climate change. This paper outlines an application of the toolkit, trialled in the Himalayan rural enclave of Ramgad in the Indian state of Uttarakhand, to determine its versatility in a rural context. Given the differences between urban and rural environments, the outcomes highlighted the need for further investigation and analysis of the process to ensure that the methodology truly reflects the nature of rural systems and their level of vulnerability and adaptive capacity. Overall, the toolkit proved to be a simple but versatile means of assessing the vulnerability and adaptive capacity of communities in rural Himalaya. Over 40 resilience intervention strategies were developed for the Ramgad enclave, and these were prioritized according to their technical, political, social, and economic feasibility.

  • Article type: Journal Article
    We have developed EumicrobeDBLite, a lightweight, comprehensive genome resource and sequence analysis platform for oomycete organisms. EumicrobeDBLite is a successor of the VBI Microbial Database (VMD), which was built using the Genome Unified Schema (GUS). In this version, GUS has been greatly simplified by removing many obsolete modules and redesigning others to incorporate contemporary data. Several dependencies, such as the Perl object layers used for data loading in VMD, have been replaced with independent lightweight scripts. EumicrobeDBLite now runs on a powerful annotation engine developed at our laboratory, called 'Genome Annotator Lite'. Currently, this database has 26 publicly available genomes and 10 expressed sequence tag (EST) datasets of oomycete organisms. The browser page has dynamic tracks presenting comparative genomics analyses, coding and non-coding data, tRNA genes, repeats, and EST alignments. In addition, we have defined 44 777 core conserved proteins from 12 oomycete organisms, which form 2974 clusters. Synteny viewing is enabled by the incorporation of the Genome Synteny Viewer (GSV) tool. The user interface has undergone major changes for ease of browsing. Queryable comparative genomics information, conserved orthologous genes, and pathways are among the key new features in this database update. The browser has been upgraded to let users upload GFF files for a quick view of genome annotation comparisons. The toolkit page integrates the EMBOSS package and has a gene prediction tool. Annotations for the organisms are updated once every 6 months to ensure quality. The database resource is available at www.eumicrobedb.org.