OBJECTIVE: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied through unsolicited submissions from model authors, but this approach is inherently limited. For example, we estimate that we have captured only around one-third of NEURON models, the most common type of model in ModelDB. To characterize the state of computational neuroscience modeling work more completely, we aim to identify works containing results derived from computational neuroscience approaches, along with their standardized associated metadata (eg, cell types, research topics).
METHODS: Our study included known computational neuroscience work from ModelDB and neuroscience work identified through PubMed queries. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5 and GPT-4 were used to identify likely computational neuroscience work and relevant metadata.
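As a rough illustration of embedding-based pre-screening, the sketch below shortlists candidate papers whose vectors lie close to the centroid of known computational works. It rests on several assumptions that do not come from the study: the toy 3-dimensional vectors stand in for real document embeddings, and the centroid comparison, the `prescreen` helper, and the 0.8 threshold are all hypothetical choices rather than the authors' actual procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prescreen(candidates, known_embeddings, threshold=0.8):
    """Return ids of candidates whose embedding is similar to the
    centroid of known works. The threshold is a hypothetical value."""
    center = centroid(known_embeddings)
    return [
        paper_id
        for paper_id, embedding in candidates.items()
        if cosine(embedding, center) >= threshold
    ]


# Toy vectors standing in for embeddings of known computational works.
known = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]

# Toy vectors standing in for embeddings of PubMed candidates.
candidates = {
    "paper_a": [0.85, 0.15, 0.05],
    "paper_b": [0.0, 0.1, 0.9],
}

shortlist = prescreen(candidates, known)
```

In a sketch like this, only the shortlisted papers would be passed on to the more expensive language model step.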
RESULTS: SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high ability to identify computational neuroscience work. GPT-4 achieved 96.9% accuracy, and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations.
CONCLUSIONS: Accuracy in identification and extraction might be further improved by resolving ambiguity about what counts as a computational element, including more information from papers (eg, the Methods section), and improving prompts.
CONCLUSIONS: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.