scientific literature

  • Article type: Journal Article
    To perform a bibliometric analysis as a comprehensive review of publications associated with catamenial epilepsy and discuss the current state of knowledge in the field.
    Publications published between 1956 and 2022 were retrieved from the Scopus database. Bibliometric analysis was performed using an R package and VOSviewer to show the data and networks of journals, organizations, authors, countries, and keywords. The analysis, conducted on October 15, 2022, yielded a total of 320 studies after refinement.
    The number of publications has increased significantly, particularly in the last 20 years. Catamenial epilepsy-related publications originated mostly from medicine and other subject areas, with the United States having the largest publication output. Collaboration is low at the author, organizational, and national levels, especially in Asia. Publications remain scarce, particularly on practice guidelines, risk assessment, and medication-related research. The keyword analysis identified possible themes for future investigation.
    Catamenial epilepsy-related literature is crucial but still insufficient, and further studies are required.
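    Keyword networks of the kind the abstract mentions are typically built from co-occurrence counts: two keywords are linked each time they appear on the same publication. The sketch below illustrates that counting step only. It is not the authors' R or VOSviewer workflow, and the function name and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair appears on the same record."""
    pair_counts = Counter()
    for keywords in records:
        # Normalise case and drop duplicates within a single record.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical author-keyword lists, one per publication (not real Scopus data).
records = [
    ["catamenial epilepsy", "progesterone", "seizure"],
    ["Catamenial Epilepsy", "progesterone"],
    ["seizure", "menstrual cycle", "catamenial epilepsy"],
]
counts = keyword_cooccurrence(records)
for pair, n in counts.most_common(3):
    print(pair, n)
```

    Pairs are stored in alphabetical order so that (a, b) and (b, a) are counted as the same edge; the resulting counts could then be used as edge weights in a network tool.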

  • Article type: Journal Article
    In this essay, I argue that the combination of research synthesis and philosophical methods can fill an important methodological gap in neuroscience. While experimental research and formal modelling have seen their methods progressively increase in rigour and sophistication over the years, the task of analysing and synthesizing the vast literature reporting new results and models has lagged behind. The problem is aggravated because neuroscience has grown and expanded into a vast mosaic of related but partially independent subfields, each with its own literature. This fragmentation not only makes it difficult to see the full picture emerging from neuroscience research but also limits progress in individual subfields. The current neuroscience literature has the perfect conditions to create what the information scientist Don Swanson called "undiscovered public knowledge": knowledge that exists in the mutual implications of different published pieces of information but that is nonetheless undiscovered because those pieces have not been explicitly connected. Current methods for rigorous research synthesis, such as systematic reviews and meta-analyses, mostly focus on combining similar studies and are not suited for exploring undiscovered public knowledge. To that aim, they need to be adapted and supplemented. I argue that successful exploration of the hidden implications in the neuroscience literature will require the combination of these adapted research synthesis methods with philosophical methods for rigorous (and creative) analysis and synthesis.

  • Article type: Journal Article
    BACKGROUND: Research gaps refer to unanswered questions in the existing body of knowledge, either due to a lack of studies or inconclusive results. Research gaps are essential starting points and motivation in scientific research. Traditional methods for identifying research gaps, such as literature reviews and expert opinions, can be time consuming, labor intensive, and prone to bias. They may also fall short when dealing with rapidly evolving or time-sensitive subjects. Thus, innovative scalable approaches are needed to identify research gaps, systematically assess the literature, and prioritize areas for further study in the topic of interest.
    OBJECTIVE: In this paper, we propose a machine learning-based approach for identifying research gaps through the analysis of scientific literature. We used the COVID-19 pandemic as a case study.
    METHODS: We conducted an analysis to identify research gaps in COVID-19 literature using the COVID-19 Open Research (CORD-19) data set, which comprises 1,121,433 papers related to the COVID-19 pandemic. Our approach is based on the BERTopic topic modeling technique, which leverages transformers and class-based term frequency-inverse document frequency to create dense clusters allowing for easily interpretable topics. Our BERTopic-based approach involves 3 stages: embedding documents, clustering documents (dimension reduction and clustering), and representing topics (generating candidates and maximizing candidate relevance).
    RESULTS: After applying the study selection criteria, we included 33,206 abstracts in the analysis of this study. The final list of research gaps identified 21 different areas, which were grouped into 6 principal topics. These topics were: "virus of COVID-19," "risk factors of COVID-19," "prevention of COVID-19," "treatment of COVID-19," "health care delivery during COVID-19," and "impact of COVID-19." The most prominent topic, observed in over half of the analyzed studies, was "the impact of COVID-19."
    CONCLUSIONS: The proposed machine learning-based approach has the potential to identify research gaps in scientific literature. This study is not intended to replace individual literature research within a selected topic. Instead, it can serve as a guide to formulate precise literature search queries in specific areas associated with research questions that previous publications have earmarked for future exploration. Future research should leverage an up-to-date list of studies that are retrieved from the most common databases in the target area. When feasible, full texts or, at minimum, discussion sections should be analyzed rather than limiting their analysis to abstracts. Furthermore, future studies could evaluate more efficient modeling algorithms, especially those combining topic modeling with statistical uncertainty quantification, such as conformal prediction.
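    The topic-representation stage named in the methods relies on class-based TF-IDF, which scores a term within a cluster as its frequency in that cluster multiplied by log(1 + A / f), where A is the average number of words per cluster and f is the term's frequency across all clusters. The sketch below illustrates only that scoring step on raw counts. It assumes the documents have already been embedded and clustered, it does not call the BERTopic library, and the cluster labels and texts are hypothetical.

```python
import math
from collections import Counter


def class_tfidf(docs_by_cluster):
    """Score terms per cluster as tf(t, c) * log(1 + A / f(t))."""
    # Treat all documents in one cluster as a single long "class document".
    class_counts = {
        cluster: Counter(word for doc in docs for word in doc.lower().split())
        for cluster, docs in docs_by_cluster.items()
    }
    # f(t): frequency of each term across all clusters.
    total_counts = Counter()
    for term_counts in class_counts.values():
        total_counts.update(term_counts)
    # A: average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(class_counts)
    return {
        cluster: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in term_counts.items()
        }
        for cluster, term_counts in class_counts.items()
    }


def top_terms(scores, n=3):
    """Return the n highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        cluster: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for cluster, s in scores.items()
    }


# Hypothetical clusters of short texts (not taken from CORD-19).
clusters = {
    "treatment": ["antiviral treatment trial", "treatment outcomes of antiviral therapy"],
    "prevention": ["mask use and prevention", "vaccine prevention trial"],
}
scores = class_tfidf(clusters)
print(top_terms(scores, n=2))
```

    Terms that are frequent inside one cluster but rare elsewhere score highest, which is what makes the resulting topic labels easy to interpret; a term such as "trial", which occurs in both clusters, is down-weighted.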

  • Article type: Journal Article
    Artificial intelligence has demonstrated utility in orthopedic research. Algorithmic models derived from machine learning have demonstrated adaptive learning with predictive application towards outcomes, leading to increased traction in the literature. This study aims to identify machine learning arthroplasty research trends and anticipate emerging key terms.
    Published literature focused on machine learning in arthroplasty from 1992 to 2023 was selected through the Web of Science Core Collection of Clarivate Analytics. Bibliometric indicators were then obtained and examined further using Bibliometrix and VOSviewer to identify historical and present patterns within the literature.
    A total of 235 documents were obtained through bibliometric sourcing based on machine learning applications within the arthroplasty literature. Thirty-four countries published articles on the topic, and the United States was the largest global contributor. Four hundred five institutions internationally contributed articles, with Harvard Medical School and the University of California system as the most relevant institutes, with 75 and 44 articles produced, respectively. Kwon YM was the most productive author, while Haeberle HS and Ramkumar PN were the most impactful based on h-index. The thematic map and co-occurrence visualization helped identify both major and niche themes present in the scientific databases.
    Machine learning in arthroplasty research continues to gain traction with a growing annual production rate and contributions from international authors and institutions. Institutions and authors based in the United States are the leading contributors to machine learning applications within arthroplasty research. This research discerns trends that have occurred, are presently ongoing, and are emerging within this field, aiming to inform future hotspot development.
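    Author impact in this abstract is ranked by h-index: the largest h such that an author has at least h papers with at least h citations each. A minimal sketch of that definition follows; the citation counts are hypothetical and are not data from the study.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # Later papers have even fewer citations.
    return h


# Hypothetical per-paper citation counts for one author.
print(h_index([10, 8, 5, 4, 3]))
```

    Because the measure depends on both output and citations, the most productive author and the author with the highest h-index need not be the same person.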

  • Article type: Journal Article
    Benthic foraminifera, single-celled marine organisms, are known for their wide distribution, high abundance and species diversity, test (i.e., shell) preservation in the sedimentary (e.g., historical) record, and sensitivity to environmental changes. Because of these characteristics, they have been widely used as bioindicators in environmental monitoring and, more recently, as Biological Quality Elements (BQEs) in the Ecological Quality Status (EcoQS) evaluation. The global scientific literature on benthic foraminifera as bioindicators was gathered from the Scopus database (966 papers in total from 1973 to 2022) and explored with scientometric software. The outcomes highlight that the investigation of benthic foraminiferal response to pollutants started over 50 years ago. Indeed, not only has the number of published documents recently peaked (i.e., in 2021 and 2022), but the percentage of papers falling within the Decision Sciences category, which deals with the application of foraminiferal indices for EcoQS assessment, has also grown.

  • Article type: Journal Article
    Being able to communicate scientifically is an important skill for students graduating with a science degree. Skills that science majors will use in graduate school and in their careers include oral and written communication, as well as science literacy and the ability to create figures to display information. There is a consensus that these skills should be taught throughout an undergraduate science curriculum; however, many instructors have cited insufficient time to cover skills and develop materials to effectively incorporate these skills, especially into lower-level content-focused courses. Here, we present an active curriculum that can easily be incorporated into any content-focused undergraduate Cell Biology course. The curriculum is designed around scientific literature and engages students in a multitude of active learning activities to develop different types of scientific communication skills. This curriculum not only develops student skills and self-efficacy in scientific communication but also engages them in course content and stimulates their interest in research. While making changes to a course to include scientific communication can be difficult, making small changes, such as adding this curriculum to an existing content-focused course, could make a big difference in the skills and attitudes of early undergraduate science students.

  • Article type: Journal Article
    Scientific knowledge is being accumulated in the biomedical literature at an unprecedented pace. The most widely used database with biomedicine-related article abstracts, PubMed, currently contains more than 36 million entries. Users performing searches in this database for a subject of interest face thousands of entries (articles) that are difficult to process manually. In this work, we present an interactive tool for automatically digesting large sets of PubMed articles: PMIDigest (PubMed IDs digester). The system allows for classification/sorting of articles according to different criteria, including the type of article and different citation-related figures. It also calculates the distribution of MeSH (medical subject headings) terms for categories of interest, providing a picture of the themes addressed in the set. These MeSH terms are highlighted in the article abstracts in different colors depending on the category. An interactive representation of the interarticle citation network is also presented in order to easily locate article "clusters" related to particular subjects, as well as their corresponding "hub" articles. In addition to PubMed articles, the system can also process a set of Scopus or Web of Science entries. In summary, with this system, the user can have a "bird's eye view" of a large set of articles and their main thematic tendencies and obtain additional information not evident in a plain list of abstracts.
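    Two of the summaries described above, the MeSH term distribution and the "hub" articles of the interarticle citation network, can be approximated with simple counting. The sketch below is an illustration under stated assumptions, not PMIDigest's implementation: the record layout (`pmid`, `mesh`, `cites`), the function names, and the placeholder identifiers are all hypothetical, and a hub is taken to mean the article cited most often by other articles in the same set.

```python
from collections import Counter


def mesh_distribution(articles):
    """Count in how many articles of the set each MeSH term appears."""
    counts = Counter(term for a in articles for term in set(a["mesh"]))
    return counts.most_common()


def hub_articles(articles, top_n=1):
    """Rank articles by how often they are cited by other articles in the set."""
    ids = {a["pmid"] for a in articles}
    in_degree = Counter({pmid: 0 for pmid in ids})
    for a in articles:
        for cited in set(a["cites"]):
            # Only count citations that stay inside the set; ignore self-citations.
            if cited in ids and cited != a["pmid"]:
                in_degree[cited] += 1
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical records with placeholder identifiers (not real PubMed IDs).
articles = [
    {"pmid": "P1", "mesh": ["Epilepsy", "Humans"], "cites": []},
    {"pmid": "P2", "mesh": ["Epilepsy", "Progesterone"], "cites": ["P1"]},
    {"pmid": "P3", "mesh": ["Humans", "Epilepsy"], "cites": ["P1", "P2", "X9"]},
]
print(mesh_distribution(articles)[0])
print(hub_articles(articles))
```

    Restricting the count to citations within the set keeps the ranking specific to the topic being explored rather than to overall citation totals.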

  • Article type: Journal Article
    This study aims to identify determinants of high impact, measured by Impact Factor (IF) and Eigenfactor score, among otolaryngology journals.
    Bibliometric data of "otorhinolaryngology" journals were collected from the Journal Citation Reports (JCR) database. For the years 2009-2020, we collected normalized Eigenfactor score, 5-year IF, immediacy index, fraction of IF from journal self-citation, proportion and magnitude of published citable articles, and total citation counts. High-IF and high-Eigenfactor journals were defined as those within the top quartile of that metric in each respective year.
    High-IF and high-Eigenfactor otolaryngology journals displayed higher 5-year IFs, immediacy indexes, and IF without self-citation (p < .05 for all years), as well as higher total citation counts and citable articles when ranked by Eigenfactor (p < .05 for all years). Otolaryngology IF correlated with 5-year IF and immediacy index within the same year (p < .05 for all years) and from previous years (p < .05 for all years; p < .05 for 2017-2018; p > .05 for 2009-2016). Eigenfactor correlated with 5-year IF, total citation counts, and citable articles within the same year (p < .05 for all years) and previous years (p < .05 for 2013-2018). Multilinear regression revealed that 5-year IF (p < .05 for 2009-2018) and immediacy index from the prior 2 years (p < .05 for 2017-2018; p > .05 for 2009-2016) predicted 2019 IF. Similarly, 5-year IF, total citation counts, and citable articles (p < .05 for 2013-2018) predicted 2019 Eigenfactor score.
    Sustained publication of impactful articles is the dominant driver of high IF and Eigenfactor score. Eigenfactor score reflects a unique evaluation of otolaryngology journals; ranking otolaryngology journals by their Eigenfactor scores significantly alters journal ranking compared to ranking by IF.
    NA.

  • Article type: Journal Article
    Neuroblastoma is a childhood neurological tumor which affects hundreds of thousands of children worldwide, and information about its prognosis can be pivotal for patients, their families, and clinicians. One of the main goals in the related bioinformatics analyses is to provide stable genetic signatures made up of genes whose expression levels can effectively predict the prognosis of the patients. In this study, we collected the prognostic signatures for neuroblastoma published in the biomedical literature, and noticed that three genes appeared among them most frequently: AHCY, DPYSL3, and NME1. We therefore investigated the prognostic power of these three genes by performing a survival analysis and a binary classification on multiple gene expression datasets of different groups of patients diagnosed with neuroblastoma. Finally, we discussed the main studies in the literature associating these three genes with neuroblastoma. Our results, in each of these three steps of validation, confirm the prognostic capability of AHCY, DPYSL3, and NME1, and highlight their key role in neuroblastoma prognosis. Our results can have an impact on neuroblastoma genetics research: biologists and medical researchers can pay more attention to the regulation and expression of these three genes in patients having neuroblastoma, and therefore can develop better cures and treatments which can save patients' lives.

  • Article type: Journal Article
    BACKGROUND: Information pertaining to mechanisms, management and treatment of disease-causing pathogens including viruses and bacteria is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, important for understanding the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health.
    OBJECTIVE: In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen in an experimental context. There are no manually annotated pathogen corpora available for this purpose, while such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap, producing a large data set automatically from MEDLINE under some simplifying assumptions for the task definition, and using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications.
    METHODS: We developed a pathogen mention characterisation literature data set, READBiomed-Pathogens, automatically using NCBI resources, which we make available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens, more specifically using MeSH to link to MEDLINE citations including titles and abstracts with experimentally researched pathogens. We experiment with several machine learning-based natural language processing (NLP) algorithms leveraging this data set as training data, to model the task of detecting papers that specifically describe experimental study of a pathogen.
    RESULTS: We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents.
    CONCLUSIONS: We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. The performance of the pathogen mention identification and characterisation algorithms, additionally evaluated on a small manually annotated data set, shows that the data set we generated allows characterising pathogens of interest.
    N/A.