BERTopic: literature search results (医云文献)

Literature (16 articles)

1 Experiences of Alzheimer's disease and related dementia family caregivers on Reddit communities: A topic modeling and sentiment analysis.

Impact index: N/A
Published: 2024
Journal: Artif Intell Health; PMID: 39246419

DOI: 10.36922/aih.3075
Article type: Journal Article

Alzheimer's disease and related dementias (ADRD) are a spectrum of disorders characterized by cognitive decline, which pose significant challenges for both affected individuals and their caregivers. Previous literature has focused on patient family surveys, which do not always capture the breadth of caregivers' authentic experiences. Online social media platforms provide a space for individuals to share their experiences and obtain advice on caring for those with ADRD. This study leverages Reddit, a platform frequented by caregivers seeking advice on caring for a family member with ADRD. To identify the topics of discussion or advice that caregivers most often seek, we employed structured topic modeling techniques such as BERTopic to analyze the content of these posts and used an intertopic distance map to discern the variation in themes across different Reddit categories. In addition, we analyzed the sentiment of the Reddit postings using the Valence Aware Dictionary and sEntiment Reasoner (VADER) to deduce the degree of negative, positive, and neutral sentiment of the discussion posts. Our findings reveal that the topics that caregivers most frequently discuss and seek advice on were related to caregiver stories, community support, and concerns about ADRD. Specifically, we aimed to reproduce an organic Reddit search on caregiving covering abuse of a family member, financial struggles, symptoms of hallucinations, and repetition in family members with ADRD. These results underscore the importance of online communities for gaining a comprehensive understanding of the multifaceted experiences and challenges faced by ADRD caregivers.
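
The sentiment step in the abstract above relies on the Valence Aware Dictionary and Sentiment Reasoner. As a rough illustration only, the sketch below scores a post by summing word valences from a lexicon and squashing the sum into the range (-1, 1), then labels it with the commonly used cutoffs of +0.05 and -0.05. The lexicon entries and their values are hypothetical, and the sketch omits the negation, intensifier, and punctuation rules that the real tool applies, so it should not be read as the authors' pipeline.

```python
from math import sqrt

# Hypothetical toy lexicon: word -> valence. Values are illustrative only.
TOY_LEXICON = {
    "exhausted": -2.0,
    "scared": -1.9,
    "guilty": -1.8,
    "helpful": 1.8,
    "grateful": 2.3,
    "love": 3.2,
}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum word valences and normalize the sum into the open interval (-1, 1)."""
    tokens = [token.strip(".,!?;:").lower() for token in text.split()]
    raw = sum(lexicon.get(token, 0.0) for token in tokens)
    return raw / sqrt(raw * raw + alpha)


def label(compound, threshold=0.05):
    """Map a compound score to a negative, neutral, or positive label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in [
        "I feel exhausted and scared.",
        "The group was so helpful, I am grateful!",
        "Mom asked the same question again",
    ]:
        print(label(compound_score(post)), "|", post)
```

Counting the labels per subreddit would then give the share of negative, positive, and neutral posts that the abstract describes.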

2 Cluster-Based BERTopic Modeling on Swedish COVID-19 Vaccine Posts.

Impact index: N/A
Published: 22 Aug 2024
Journal: Stud Health Technol Inform; PMID: 39176864

DOI: 10.3233/SHTI240805
Article type: Journal Article

This paper explores the prevalent themes across multiple threads on the popular Swedish discussion forum Flashback. Among its diverse array of topics, the forum actively engages users in addressing and debating questions pertaining to COVID-19 vaccines and vaccination. Through distinguishing between positive and negative perspectives within posts across 14 relevant thread discussions, we employ BERTopic, a modular topic modeling framework, which utilizes pre-trained language models and applies clustering techniques to identify prevailing topics. This enables us to conduct a nuanced exploration of overarching themes, offering valuable insights into the multifaceted nature of the discussions regarding COVID-19 vaccines and vaccination in Sweden.

3 Discovering patterns and trends in customer service technologies patents using large language model.

Impact index: 3.776
Published: 30 Jul 2024
Journal: Heliyon; PMID: 39149018

DOI: 10.1016/j.heliyon.2024.e34701
Article type: Journal Article

The definition of service has evolved from a focus on material value in manufacturing before the 2000s to a customer-centric value based on the significant growth of the service industry. Digital transformation has become essential for companies in the service industry due to the incorporation of digital technology through the Fourth Industrial Revolution and COVID-19. This study utilised Bidirectional Encoder Representations from Transformers (BERT) to analyse 3029 international patents related to the customer service industry and digital transformation registered between 2000 and 2022. Through topic modelling, this study identified 10 major topics in the customer service industry and analysed their yearly trends. Our findings show that as of 2022, the trend with the highest frequency is user-centric network service design, while cloud computing has experienced the steepest increase in the last five years. User-centric network services have been steadily developing since the inception of the Internet. Cloud computing is one of the key technologies being developed intensively in 2023 for the digital transformation of customer service. This study identifies time series trends of customer service industry patents and suggests the effectiveness of using BERTopic to predict future trends in technology.

4 Mpox Discourse on Twitter by Sexual Minority Men and Gender-Diverse Individuals: Infodemiological Study Using BERTopic.

Impact index: 14.557
Published: 13 Aug 2024
Journal: JMIR Public Health Surveill; PMID: 39137013

DOI: 10.2196/59193
Article type: Journal Article

BACKGROUND: The mpox outbreak resulted in 32,063 cases and 58 deaths in the United States and 95,912 cases worldwide from May 2022 to March 2024 according to the US Centers for Disease Control and Prevention (CDC). Like other disease outbreaks (eg, HIV) with perceived community associations, mpox can create the risk of stigma, exacerbate homophobia, and potentially hinder health care access and social equity. However, the existing literature on mpox has limited representation of the perspective of sexual minority men and gender-diverse (SMMGD) individuals.
OBJECTIVE: To fill this gap, this study aimed to synthesize themes of discussions among SMMGD individuals and listen to SMMGD voices for identifying problems in current public health communication surrounding mpox to improve inclusivity, equity, and justice.
METHODS: We analyzed mpox-related posts (N=8688) posted between October 2020 and September 2022 by 2326 users who self-identified on Twitter/X as SMMGD and were geolocated in the United States. We applied BERTopic (a topic-modeling technique) on the tweets, validated the machine-generated topics through human labeling and annotations, and conducted content analysis of the tweets in each topic. Geographic analysis was performed on the size of the most prominent topic across US states in relation to the University of California, Los Angeles (UCLA) lesbian, gay, and bisexual (LGB) social climate index.
RESULTS: BERTopic identified 11 topics, which annotators labeled as mpox health activism (n=2590, 29.81%), mpox vaccination (n=2242, 25.81%), and adverse events (n=85, 0.98%); sarcasm, jokes, and emotional expressions (n=1220, 14.04%); COVID-19 and mpox (n=636, 7.32%); government or public health response (n=532, 6.12%); mpox symptoms (n=238, 2.74%); case reports (n=192, 2.21%); puns on the naming of the virus (ie, mpox; n=75, 0.86%); media publicity (n=59, 0.68%); and mpox in children (n=58, 0.67%). Spearman rank correlation indicated significant negative correlation (ρ=-0.322, P=.03) between the topic size of health activism and the UCLA LGB social climate index at the US state level.
CONCLUSIONS: Discussions among SMMGD individuals on mpox encompass both utilitarian (eg, vaccine access, case reports, and mpox symptoms) and emotionally charged (ie, promoting awareness, advocating against homophobia, misinformation/disinformation, and health stigma) themes. Mpox health activism is more prevalent in US states with lower LGB social acceptance, suggesting a resilient communicative pattern among SMMGD individuals in the face of public health oppression. Our method for social listening could facilitate future public health efforts, providing a cost-effective way to capture the perspective of impacted populations. This study illuminates SMMGD engagement with the mpox discourse, underscoring the need for more inclusive public health programming. Findings also highlight the social impact of mpox: health stigma. Our findings could inform interventions to optimize the delivery of informational and tangible health resources leveraging computational mixed-method analyses (eg, BERTopic) and big data.
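
The state-level association reported in the results is a Spearman rank correlation. A minimal pure-Python sketch of that statistic follows: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The per-state numbers in the usage section are hypothetical placeholders, not data from the study, and no P value is computed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical values: share of activism tweets and a climate index per state.
    activism_share = [0.41, 0.35, 0.30, 0.28, 0.22]
    climate_index = [48, 55, 61, 60, 74]
    print(round(spearman_rho(activism_share, climate_index), 3))
```

A negative rho, as in the study, means that states ranking higher on one variable tend to rank lower on the other.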

5 Exploring climate change discourse on social media and blogs using a topic modeling analysis.

Impact index: 3.776
Published: 15 Jun 2024
Journal: Heliyon; PMID: 38947458

DOI: 10.1016/j.heliyon.2024.e32464
Article type: Journal Article

Climate change is one of the most pressing global issues of our time, and understanding public perception and awareness of the topic is crucial for developing effective policies to mitigate its effects. While traditional survey methods have been used to gauge public opinion, advances in natural language processing (NLP) and data visualization techniques offer new opportunities to analyze user-generated content from social media and blog posts. In this study, a new dataset of climate change-related texts was collected from social media sources and various blogs. The dataset was analyzed using BERTopic and LDA to identify and visualize the most important topics related to climate change. The study also used sentence similarity to determine the similarities in the comments written and which topic categories they belonged to. The performance of different techniques for keyword extraction and text representation, including OpenAI, Maximal Marginal Relevance (MMR), and KeyBERT, was compared for topic modeling with BERTopic. It was seen that the best coherence score and topic diversity metric were obtained with OpenAI-based BERTopic. The results provide insights into the public's attitudes and perceptions towards climate change, which can inform policy development and contribute to efforts to reduce activities that cause climate change.
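
The abstract compares models on a topic diversity metric without defining it. One common definition, assumed here, is the fraction of unique words among the top-k words of all topics, so that 1.0 means no word is shared between topics. The sketch below implements that definition; the topic word lists are hypothetical, and the paper may have used a different variant or a different k.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    `topics` is a list of word lists, each ordered from most to least
    representative for its topic.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("no topic words supplied")
    return len(set(top_words)) / len(top_words)


if __name__ == "__main__":
    # Hypothetical topics; "climate" appears in two of them.
    topics = [
        ["climate", "warming", "emissions"],
        ["policy", "carbon", "tax"],
        ["climate", "protest", "youth"],
    ]
    print(round(topic_diversity(topics, top_k=3), 3))
```

A score near 0 indicates redundant topics that repeat the same keywords, which is why the metric is usually reported next to a coherence score rather than on its own.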

6 Topic Modeling Analysis of Chinese Medicine Literature on Gastroesophageal Reflux Disease: Insights into Potential Treatment.

Impact index: 2.626
Published: 8 Jun 2024
Journal: Chin J Integr Med; PMID: 38850480

DOI: 10.1007/s11655-024-3800-y
Article type: Journal Article

OBJECTIVE: To analyze Chinese medicine (CM) prescriptions for gastroesophageal reflux disease (GERD), we model topics on GERD-related classical CM literature, providing insights into the potential treatment.
METHODS: Clinical guidelines were used to identify symptom terms for GERD, and CM literature from the database "Imedbooks" was retrieved for related prescriptions and their corresponding sources, indications, and other information. BERTopic was applied to identify the main topics and visualize the data.
RESULTS: A total of 36,207 entries were queried, and 1,938 valid entries were retained after manual filtering. Eight topics were identified by BERTopic, including weakened digestive function, stomach flu, respiratory-related symptoms, gastric dysfunction, regurgitation and gastrointestinal dysfunction in pediatric patients, vomiting, stroke and alcohol accumulation associated with the risk of GERD, vomiting and its causes, regurgitation, epigastric pain, and symptoms of heartburn.
CONCLUSIONS: Topic modeling provides an unbiased analysis of classical CM literature on GERD in a time-efficient and scale-efficient manner. Based on this analysis, we present a range of treatment options for relieving symptoms, including herbal remedies and non-pharmacological interventions such as acupuncture and dietary therapy.

7 Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study.

Impact index: 7.076
Published: 17 Apr 2024
Journal: J Med Internet Res; PMID: 38630522

DOI: 10.2196/48330
Article type: Journal Article

BACKGROUND: Intensive care research has predominantly relied on conventional methods like randomized controlled trials. However, the increasing popularity of open-access, free databases in the past decade has opened new avenues for research, offering fresh insights. Leveraging machine learning (ML) techniques enables the analysis of trends in a vast number of studies.
OBJECTIVE: This study aims to conduct a comprehensive bibliometric analysis using ML to compare trends and research topics in traditional intensive care unit (ICU) studies and those done with open-access databases (OADs).
METHODS: We used ML for the analysis of publications in the Web of Science database in this study. Articles were categorized into "OAD" and "traditional intensive care" (TIC) studies. OAD studies were included in the Medical Information Mart for Intensive Care (MIMIC), eICU Collaborative Research Database (eICU-CRD), Amsterdam University Medical Centers Database (AmsterdamUMCdb), High Time Resolution ICU Dataset (HiRID), and Pediatric Intensive Care database. TIC studies included all other intensive care studies. Uniform manifold approximation and projection was used to visualize the corpus distribution. The BERTopic technique was used to generate 30 topic-unique identification numbers and to categorize topics into 22 topic families.
RESULTS: A total of 227,893 records were extracted. After exclusions, 145,426 articles were identified as TIC and 1301 articles as OAD studies. TIC studies experienced exponential growth over the last 2 decades, culminating in a peak of 16,378 articles in 2021, while OAD studies demonstrated a consistent upsurge since 2018. Sepsis, ventilation-related research, and pediatric intensive care were the most frequently discussed topics. TIC studies exhibited broader coverage than OAD studies, suggesting a more extensive research scope.
CONCLUSIONS: This study analyzed ICU research, providing valuable insights from a large number of publications. OAD studies complement TIC studies, focusing on predictive modeling, while TIC studies capture essential qualitative information. Integrating both approaches in a complementary manner is the future direction for ICU research. Additionally, natural language processing techniques offer a transformative alternative for literature review and bibliometric analysis.

8 Public perception on active aging after COVID-19: an unsupervised machine learning analysis of 44,343 posts.

Impact index: 6.461
Published: 2024
Journal: Front Public Health; PMID: 38515596

DOI: 10.3389/fpubh.2024.1329704
Article type: Journal Article

To analyze public perceptions of active aging in China on mainstream social media platforms to determine whether the "14th Five Year Plan for the Development of the Aging Career and Older Adult Care System" issued by the CPC in 2022 has fully addressed public needs.
The original tweets posted on Weibo between January 1, 2020, and June 30, 2022, containing the words "aging" or "old age" were extracted. A bidirectional encoder representation from transformers (BERT)-based model was used to generate themes related to this perception. A qualitative thematic analysis and an independent review of the theme labels were conducted by the researchers.
The findings indicate that public perceptions revolved around four themes: (1) health prevention and protection, (2) convenient living environments, (3) cognitive health and social integration, and (4) protecting the rights and interests of the older adult.
Our study found that although the Plan aligns with most of these themes, it lacks clear planning for financial security and marital life.

9 Machine Learning-Based Approach for Identifying Research Gaps: COVID-19 as a Case Study.

Impact index: N/A
Published: 5 Mar 2024
Journal: JMIR Form Res; PMID: 38441952

DOI: 10.2196/49411
Article type: Journal Article

BACKGROUND: Research gaps refer to unanswered questions in the existing body of knowledge, either due to a lack of studies or inconclusive results. Research gaps are essential starting points and motivation in scientific research. Traditional methods for identifying research gaps, such as literature reviews and expert opinions, can be time consuming, labor intensive, and prone to bias. They may also fall short when dealing with rapidly evolving or time-sensitive subjects. Thus, innovative scalable approaches are needed to identify research gaps, systematically assess the literature, and prioritize areas for further study in the topic of interest.
OBJECTIVE: In this paper, we propose a machine learning-based approach for identifying research gaps through the analysis of scientific literature. We used the COVID-19 pandemic as a case study.
METHODS: We conducted an analysis to identify research gaps in COVID-19 literature using the COVID-19 Open Research (CORD-19) data set, which comprises 1,121,433 papers related to the COVID-19 pandemic. Our approach is based on the BERTopic topic modeling technique, which leverages transformers and class-based term frequency-inverse document frequency to create dense clusters allowing for easily interpretable topics. Our BERTopic-based approach involves 3 stages: embedding documents, clustering documents (dimension reduction and clustering), and representing topics (generating candidates and maximizing candidate relevance).
RESULTS: After applying the study selection criteria, we included 33,206 abstracts in the analysis of this study. The final list of research gaps identified 21 different areas, which were grouped into 6 principal topics. These topics were: "virus of COVID-19," "risk factors of COVID-19," "prevention of COVID-19," "treatment of COVID-19," "health care delivery during COVID-19," and "impact of COVID-19." The most prominent topic, observed in over half of the analyzed studies, was "the impact of COVID-19."
CONCLUSIONS: The proposed machine learning-based approach has the potential to identify research gaps in scientific literature. This study is not intended to replace individual literature research within a selected topic. Instead, it can serve as a guide to formulate precise literature search queries in specific areas associated with research questions that previous publications have earmarked for future exploration. Future research should leverage an up-to-date list of studies that are retrieved from the most common databases in the target area. When feasible, full texts or, at minimum, discussion sections should be analyzed rather than limiting their analysis to abstracts. Furthermore, future studies could evaluate more efficient modeling algorithms, especially those combining topic modeling with statistical uncertainty quantification, such as conformal prediction.
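
The methods mention class-based term frequency-inverse document frequency as the step that turns document clusters into interpretable topics. A minimal sketch of that weighting follows, assuming the form W(t, c) = tf(t, c) * log(1 + A / f(t)), where tf(t, c) is the count of term t in cluster c, f(t) is its count across all clusters, and A is the average number of words per cluster. The clustered documents are hypothetical, the embedding and clustering stages are taken as already done, and the BERTopic library itself may normalize terms differently.

```python
from collections import Counter
from math import log


def c_tf_idf(docs_by_topic):
    """Score each term per topic with a class-based TF-IDF weighting.

    `docs_by_topic` maps a topic id to a list of tokenized documents.
    All documents of a topic are pooled into one class-level bag of words.
    """
    class_counts = {
        topic: Counter(token for doc in docs for token in doc)
        for topic, docs in docs_by_topic.items()
    }
    total_freq = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    avg_words_per_class = sum(total_freq.values()) / len(class_counts)
    return {
        topic: {
            term: tf * log(1 + avg_words_per_class / total_freq[term])
            for term, tf in counts.items()
        }
        for topic, counts in class_counts.items()
    }


def top_terms(scores, n=3):
    """Return the n highest-scoring terms per topic (ties broken alphabetically)."""
    return {
        topic: [t for t, _ in sorted(ws.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for topic, ws in scores.items()
    }


if __name__ == "__main__":
    # Hypothetical clusters of tokenized abstracts.
    clusters = {
        0: [["vaccine", "dose", "covid"], ["vaccine", "trial", "covid"]],
        1: [["mask", "distance", "covid"], ["mask", "lockdown", "covid"]],
    }
    print(top_terms(c_tf_idf(clusters), n=2))
```

Terms that occur in every cluster, such as "covid" in the toy data, are down-weighted relative to terms concentrated in one cluster, which is what makes the resulting topic labels distinctive.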

10 Trends in stroke-related journals: Examination of publication patterns using topic modeling.

Impact index: 2.677
Published: 25 Jun 2024
Journal: J Stroke Cerebrovasc Dis; PMID: 38412931

DOI: 10.1016/j.jstrokecerebrovasdis.2024.107665
Article type: Journal Article

OBJECTIVE: This study aims to demonstrate the capacity of natural language processing and topic modeling to manage and interpret the vast quantities of scholarly publications in the landscape of stroke research. These tools can expedite the literature review process, reveal hidden themes, and track rising research areas.
METHODS: Our study involved reviewing and analyzing articles published in five prestigious stroke journals, namely Stroke, International Journal of Stroke, European Stroke Journal, Translational Stroke Research, and Journal of Stroke and Cerebrovascular Diseases. The team extracted document titles, abstracts, publication years, and citation counts from the Scopus database. BERTopic was chosen as the topic modeling technique. Using linear regression models, current stroke research trends were identified. Python 3.1 was used to analyze and visualize data.
RESULTS: Out of the 35,779 documents collected, 26,732 were classified into 30 categories and used for analysis. "Animal Models," "Rehabilitation," and "Reperfusion Therapy" were identified as the three most prevalent topics. Linear regression models identified "Emboli," "Medullary and Cerebellar Infarcts," and "Glucose Metabolism" as trending topics, whereas "Cerebral Venous Thrombosis," "Statins," and "Intracerebral Hemorrhage" demonstrated a weaker trend.
CONCLUSIONS: The methodology can assist researchers, funders, and publishers by documenting the evolution and specialization of topics. The findings illustrate the significance of animal models, the expansion of rehabilitation research, and the centrality of reperfusion therapy. Limitations include a five-journal cap and a reliance on high-quality metadata.
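
The trend analysis in the methods fits linear regression models to topic activity over time. One plausible reading, assumed here, is an ordinary least squares line of yearly document counts against publication year for each topic, with the slope used to rank topics as rising or declining. The counts below are hypothetical, and the study may have regressed on proportions or other quantities instead.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(yearly_counts):
    """Map each topic to the slope of its yearly document counts.

    `yearly_counts` maps a topic label to a dict of year -> count.
    """
    trends = {}
    for topic, by_year in yearly_counts.items():
        years = sorted(by_year)
        trends[topic] = ols_slope(years, [by_year[y] for y in years])
    return trends


if __name__ == "__main__":
    # Hypothetical yearly counts for two topics.
    counts = {
        "Rehabilitation": {2019: 40, 2020: 55, 2021: 70, 2022: 82},
        "Statins": {2019: 30, 2020: 26, 2021: 21, 2022: 19},
    }
    for topic, slope in sorted(topic_trends(counts).items(), key=lambda kv: -kv[1]):
        print(f"{topic}: {slope:+.1f} documents per year")
```

Sorting topics by slope gives a simple ranking of rising and declining themes; a fuller analysis would also check the fit and significance of each regression.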
