Web scraping | 医云文献 (Yiyun Literature), digital research cloud and medical decision-data service

Articles (36)

1 Scale of unregulated international trade in Australian reptiles and amphibians.

Impact factor: 7.563
Published: Oct 2024
Journal: Conserv Biol; PMID: 39248765

DOI: 10.1111/cobi.14355
Article type: Journal Article

Reptiles and amphibians are popular in the exotic pet trade, where Australian species are valued for their rarity and uniqueness. Despite a near-complete ban on the export of Australian wildlife, smuggling and subsequent international trade frequently occur in an unregulated and unmonitored manner. In 2022, Australia listed over 100 squamates in Appendix III of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) to better monitor this trade. We investigated current trade and assessed the value of this Australian CITES listing using web-scraping methods to monitor the online pet trade in Australian reptiles and amphibians, with additional data from published papers, trade databases, and seizure records. Despite the export ban, we identified 170 endemic herpetofauna (reptile and amphibian) species in international trade, 33 of which were not recorded previously in the international market, including 6 newly recorded genera. Ninety-two traded species were included in CITES appendices (59 added in 2022), but at least 78 other traded species remained unregulated. Among these, 5 of the 10 traded threatened species were unlisted, and we recommend they be considered for inclusion in CITES Appendix III. We also recommend the listing of all Diplodactylidae genera in Appendix III. Despite this family representing the greatest number of Australian species in trade, only one genus (of 7 traded) was included in the recent CITES amendments. Overall, a large number of Australian reptile and amphibian species are traded internationally and, although we acknowledge the value of Australia's recent CITES listing, we recommend the consideration of other taxa for similar inclusion in CITES.
(Spanish and Chinese versions of this abstract are also published with the article; Chinese translation by 胡怡思 (Hu Yisi), reviewed by 聂永刚 (Nie Yonggang).)
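Once listings have been scraped, the comparison behind counts such as "92 listed, at least 78 unlisted" reduces to name matching plus set operations. A minimal Python sketch, assuming listing titles are plain strings and that a checklist of species names and the set of CITES-listed names are already available. The helper names and the placeholder species ("Genus alpha" and so on) are hypothetical, and real listings would also need synonym and trade-name handling that this sketch ignores.

```python
import re

def species_in_listings(listings, checklist):
    """Return the checklist names that appear (case-insensitively) in any listing."""
    found = set()
    for text in listings:
        lowered = text.lower()
        for name in checklist:
            # Match the whole binomial name on word boundaries.
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
                found.add(name)
    return found

def partition_by_listing(traded, cites_listed):
    """Split traded species into CITES-listed and unlisted sets."""
    return traded & cites_listed, traded - cites_listed

# Hypothetical example data (placeholder names, not real trade records).
checklist = {"Genus alpha", "Genus beta", "Genus gamma"}
cites_listed = {"Genus alpha"}
listings = [
    "Captive-bred Genus alpha pair for sale",
    "genus beta juvenile, shipping worldwide",
    "Unrelated terrarium equipment",
]

traded = species_in_listings(listings, checklist)
listed, unlisted = partition_by_listing(traded, cites_listed)
print(sorted(listed), sorted(unlisted))
```

Word-boundary matching avoids counting a name that only appears as part of a longer token, at the cost of missing misspelled or abbreviated names.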

2 Developing the DIGIFOOD Dashboard to Monitor the Digitalization of Local Food Environments: Interdisciplinary Approach.

Impact factor: 14.557
Published: Aug 13, 2024
Journal: JMIR Public Health Surveill; PMID: 39137032

DOI: 10.2196/59924
Article type: Journal Article

BACKGROUND: Online food delivery services (OFDS) enable individuals to conveniently access foods from any deliverable location. The increased accessibility to foods may have implications for the consumption of healthful or unhealthful foods. Concerningly, previous research suggests that OFDS offer an abundance of energy-dense and nutrient-poor foods, which are heavily promoted through deals or discounts.
OBJECTIVE: In this paper, we describe the development of the DIGIFOOD dashboard to monitor the digitalization of local food environments in New South Wales, Australia, resulting from the proliferation of OFDS.
METHODS: Together with a team of data scientists, we designed a purpose-built dashboard using Microsoft Power BI. The development process involved three main stages: (1) data acquisition of food outlets via web scraping, (2) data cleaning and processing, and (3) visualization of food outlets on the dashboard. We also describe the categorization process of food outlets to characterize the healthfulness of local, online, and hybrid food environments. These categories included takeaway franchises, independent takeaways, independent restaurants and cafes, supermarkets or groceries, bakeries, alcohol retailers, convenience stores, and sandwich or salad shops.
RESULTS: To date, the DIGIFOOD dashboard has mapped 36,967 unique local food outlets (locally accessible and scraped from Google Maps) and 16,158 unique online food outlets (accessible online and scraped from Uber Eats) across New South Wales, Australia. In 2023, the market-leading OFDS operated in 1061 unique suburbs or localities in New South Wales. The Sydney-Parramatta region, a major urban area in New South Wales accounting for 28 postcodes, recorded the highest number of online food outlets (n=4221). In contrast, the Far West and Orana region, a rural area in New South Wales with only 2 postcodes, recorded the lowest number of food outlets accessible online (n=7). Urban areas appeared to have the greatest increase in total food outlets accessible via online food delivery. In both local and online food environments, it was evident that independent restaurants and cafes comprised the largest proportion of food outlets at 47.2% (17,437/36,967) and 51.8% (8369/16,158), respectively. However, compared to local food environments, the online food environment has relatively more takeaway franchises (2734/16,158, 16.9% compared to 3273/36,967, 8.9%) and independent takeaway outlets (2416/16,158, 14.9% compared to 4026/36,967, 10.9%).
CONCLUSIONS: The DIGIFOOD dashboard leverages the current rich data landscape to display and contrast the availability and healthfulness of food outlets that are locally accessible versus accessible online. The DIGIFOOD dashboard can be a useful monitoring tool for the evolving digital food environment at a regional scale and has the potential to be scaled up at a national level. Future iterations of the dashboard, including data from additional prominent OFDS, can be used by policy makers to identify high-priority areas with limited access to healthful foods both online and locally.
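The category percentages in the RESULTS are shares of the two outlet totals (36,967 local and 16,158 online). A minimal Python sketch that recomputes three of the reported shares from the counts above; the helper name `share` is illustrative and is not part of the DIGIFOOD dashboard, which the authors built in Microsoft Power BI.

```python
def share(count, total):
    """Percentage of outlets in a category, rounded to one decimal place."""
    return round(100 * count / total, 1)

LOCAL_TOTAL = 36_967   # unique local outlets (scraped from Google Maps)
ONLINE_TOTAL = 16_158  # unique online outlets (scraped from Uber Eats)

categories = {
    # category: (local count, online count), as reported in the abstract
    "independent restaurants and cafes": (17_437, 8_369),
    "takeaway franchises": (3_273, 2_734),
}

for name, (local, online) in categories.items():
    print(f"{name}: local {share(local, LOCAL_TOTAL)}%, online {share(online, ONLINE_TOTAL)}%")
```

Rounding to one decimal place mirrors the precision used in the abstract.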

3 Averse to what: Consumer aversion to algorithmic labels, but not their outputs?

Impact factor: 6.813
Published: Aug 29, 2024
Journal: Curr Opin Psychol; PMID: 38996629

DOI: 10.1016/j.copsyc.2024.101839
Article type: Journal Article

Inspired by significant technical advancements, a rapidly growing stream of research explores human lay beliefs and reactions surrounding AI tools, which employ algorithms to mimic elements of human intelligence. This literature predominantly documents negative reactions to these tools or the underlying algorithms, often referred to as algorithm aversion or, alternatively, a preference for humans. This article proposes a third interpretation: people may be averse to their labels, but appreciative of their output. This perspective offers three core insights for how we study people's reactions to algorithms. Research would benefit from (1) carefully considering the labeling of AI tools, (2) broadening the scope of study to include interactions with these tools, and (3) accounting for their technical configuration.

4 Web scraping of user-simulated online nutrition information for people with multiple sclerosis.

Impact factor: 4.808
Published: Aug 22, 2024
Journal: Mult Scler Relat Disord; PMID: 38959592

DOI: 10.1016/j.msard.2024.105746
Article type: Journal Article

BACKGROUND: People diagnosed with multiple sclerosis (MS) often seek to modify their diet guided by online advice; however, this advice may not align with national dietary guidelines. The aim of this study was to simulate an online search for dietary advice conducted by a person with MS and evaluate the content. It was hypothesised that a variety of eating patterns are promoted for MS online and these dietary approaches can be contradictory.
METHODS: An online search was simulated using Google Trends-informed search terms and Google and Bing search engines. URLs were extracted using R. Nutrition data were extracted including recommendations for diets, foods, supplements, and health professional consultation. Statistical analyses were conducted using R.
RESULTS: 73 URLs from 49 websites were extracted, with only 14 results common to both search engines. Dietary recommendations included overall eating patterns (58 webpages, 79%), individual foods (55 webpages, 75%), and supplements (33 webpages, 45%). The most promoted eating pattern for MS was a balanced diet (33 recommendations, 48%), which was more likely to be recommended by nonprofit organisations and health information websites (14 and 17 recommendations, 100% and 89%); lifestyle program websites were more likely to recommend restrictive diets (19 recommendations, 100%) (p<0.001). 52% of pages advised consulting a health professional, most often a doctor or dietitian.
CONCLUSIONS: A balanced diet is the most recommended eating pattern for MS online, though advice promoting restrictive diets persists.
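Overlap figures like "73 URLs from 49 websites, 14 results common to both search engines" depend on how result URLs are normalised before comparison. The study extracted URLs with R; the following is a minimal Python sketch of one possible normalisation and overlap count, assuming that `www.` prefixes, trailing slashes, and query strings can be ignored. The result lists are hypothetical.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a result URL to host + path (query and fragment dropped)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path

def overlap_summary(engine_a, engine_b):
    """Return (unique URLs, distinct websites, URLs common to both engines)."""
    a = {normalise(u) for u in engine_a}
    b = {normalise(u) for u in engine_b}
    all_urls = a | b
    websites = {u.split("/", 1)[0] for u in all_urls}
    return len(all_urls), len(websites), len(a & b)

# Hypothetical result lists for one simulated query.
google = [
    "https://www.example-charity.org/diet/",
    "https://health.example.com/ms-nutrition",
    "https://lifestyle.example.net/plan?ref=ad",
]
bing = [
    "https://example-charity.org/diet",
    "https://health.example.com/ms-supplements",
]
print(overlap_summary(google, bing))
```

Stricter or looser normalisation rules would change all three counts, so they are worth reporting alongside results of this kind.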

5 Analysis of science journalism reveals gender and regional disparities in coverage.

Impact factor: 8.713
Published: May 28, 2024
Journal: Elife; PMID: 38804191

DOI: 10.7554/eLife.84855
Article type: Journal Article

Science journalism is a critical way for the public to learn about and benefit from scientific findings. Such journalism shapes the public's view of the current state of science and legitimizes experts. Journalists can only cite and quote a limited number of sources, whom they may discover in their research, including through recommendations by other scientists. Biases in either process may influence who is identified and ultimately included as a source. To examine potential biases in science journalism, we analyzed 22,001 non-research articles published by Nature and compared these with Nature-published research articles with respect to predicted gender and name origin. We extracted cited authors' names and those of quoted speakers. While citations and quotations within a piece do not reflect the entire information-gathering process, they can provide insight into the demographics of visible sources. We then predicted gender and name origin of the cited authors and speakers. We compared articles with a comparator set made up of first and last authors within primary research articles in Nature and a subset of Springer Nature articles in the same time period. In our analysis, we found a skew toward quoting men in Nature science journalism. However, quotation is trending toward equal representation at a faster rate than authorship rates in academic publishing. Gender disparity in Nature quotes was dependent on the article type. We found a significant over-representation of names with predicted Celtic/English origin and under-representation of names with a predicted East Asian origin in both extracted quotes and journal citations, although the effect was dampened in citations.

6 How to harness the power of web scraping for medical and surgical research: An application in estimating international collaboration.

Impact factor: 3.282
Published: Jun 24, 2024
Journal: World J Surg; PMID: 38794809

DOI: 10.1002/wjs.12220
Article type: Journal Article

This manuscript presents a comprehensive analysis of the revolutionary applications and profound impact of web scraping in surgical research, whose transformative potential is now within reach. It unveils the pivotal role of web scraping in driving innovation, enabling more effective management of human capital dynamics, and enhancing patient outcomes in the surgical field. As an example, we demonstrate how web scraping can uncover insights into international collaboration in surgical research, revealing limited collaboration between surgeons in developed and developing countries.

7 Extracting big data from the internet to support the development of a new patient-reported outcome measure for breast implant illness: a proof of concept study.

Impact factor: 3.44
Published: Jul 21, 2024
Journal: Qual Life Res; PMID: 38771557

DOI: 10.1007/s11136-024-03672-6
Article type: Journal Article

OBJECTIVE: Individuals with health conditions often use online patient forums to share their experiences. These patient data are freely available and have rarely been used in patient-reported outcomes (PRO) research. Web scraping, the automated identification and coding of webpage data, can be employed to collect patient experiences for PRO research. The objective of this study was to assess the feasibility of using web scraping to support the development of a new PRO measure for breast implant illness (BII).
METHODS: Nine publicly available BII-specific web forums were chosen after consultation with two prominent BII advocacy leaders. The Python Selenium and Pandas packages were used to automate extraction of de-identified text from individual posts/comments into a spreadsheet. Data were coded using a line-by-line approach and constant comparison was used to create top-level domains and sub-domains.
RESULTS: 6362 unique codes were identified and organized into four top-level domains of information needs, symptom experiences, life impact of BII, and care experiences. Information needs of women included seeking/sharing information pre-breast implant surgery, post-breast implant surgery, while contemplating explant surgery, and post-explant surgery. Symptoms commonly described by women included fatigue, brain fog, and musculoskeletal symptoms. Many comments described BII's impact on daily activities and psychosocial wellbeing. Lastly, some comments described negative care experiences and experiences related to advocating for themselves to providers.
CONCLUSIONS: This proof-of-concept study demonstrated the feasibility of employing web scraping as a cost-effective, efficient method to understand the experiences of women with BII. These data will be used to inform the development of a BII-specific PROM.
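The study automated extraction with Selenium and pandas. As a simpler stand-in for static pages, the sketch below uses only Python's standard library (`html.parser` and `csv`) to pull post text into spreadsheet-ready CSV rows. The `post-body` class name is a hypothetical forum markup convention, and de-identification, pagination, and JavaScript-rendered content (the usual reason to reach for Selenium) are not handled here.

```python
import csv
import io
from html.parser import HTMLParser

class PostExtractor(HTMLParser):
    """Collect the text of every <div class="post-body"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0      # nesting depth of <div> tags inside the current post
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "div":
                self._depth += 1
        elif tag == "div" and "post-body" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so each post becomes one clean cell.
                self.posts.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

def posts_to_csv(html):
    """Return CSV text with one row per extracted post."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["post_id", "text"])
    for i, text in enumerate(parser.posts, start=1):
        writer.writerow([i, text])
    return out.getvalue()

sample = """
<div class="thread">
  <div class="post-body">Fatigue and <b>brain fog</b> since surgery.</div>
  <div class="signature">user123</div>
  <div class="post-body">Joint pain improved after explant.</div>
</div>
"""
print(posts_to_csv(sample))
```

Text outside the targeted elements, such as the signature line, is skipped, which is one simple way to keep usernames out of the coded data set.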

8 Generating Contextual Variables From Web-Based Data for Health Research: Tutorial on Web Scraping, Text Mining, and Spatial Overlay Analysis.

Impact factor: 14.557
Published: Jan 8, 2024
Journal: JMIR Public Health Surveill; PMID: 38190245

DOI: 10.2196/50379
Article type: Journal Article

BACKGROUND: Contextual variables that capture the characteristics of delimited geographic or jurisdictional areas are vital for health and social research. However, obtaining data sets with contextual-level data can be challenging in the absence of monitoring systems or public census data.
OBJECTIVE: We describe and implement an 8-step method that combines web scraping, text mining, and spatial overlay analysis (WeTMS) to transform extensive text data from government websites into analyzable data sets containing contextual data for jurisdictional areas.
METHODS: This tutorial describes the method and provides resources for its application by health and social researchers. We used this method to create data sets of health assets aimed at enhancing older adults' social connections (eg, activities and resources such as walking groups and senior clubs) across the 374 health jurisdictions in Catalonia from 2015 to 2022. These assets are registered on a web-based government platform by local stakeholders from various health and nonhealth organizations as part of a national public health program. Steps 1 to 3 involved defining the variables of interest, identifying data sources, and using Python to extract information from 50,000 websites linked to the platform. Steps 4 to 6 comprised preprocessing the scraped text, defining new variables to classify health assets based on social connection constructs, analyzing word frequencies in titles and descriptions of the assets, creating topic-specific dictionaries, implementing a rule-based classifier in R, and verifying the results. Steps 7 and 8 integrated the spatial overlay analysis to determine the geographic location of each asset. We conducted a descriptive analysis of the data sets to report the characteristics of the assets identified and the patterns of asset registrations across areas.
RESULTS: We identified and extracted data from 17,305 websites describing health assets. The titles and descriptions of the activities and resources contained 12,560 and 7301 unique words, respectively. After applying our classifier and spatial analysis algorithm, we generated 2 data sets containing 9546 health assets (5022 activities and 4524 resources) with the potential to enhance social connections among older adults. Stakeholders from 318 health jurisdictions registered identified assets on the platform between July 2015 and December 2022. The agreement rate between the classification algorithm and verified data sets ranged from 62.02% to 99.47% across variables. Leisure and skill development activities were the most prevalent (1844/5022, 36.72%). Leisure and cultural associations, such as social clubs for older adults, were the most common resources (878/4524, 19.41%). Health asset registration varied across areas, ranging between 0 and 263 activities and 0 and 265 resources.
CONCLUSIONS: The sequential use of WeTMS offers a robust method for generating data sets containing contextual-level variables from internet text data. This study can guide health and social researchers in efficiently generating ready-to-analyze data sets containing contextual variables.
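Steps 4 to 6 combine word-frequency analysis, topic dictionaries, a rule-based classifier, and verification against a checked data set. The authors implemented the classifier in R; below is a minimal Python sketch of the same idea. The dictionaries and category names are hypothetical (and in English, whereas the real assets were described in Catalan and Spanish), and defining agreement as an exact match between predicted and verified label sets is an assumption.

```python
import re
from collections import Counter

# Hypothetical topic dictionaries; the study built its own from word frequencies.
DICTIONARIES = {
    "physical_activity": {"walk", "walking", "gym", "dance"},
    "culture": {"museum", "theatre", "choir", "reading"},
}

def tokenize(text):
    """Lowercase text and return its letter-only tokens (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def word_frequencies(texts):
    """Count words across asset titles/descriptions, e.g. to draft dictionaries."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def classify(text):
    """Return every category whose dictionary shares at least one word with the text."""
    words = set(tokenize(text))
    return {label for label, terms in DICTIONARIES.items() if words & terms}

def agreement_rate(predicted, verified):
    """Percentage of items where the predicted label set equals the verified one."""
    matches = sum(p == v for p, v in zip(predicted, verified))
    return 100 * matches / len(verified)

assets = [
    "Walking group for seniors",
    "Choir and reading club",
    "Computer skills workshop",
]
predicted = [classify(a) for a in assets]
verified = [{"physical_activity"}, {"culture"}, {"digital_skills"}]
print(predicted, agreement_rate(predicted, verified))
```

Because an asset can match several dictionaries, `classify` returns a set of labels rather than a single category; the third asset matches none, which is the kind of gap a manual verification step would surface.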

9 Exploring Online Crowdfunding for Cancer-Related Costs Among LGBTQ+ (Lesbian, Gay, Bisexual, Transgender, Queer, Plus) Cancer Survivors: Integration of Community-Engaged and Technology-Based Methodologies.

Impact factor: not available
Published: Oct 30, 2023
Journal: JMIR Cancer; PMID: 37902829

DOI: 10.2196/51605
Article type: Journal Article

BACKGROUND: Cancer survivors frequently experience cancer-related financial burdens. The extent to which Lesbian, Gay, Bisexual, Transgender, Queer, Plus (LGBTQ+) populations experience cancer-related cost-coping behaviors such as crowdfunding is largely unknown, owing to a lack of sexual orientation and gender identity data collection and social stigma. Web-scraping has previously been used to evaluate inequities in online crowdfunding, but these methods alone do not adequately engage populations facing inequities.
OBJECTIVE: We describe the methodological process of integrating technology-based and community-engaged methods to explore the financial burden of cancer among LGBTQ+ individuals via online crowdfunding.
METHODS: To center the LGBTQ+ community, we followed community engagement guidelines by forming a study advisory board (SAB) of LGBTQ+ cancer survivors, caregivers, and professionals who were involved in every step of the research. SAB member engagement was tracked through quarterly SAB meeting attendance and an engagement survey. We then used web-scraping methods to extract a data set of online crowdfunding campaigns. The study team followed an integrated technology-based and community-engaged process to develop and refine term dictionaries for analyses. Term dictionaries were developed and refined in order to identify crowdfunding campaigns that were cancer- and LGBTQ+-related.
RESULTS: Advisory board engagement was high according to metrics of meeting attendance, meeting participation, and anonymous board feedback. In collaboration with the SAB, the term dictionaries were iteratively edited and refined. The LGBTQ+ term dictionary was developed by the study team, while the cancer term dictionary was refined from an existing dictionary. The advisory board and analytic team members manually coded against the term dictionary and performed quality checks until high confidence in correct classification was achieved using pairwise agreement. Through each phase of manual coding and quality checks, the advisory board identified more misclassified campaigns than the analytic team alone. When refining the LGBTQ+ term dictionary, the analytic team identified 11.8% misclassification while the SAB identified 20.7% misclassification. Once each term dictionary was finalized, the LGBTQ+ term dictionary resulted in a 95% pairwise agreement, while the cancer term dictionary resulted in an 89.2% pairwise agreement.
CONCLUSIONS: The classification tools developed by integrating community-engaged and technology-based methods were more accurate because of the equity-based approach of centering LGBTQ+ voices and their lived experiences. This exemplar suggests integrating community-engaged and technology-based methods to study inequities is highly feasible and has applications beyond LGBTQ+ financial burden research.
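The abstract reports pairwise agreement and misclassification percentages without giving formulas. A minimal Python sketch, assuming pairwise agreement is the simple percentage of campaigns on which two coders assign the same label, averaged over all coder pairs, and misclassification is the share of campaigns whose dictionary-assigned label a reviewer overturns. Coder names and labels are hypothetical.

```python
from itertools import combinations

def pairwise_agreement(codes):
    """Mean percent agreement over every pair of coders.

    `codes` maps coder name -> list of labels, one per campaign (same order).
    """
    pairs = list(combinations(codes.values(), 2))
    if not pairs:
        raise ValueError("need at least two coders")
    rates = []
    for a, b in pairs:
        agree = sum(x == y for x, y in zip(a, b))
        rates.append(agree / len(a))
    return 100 * sum(rates) / len(rates)

def misclassification_rate(dictionary_labels, reviewer_labels):
    """Percentage of campaigns where a reviewer disagrees with the dictionary label."""
    wrong = sum(d != r for d, r in zip(dictionary_labels, reviewer_labels))
    return 100 * wrong / len(dictionary_labels)

# Hypothetical labels: 1 = LGBTQ+-related campaign, 0 = not related.
codes = {
    "coder_a": [1, 1, 0, 0, 1],
    "coder_b": [1, 0, 0, 0, 1],
    "coder_c": [1, 1, 0, 1, 1],
}
dictionary = [1, 1, 1, 0, 1]
print(round(pairwise_agreement(codes), 1), misclassification_rate(dictionary, codes["coder_a"]))
```

Simple percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa is a common alternative when label classes are imbalanced.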

10 Observational study of population level disparities in food costs in 2021 in Canada: A digital national nutritious food basket (dNNFB).

Impact factor: 2.813
Published: Apr 2023
Journal: Prev Med Rep; PMID: 36910505

DOI: 10.1016/j.pmedr.2023.102162
Article type: Journal Article

The aim of this work was to assess the feasibility and effect of applying a nationally representative and highly disaggregated food costing measure across Canada, through the novel application of web-scraping technology to the methods of the National Nutritious Food Basket (NNFB). Further, this study tested the hypothesis that a product-matched digital NNFB (dNNFB) correlates with existing market basket measures and quantified any differences in costs. This was an observational cross-sectional study using web scraped food price data collected in November 2021. Food price data was collected from the majority of Loblaw's banners across Canada, resulting in a final store sample of 751 stores sourced from 11 retail banners. Stores were located across all five Statistics Canada regions, including all provinces and territories with the exception of Nunavut. Store-level dNNFB costs were computed, adjusted by age-sex group, and summarized by geographic region and banner. dNNFB costs were then compared with existing national statistics office estimates (Market Basket Measure thresholds for reference families). dNNFB costs varied widely across the country, with notable differences by regional, store-level, and age-sex group characteristics. When compared to reported national statistics, our estimates exceeded the national market basket measure in every comparison in corresponding sub-national geography across the country, with correlation varying from 0.49 to 0.78 depending on the summary comparator. Digital collection of food price data was a feasible strategy for market basket costing. Our findings suggest we may be routinely underestimating the impact of food inflation for consumers, particularly those restricted to certain food environments.
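Store-level basket costing and the comparison with Market Basket Measure thresholds can be sketched as follows. This is a simplified Python illustration, not the NNFB methodology: product matching is assumed to be done already, the age-sex adjustment is reduced to a single hypothetical `scale` factor, the correlation is taken to be Pearson's r (the abstract does not name the coefficient), and all prices and benchmark values are hypothetical.

```python
from math import sqrt

def basket_cost(prices, quantities, scale=1.0):
    """Cost of a food basket at one store.

    `prices` maps item -> scraped unit price, `quantities` maps item -> units
    required; `scale` is a hypothetical age-sex adjustment factor.
    """
    missing = quantities.keys() - prices.keys()
    if missing:
        raise KeyError(f"no matched product for: {sorted(missing)}")
    return scale * sum(prices[item] * qty for item, qty in quantities.items())

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical basket and scraped prices for three stores.
quantities = {"milk_l": 4, "bread_loaf": 2, "apples_kg": 1.5}
stores = {
    "store_1": {"milk_l": 1.80, "bread_loaf": 3.00, "apples_kg": 4.00},
    "store_2": {"milk_l": 2.10, "bread_loaf": 3.50, "apples_kg": 4.40},
    "store_3": {"milk_l": 2.60, "bread_loaf": 4.20, "apples_kg": 5.20},
}
costs = [basket_cost(p, quantities) for p in stores.values()]
reference = [20.0, 22.0, 27.0]  # hypothetical published benchmark per region
print([round(c, 2) for c in costs], round(pearson(costs, reference), 3))
```

Raising an error for unmatched items, rather than silently skipping them, keeps an incomplete product match from understating a store's basket cost.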

