Data mining

  • Article type: Journal Article
    BACKGROUND: Workplace accidents in the petroleum industry can cause catastrophic damage to people, property, and the environment. Earlier studies in this domain indicate that the majority of the accident report information is available in unstructured text format. Conventional techniques for the analysis of accident data are time-consuming and heavily dependent on experts' subject knowledge, experience, and judgment. There is a need to develop a machine learning-based decision support system to analyze the vast amounts of unstructured text data that are frequently overlooked due to a lack of appropriate methodology.
    METHODS: To address this gap in the literature, we propose a hybrid methodology that uses improved text-mining techniques combined with an unbiased group decision-making framework to combine the objective weights (based on text mining) and subjective weights (based on expert opinion) of risk factors to prioritize them. Based on contextual word embedding models and term frequencies, we extracted five important clusters of risk factors comprising more than 32 risk sub-factors. A heterogeneous group of experts and employees in the petroleum industry was contacted to obtain their opinions on the extracted risk factors, and the best-worst method was used to convert their opinions to weights.
    CONCLUSIONS: The applicability of our proposed framework was tested on the data compiled from the accident data released by the petroleum industries in India. Our framework can be extended to accident data from any industry, to reduce analysis time and improve the accuracy in classifying and prioritizing risk factors.
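    The final combination step of such a hybrid framework can be sketched as follows. This is a minimal illustration, not the study's method: the factor names, all weight values, and the convex-combination rule with mixing parameter `alpha` are hypothetical assumptions.

```python
# Minimal sketch: merging objective (text-mining-based) and subjective
# (expert-opinion-based) weights of risk factors into one priority ranking.
# All factor names and numbers below are hypothetical illustrations.

def combine_weights(objective, subjective, alpha=0.5):
    """Convex combination of two normalized weight dicts; alpha is the
    share given to the objective (text-mining) weights."""
    combined = {f: alpha * objective[f] + (1 - alpha) * subjective[f]
                for f in objective}
    total = sum(combined.values())
    return {f: w / total for f, w in combined.items()}

objective = {"human_error": 0.40, "equipment": 0.25,
             "process": 0.20, "environment": 0.15}   # e.g. from term frequencies
subjective = {"human_error": 0.30, "equipment": 0.35,
              "process": 0.20, "environment": 0.15}  # e.g. from best-worst method

priorities = combine_weights(objective, subjective)
ranking = sorted(priorities, key=priorities.get, reverse=True)
print(ranking)  # highest-priority risk factor first
```

    In practice the subjective weights would come from solving the best-worst method's optimization over expert pairwise comparisons; the simple averaging above only illustrates how the two weight sources can be reconciled into a single ranking.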

  • Article type: Journal Article
    A foundation for digitally enabling healthier living is the safe development and use of technology. The practice of digital health safety has emerged from patient harm attributed to failing technologies. The study aimed to investigate how to adopt and implement digital health safety guidelines at scale. Data were collected through an online survey, semi-structured interviews, focus groups, document review, and data mining of artefacts. The findings of this study capture the emerging practice from Australia in a way that offers insights into the problem of practice, patient safety practice, safety culture, and socio-technical factors. The research findings contribute to a better understanding of the complexities of balancing digital innovation with patient safety. The four recommendations from the study and the provision of a logic model will support the audience in implementing actions toward a safer digital health ecology.

  • Article type: Journal Article
    Pre-competitive data sharing can offer the pharmaceutical industry significant benefits in terms of reducing the time and costs involved in getting a new drug to market through more informed testing strategies and knowledge gained by pooling data. If sufficient data is shared and can be co-analyzed, then it can also offer the potential for reduced animal usage and improvements in the in silico prediction of toxicological effects. Data sharing benefits can be further enhanced by applying the FAIR Guiding Principles, reducing time spent curating, transforming and aggregating datasets and allowing more time for data mining and analysis. We hope to facilitate data sharing by other organizations and initiatives by describing lessons learned as part of the Enhancing TRANslational SAFEty Assessment through Integrative Knowledge Management (eTRANSAFE) project, an Innovative Medicines Initiative (IMI) partnership which aims to integrate publicly available data sources with proprietary preclinical and clinical data donated by pharmaceutical organizations. Methods to foster trust and overcome non-technical barriers to data sharing such as legal and IPR (intellectual property rights) are described, including the security requirements that pharmaceutical organizations generally expect to be met. We share the consensus achieved among pharmaceutical partners on decision criteria to be included in internal clearance procedures used to decide if data can be shared. We also report on the consensus achieved on specific data fields to be excluded from sharing for sensitive preclinical safety and pharmacology data that could otherwise not be shared.

  • Article type: Journal Article
    OBJECTIVE: This study explored cardiothoracic surgeons' perceptions of health services research and practice guidelines, particularly how both influence providers' clinical decision-making.
    METHODS: A trained interviewer conducted open-ended, semistructured phone interviews with cardiothoracic surgeons across the United States. The interviews explored surgeons' experiences with lung cancer treatment and their perceptions of health services research and guidelines. Researchers coded the transcribed interviews using conventional content analysis. Interviews continued until thematic saturation was reached.
    RESULTS: Most of the 27 surgeons interviewed were general thoracic surgeons (23/27) who attend tumor board weekly (21/27). Five themes relating to physician perceptions of health services research and guidelines emerged. Database analyses' inherent selection bias and perceived deficit of pertinent clinical variables made providers skeptical of using these studies as primary decision drivers; however, providers thought that database analyses are useful to supplement other data and drive future research. Likewise, providers generally felt that although guidelines provide a useful framework, they often have difficulty applying guidelines to individual patients. An analysis of provider characteristics revealed that younger physicians in practice for fewer years appeared more likely to report using guidelines, and physicians who were aged 50 years or more and not purely academic surgeons appeared to find database analyses less impactful.
    CONCLUSIONS: Health services research, including database analyses, comprises much of the surgical literature; however, this study suggests that perceptions of database analyses and guidelines are mixed and questions whether thoracic surgeons routinely use either to inform their decisions. Researchers must address how to present compelling data to influence clinical practice.

  • Article type: Journal Article
    The emergence of digital pathology has opened new horizons for histopathology. Artificial intelligence (AI) algorithms are able to operate on digitized slides to assist pathologists with different tasks. Whereas AI-based classification and segmentation methods have obvious benefits for image analysis, image search represents a fundamental shift in computational pathology. Matching the pathology of new patients with already diagnosed and curated cases offers pathologists a new approach to improve diagnostic accuracy through visual inspection of similar cases and computational majority vote for consensus building. In this study, we report the results from searching the largest public repository (The Cancer Genome Atlas, TCGA) of whole-slide images from almost 11,000 patients. We successfully indexed and searched almost 30,000 high-resolution digitized slides constituting 16 terabytes of data comprised of 20 million 1000 × 1000 pixel image patches. The TCGA image database covers 25 anatomic sites and contains 32 cancer subtypes. High-performance storage and GPU power were employed for experimentation. The results were assessed with conservative "majority voting" to build consensus for subtype diagnosis through vertical search and demonstrated high accuracy values for both frozen section slides (e.g., bladder urothelial carcinoma 93%, kidney renal clear cell carcinoma 97%, and ovarian serous cystadenocarcinoma 99%) and permanent histopathology slides (e.g., prostate adenocarcinoma 98%, skin cutaneous melanoma 99%, and thymoma 100%). The key finding of this validation study was that computational consensus appears to be possible for rendering diagnoses if a sufficiently large number of searchable cases are available for each cancer subtype.
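    The conservative majority-vote step over retrieved similar slides can be sketched as follows. This is an illustrative assumption of how such voting might work, not the study's implementation; the subtype labels, abstention threshold, and function name are hypothetical.

```python
from collections import Counter

def conservative_majority_vote(retrieved_labels, min_fraction=0.5):
    """Return the consensus subtype if the most common label among the
    retrieved similar slides exceeds min_fraction of the votes;
    otherwise return None (abstain rather than guess)."""
    if not retrieved_labels:
        return None
    label, votes = Counter(retrieved_labels).most_common(1)[0]
    return label if votes / len(retrieved_labels) > min_fraction else None

# Hypothetical search result: subtype labels of the most similar slides
neighbors = ["LUAD", "LUAD", "LUSC", "LUAD", "LUAD", "LUSC"]
print(conservative_majority_vote(neighbors))          # consensus: LUAD (4/6)
print(conservative_majority_vote(["LUAD", "LUSC"]))   # tie: abstain (None)
```

    The abstention branch is what makes the vote "conservative": with too few agreeing neighbors, no diagnosis is rendered, which matches the study's observation that consensus requires a sufficiently large pool of searchable cases per subtype.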

  • Article type: Journal Article
    Cardiopulmonary resuscitation (CPR) guidelines have been updated every 5 years since 2000. Significant changes have been made in each update, and every time a guideline is changed, the instructors of each country that ratifies the American Heart Association (AHA) guidelines must review the contents of the revised guideline to understand the changes made in the concept of CPR. The purpose of this study was to use a computerized data mining method to identify and characterize the changes in the key concepts of the AHA-Basic Life Support (BLS) updates between 2000 and 2015.
    We analyzed the guidelines of the AHA-BLS provider manual of 2000, 2005, 2010, and 2015 using a computerized data mining method and attempted to identify the changes in keywords along with changes in the guideline.
    In particular, the 2000 guideline focused on the detailed BLS technique of an individual health care provider, whereas the 2005 and 2010 guidelines focused on changing the ratio of chest compressions and breathing and changing the BLS sequence, respectively. In the most recent 2015 guideline, the CPR team was the central topic. We observed that as the guidelines were updated over the years, keywords related to CPR and automated external defibrillators (AED) associated with the co-occurrence network continued to appear.
    Analysis revealed that keywords related to CPR and AED associated with the co-occurrence network continued to appear. We believe that the results of this study will ultimately contribute to optimizing AHA's educational strategies for health care providers.
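    The core of a keyword co-occurrence network is counting how often key terms appear together in the same text unit. The sketch below is a minimal illustration under assumed inputs (the tokenized sentences and the keyword set are invented, not taken from the AHA manuals):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences, keywords):
    """Count how often each keyword pair appears in the same sentence.
    The resulting pair counts are the edge weights of a co-occurrence
    network. `sentences` is a list of token lists; `keywords` is a set."""
    counts = Counter()
    for tokens in sentences:
        present = sorted(set(tokens) & keywords)  # keywords in this sentence
        counts.update(combinations(present, 2))   # every co-occurring pair
    return counts

# Hypothetical guideline fragments, pre-tokenized and lower-cased
sentences = [
    ["start", "cpr", "and", "attach", "the", "aed"],
    ["use", "the", "aed", "as", "soon", "as", "available"],
    ["continue", "cpr", "until", "the", "aed", "is", "ready"],
]
keywords = {"cpr", "aed", "compression", "team"}
print(cooccurrence_counts(sentences, keywords))  # {('aed', 'cpr'): 2}
```

    Tracking how these edge weights shift between guideline editions is one way the persistence of CPR-AED co-occurrence across the 2000-2015 updates could be quantified.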

  • Article type: Journal Article
    The Google Books Ngram Viewer (Google Ngram) is a search engine that charts word frequencies from a large corpus of books and thereby allows for the examination of cultural change as it is reflected in books. While the tool's massive corpus of data (about 8 million books or 6% of all books ever published) has been used in various scientific studies, concerns about the accuracy of results have simultaneously emerged. This paper reviews the literature and serves as a guideline for improving Google Ngram studies by suggesting five methodological procedures suited to increase the reliability of results. In particular, we recommend the use of (I) different language corpora, (II) cross-checks on different corpora from the same language, (III) word inflections, (IV) synonyms, and (V) a standardization procedure that accounts for both the influx of data and unequal weights of word frequencies. Further, we outline how to combine these procedures and address the risk of potential biases arising from censorship and propaganda. As an example of the proposed procedures, we examine the cross-cultural expression of religion via religious terms for the years 1900 to 2000. Special emphasis is placed on the situation during World War II. In line with the strand of literature that emphasizes the decline of collectivistic values, our results suggest an overall decrease of religion's importance. However, religion re-gains importance during times of crisis such as World War II. By comparing the results obtained through the different methods, we illustrate that applying and particularly combining our suggested procedures increase the reliability of results and prevents authors from deriving wrong assumptions.
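    One plausible reading of procedure (V) is to rescale each term's yearly relative frequencies before averaging, so that frequent words do not dominate a composite trend. The sketch below is an illustrative assumption, not the paper's exact procedure; the scaling rule (mean-normalization) and all frequency values are invented:

```python
def standardized_composite(series_by_term):
    """Rescale each term's yearly relative frequencies to mean 1, then
    average across terms, so that high-frequency words do not dominate
    the composite trend. Input: {term: [frequency per year]}."""
    n_years = len(next(iter(series_by_term.values())))
    scaled = []
    for freqs in series_by_term.values():
        mean = sum(freqs) / len(freqs)
        scaled.append([f / mean for f in freqs])
    # average the rescaled series year by year
    return [sum(s[y] for s in scaled) / len(scaled) for y in range(n_years)]

# Hypothetical Ngram-style relative frequencies for three religious terms
terms = {
    "church":   [0.0120, 0.0100, 0.0080],   # frequent word, declining
    "faith":    [0.0030, 0.0028, 0.0026],
    "religion": [0.0008, 0.0009, 0.0007],
}
print(standardized_composite(terms))  # composite trend with mean ~1
```

    Without the rescaling, the composite would essentially reproduce the trajectory of "church" alone; after it, each term contributes equally to the aggregate trend.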

  • Article type: Journal Article
    OBJECTIVE: We investigated the impact of clinical guidelines for the management of minor head injury on utilization and diagnostic yield of head CT over two decades.
    METHODS: Retrospective before-after study using multiple electronic health record data sources. Natural language processing algorithms were developed to rapidly extract indication, Glasgow Coma Scale, and CT outcome from clinical records, creating two datasets: one based on all head injury CTs from 1997 to 2009 (n = 9109), for which diagnostic yield of intracranial traumatic findings was calculated. The second dataset (2009-2014) used both CT reports and clinical notes from the emergency department, enabling selection of minor head injury patients (n = 4554) and calculation of both CT utilization and diagnostic yield. Additionally, we tested for significant changes in utilization and yield after guideline implementation in 2011, using chi-square statistics and logistic regression.
    RESULTS: The yield was initially nearly 60% but, in a decreasing trend, dropped below 20% when CT became routinely used for head trauma. Between 2009 and 2014, of 4554 minor head injury patients overall, 85.4% underwent head CT. After guideline implementation in 2011, CT utilization significantly increased from 81.6 to 87.6% (p = 7 × 10⁻⁷), while yield significantly decreased from 12.2 to 9.6% (p = 0.029).
    CONCLUSIONS: The number of CTs performed for head trauma gradually increased over two decades, while the yield decreased. In 2011, despite implementation of a guideline aiming to improve selective use of CT in minor head injury, utilization significantly increased.
    KEY POINTS: • Over two decades, the number of head CTs performed for minor, moderate, and severe head injury gradually increased, while the diagnostic yield for intracranial findings showed a decreasing trend. • Despite the implementation of a guideline in 2011, aiming to improve selective use of CT in minor head injury, utilization significantly increased, while diagnostic yield significantly decreased. • Natural language processing is a valuable tool to monitor the utilization and diagnostic yield of imaging as a potential quality-of-care indicator.
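    A chi-square test of a before/after change in CT utilization, as used in the study, can be sketched as follows. The counts are hypothetical round numbers chosen to mirror the reported 81.6% vs 87.6% proportions, not the study's actual cohort sizes; for one degree of freedom the p-value follows from the identity P(χ² > x) = erfc(√(x/2)), so no statistics library is needed:

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for a
    2x2 contingency table [[a, b], [c, d]]; returns (statistic, p_value)."""
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # chi-square survival function with 1 df via the error function
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts: rows = period (before/after the 2011 guideline),
# columns = head CT performed yes/no, 1000 patients per period
stat, p = chi_square_2x2(816, 184, 876, 124)  # 81.6% vs 87.6%
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

    With real cohort sizes the statistic and p-value would differ; the point of the sketch is only the mechanics of testing whether a utilization proportion changed significantly between two periods.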

  • Article type: Journal Article
    Proper Health-Care Waste Management (HCWM) and integrated documentation in this sector of hospitals require analyzing massive data collected by hospital's health experts. This study presented a quantitative software-based index to assess the HCWM process performance by integrating ontology-based Multi-Criteria Group Decision-Making techniques and fuzzy modeling that were coupled with data mining. This framework represented the Complex Event Processing (CEP) and Corporate Performance Management (CPM) types of Process Mining in which a user-friendly software namely Group Fuzzy Decision-Making (GFDM) was employed for index calculation.
    Assessing the governmental hospitals of Shiraz, Iran in 2016 showed that the proposed index was able to determine the waste management condition and clarify the blind spots of HCWM in the hospitals. The index values under 50 were found in some of the hospitals showing poor process performance that should be at the priority of optimization and improvement.
    The proposed framework has distinctive features such as modeling the uncertainties (risks) in hospitals' process assessment and flexibility enabling users to define the intended criteria, stakeholders, and number of hospitals. Having a computer-aided approach for the decision process also accelerates the index calculation and improves its accuracy, which would contribute to more willingness of hospitals' experts and other end-users to use the index in practice. The methodology could efficiently be employed as a tool for managing hospitals' event logs and digital documentation in a big data environment, not only for health-care waste management but also in other administrative wards of hospitals.
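    The skeleton of such a fuzzy group index (averaging triangular fuzzy ratings from several experts, weighting by criterion, and defuzzifying to a 0-100 score) can be sketched as follows. The criteria, ratings, weights, and the centroid defuzzification rule are illustrative assumptions, not the GFDM software's actual model:

```python
def defuzzify(tfn):
    """Centroid defuzzification of a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    return (l + m + u) / 3

def hcwm_index(ratings, weights):
    """Process-performance index on a 0-100 scale: average each
    criterion's triangular fuzzy ratings across experts, defuzzify,
    and take the weighted sum.
    ratings: {criterion: [(l, m, u) per expert]}; weights sum to 1."""
    score = 0.0
    for criterion, tfns in ratings.items():
        avg = tuple(sum(x) / len(tfns) for x in zip(*tfns))
        score += weights[criterion] * defuzzify(avg)
    return score

# Hypothetical expert ratings (0-100 scale) for two criteria
ratings = {
    "segregation": [(60, 70, 80), (50, 60, 70)],
    "documentation": [(30, 40, 50), (40, 50, 60)],
}
weights = {"segregation": 0.6, "documentation": 0.4}
index = hcwm_index(ratings, weights)
print(round(index, 1))  # a value under 50 would flag poor performance
```

    The fuzzy intervals capture the expert-judgment uncertainty the abstract mentions; the under-50 threshold mirrors the study's use of index values below 50 to flag hospitals needing priority improvement.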

  • Article type: Journal Article
    BACKGROUND: This feasibility study of text-mining-based scoring algorithm provides an objective comparison of structured reports (SR) and conventional free-text reports (cFTR) by means of guideline-based key terms. Furthermore, an open-source online version of this ranking algorithm was provided with multilingual text-retrieval pipeline, customizable query and real-time-scoring.
    METHODS: Twenty-five patients with suspected stroke and magnetic resonance imaging were re-assessed by two independent, blinded readers (inexperienced: 3 years; experienced: >6 years/board-certified). SR and cFTR were compared with the guideline query using the cosine similarity score (CSS) and the Wilcoxon signed-rank test.
    RESULTS: All pathological findings (18/18) were identified by SR and cFTR. The impressions section of the SRs of the inexperienced reader had the highest median (0.145) and maximal (0.214) CSS and was rated significantly higher (p=2.21×10⁻⁵ and p=1.4×10⁻⁴, respectively) than cFTR (median=0.102). CSS was robust to variations of the query.
    CONCLUSIONS: Objective guideline-based comparison of SRs and cFTRs using the CSS is feasible and provides a scalable quality measure that can facilitate the adoption of structured reports in all fields of radiology.
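    The cosine similarity score at the heart of this comparison can be sketched as follows. The key terms and report tokens are invented examples, not the study's stroke-guideline query; a full pipeline would add tokenization, stemming, and possibly tf-idf weighting, which are omitted here:

```python
from collections import Counter
from math import sqrt

def cosine_similarity_score(report_terms, query_terms):
    """Cosine similarity between the term-frequency vectors of a
    radiology report and a guideline-based key-term query."""
    a, b = Counter(report_terms), Counter(query_terms)
    dot = sum(a[t] * b[t] for t in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical key terms from a stroke imaging guideline vs. two reports
query = ["infarction", "diffusion", "restriction", "mca", "territory"]
structured = ["infarction", "diffusion", "restriction", "mca", "left"]
free_text = ["no", "evidence", "of", "acute", "infarction"]

print(cosine_similarity_score(structured, query))  # high term overlap
print(cosine_similarity_score(free_text, query))   # low term overlap
```

    A higher score means the report covers more of the guideline's key terms, which is how the study could rank structured reports above conventional free-text reports on the same cases.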
