data privacy

  • Article type: Journal Article
    BACKGROUND: Securing adequate data privacy is critical for the productive utilization of data. De-identification, which involves masking or replacing specific values in a dataset, can damage the dataset's utility, and finding a reasonable balance between data privacy and utility is not straightforward. Nonetheless, few studies have investigated how data de-identification efforts affect data analysis results. This study aimed to demonstrate the effect of different de-identification methods on a dataset's utility with a clinical analytic use case and to assess the feasibility of finding a workable tradeoff between data privacy and utility.
    METHODS: Predictive modeling of emergency department length of stay was used as a data analysis use case. A logistic regression model was developed with 1155 patient cases extracted from a clinical data warehouse of an academic medical center located in Seoul, South Korea. Nineteen de-identified datasets were generated based on various de-identification configurations using ARX, an open-source software for anonymizing sensitive personal data. The variable distributions and prediction results were compared between the de-identified datasets and the original dataset. We examined the association between data privacy and utility to determine whether it is feasible to identify a viable tradeoff between the two.
    RESULTS: All 19 de-identification scenarios significantly decreased re-identification risk. Nevertheless, the de-identification processes resulted in record suppression and complete masking of variables used as predictors, thereby compromising dataset utility. A significant correlation was observed only between the re-identification reduction rates and the ARX utility scores.
    CONCLUSIONS: As the importance of health data analysis increases, so does the need for effective privacy protection methods. While existing guidelines provide a basis for de-identifying datasets, achieving a balance between high privacy and utility is a complex task that requires understanding the data's intended use and involving input from data users. This approach could help find a suitable compromise between data privacy and utility.
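
    The privacy/utility tradeoff described in this abstract can be illustrated with a minimal sketch. It assumes hypothetical field names (`age`, `zip`, `sex`, `los_hours`) and a simple k-anonymity rule; it is not ARX's algorithm and does not reproduce the study's configurations. Quasi-identifiers are generalized, records in groups smaller than k are suppressed, and the worst-case re-identification risk is reported alongside the fraction of records retained.

```python
from collections import Counter

# Hypothetical quasi-identifiers; the study's actual variables are not listed in the abstract.
QUASI_IDENTIFIERS = ("age", "zip", "sex")


def generalize_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def class_sizes(records):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)


def deidentify(records, k=2):
    """Generalize age and zip, then suppress records whose group has fewer than k members."""
    generalized = [
        {**r, "age": generalize_age(r["age"]), "zip": r["zip"][:3] + "**"}
        for r in records
    ]
    sizes = class_sizes(generalized)
    return [
        r for r in generalized
        if sizes[tuple(r[q] for q in QUASI_IDENTIFIERS)] >= k
    ]


def max_reidentification_risk(records):
    """Worst-case risk: 1 divided by the size of the smallest group."""
    if not records:
        return 0.0
    return 1.0 / min(class_sizes(records).values())


# Made-up illustrative records, not data from the study.
records = [
    {"age": 34, "zip": "04524", "sex": "F", "los_hours": 5},
    {"age": 37, "zip": "04511", "sex": "F", "los_hours": 9},
    {"age": 36, "zip": "04587", "sex": "F", "los_hours": 3},
    {"age": 62, "zip": "06123", "sex": "M", "los_hours": 12},
]

released = deidentify(records, k=2)
print("risk before:", max_reidentification_risk(records))
print("risk after:", max_reidentification_risk(released))
print("records retained:", len(released) / len(records))
```

    Raising `k` lowers the worst-case risk but suppresses more records, which is the kind of utility loss the RESULTS paragraph reports.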

  • Article type: Journal Article
    Since COVID-19 began spreading in China in December 2019, thousands of studies have focused on this pandemic. Each presents a unique perspective that reflects one of the main scientific disciplines engaged with the pandemic. For example, social scientists are concerned with reducing the psychological impact on the human mental state, especially during lockdown periods. Computer scientists focus on establishing fast and accurate computerized tools to assist in diagnosing, preventing, and recovering from the disease. Medical scientists and doctors, or the frontliners, are the main heroes who received, treated, and worked with the millions of cases at the expense of their own health. Some of them have continued to work even at the expense of their lives. All these studies reinforce multidisciplinary work, in which scientists from different academic disciplines (social, environmental, technological, etc.) join forces to produce research for beneficial outcomes during the crisis. One of the many branches is computer science along with its various technologies, including artificial intelligence, Internet of Things, big data, decision support systems (DSS), and many more. Among the most notable uses of DSS are those related to multicriterion decision making (MCDM), which is applied in various applications and across many contexts, including business, social, technological, and medical. Owing to its importance in developing proper decision regimens and prevention strategies with precise judgment, it is deemed a noteworthy topic of extensive exploration, especially in the context of COVID-19-related medical applications. The present study is a comprehensive review of COVID-19-related medical case studies with MCDM using a systematic review protocol. PRISMA methodology is utilized to obtain a final set of (n = 35) articles from four major scientific databases (ScienceDirect, IEEE Xplore, Scopus, and Web of Science). The final set of articles is categorized into a taxonomy comprising five groups: (1) diagnosis (n = 6), (2) safety (n = 11), (3) hospital (n = 8), (4) treatment (n = 4), and (5) review (n = 3). A bibliographic analysis is also presented on the basis of annual scientific production, country scientific production, co-occurrence, and co-authorship. A comprehensive discussion covers the main challenges, motivations, and recommendations in using MCDM research in COVID-19-related medical case studies. Lastly, we identify critical research gaps with their corresponding solutions and detailed methodologies to serve as a guide for future directions. In conclusion, MCDM can be utilized effectively in the medical field to optimize resources and make the best choices, particularly during pandemics and natural disasters.

  • Article type: Journal Article
    With the advent of sensors, more and more services are being developed to provide customers with insights about their health and their appliances' energy consumption at home. To do so, these services use new mining algorithms that create new inference channels. However, the collected sensor data can be diverted to infer personal data that customers do not consent to share. This indirect access to data that are not collected corresponds to inference attacks involving raw sensor data (IASD). Faced with these new kinds of attacks, existing inference detection systems do not suit the representation requirements of these inference channels and of user knowledge. In this paper, we propose RICE-M (Raw sensor data based Inference ChannEl Model), which meets these inference channel representation requirements. Based on RICE-M, we propose RICE-Sy, an extensible system able to detect IASDs, and evaluate its performance using the MHEALTH dataset as a case study. As expected, detecting IASDs proves to be quadratic, owing to the huge volume of sensor data managed and the quickly growing amount of user knowledge. To overcome this drawback, we first propose a set of conceptual optimizations that reduce the detection complexity. Although detection then becomes linear, the online detection time remains greater than a fixed acceptable query response limit, so we propose two approaches to estimate the potential of RICE-Sy. The first is based on partitioning strategies that partition the knowledge of users. We observe that by considering the quantity of knowledge gained by a user as a partitioning criterion, the median detection time of RICE-Sy is reduced by 63%. The second approach is H-RICE-SY, a hybrid detection architecture built on RICE-Sy that limits detection at query time to users who have a high probability of being malicious. We show the limits of processing all malicious users at query time without impacting the query answer time. We observe that when 30% of users are considered malicious, the median online detection time stays under the acceptable time of 80 ms for a total volume of up to 1.2 million user knowledge entities. Based on the observed growth rates, we estimate that when 5% of user knowledge is issued by malicious users, a maximum volume of approximately 8.6 million items of user information can be processed online in an acceptable time.

  • Article type: Case Reports
    With the outbreak of COVID-19 across Europe, anonymized telecommunications data provides key insight into population-level mobility and helps assess the impact and effectiveness of containment measures. Vodafone's response across its global footprint was fast and delivered key new metrics for the pandemic that have proven useful for a number of external entities. Cooperation with national governments and supra-national entities to help fight the COVID-19 pandemic was a key part of Vodafone's response. This article analyzes the different methodologies developed and the key collaborations established in this context. It also analyzes the regulatory challenges found, and how these pose a risk that the full benefits of these insights will not be harnessed, despite clear and efficient Privacy and Ethics assessments to ensure individual safety and data privacy.

  • Article type: Journal Article
    Federal open-data initiatives that promote increased sharing of federally collected data are important for transparency, data quality, trust, and relationships with the public and state, tribal, local, and territorial partners. These initiatives advance understanding of health conditions and diseases by providing data to researchers, scientists, and policymakers for analysis, collaboration, and use outside the Centers for Disease Control and Prevention (CDC), particularly for emerging conditions such as COVID-19, for which data needs are constantly evolving. Since the beginning of the pandemic, CDC has collected person-level, de-identified data from jurisdictions and currently has more than 8 million records. We describe how CDC designed and produces 2 de-identified public datasets from these collected data.
    We included data elements based on usefulness, public request, and privacy implications; we suppressed some field values to reduce the risk of re-identification and exposure of confidential information. We created datasets and verified them for privacy and confidentiality by using data management platform analytic tools and R scripts.
    Unrestricted data are available to the public through Data.CDC.gov, and restricted data, with additional fields, are available with a data-use agreement through a private repository on GitHub.com.
    Enriched understanding of the available public data, the methods used to create these data, and the algorithms used to protect the privacy of de-identified people allow for improved data use. Automating data-generation procedures improves the volume and timeliness of sharing data.
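
    The field-value suppression mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not CDC's published algorithm: the field names (`state`, `county`, `age_group`), the `"Suppressed"` placeholder, and the `min_count` threshold are all hypothetical. A field value is blanked whenever the combination it belongs to occurs in too few records, while the record itself is kept.

```python
from collections import Counter

SUPPRESSED = "Suppressed"  # hypothetical placeholder value


def suppress_rare_values(records, field, group_fields, min_count=5):
    """Return copies of records with `field` blanked where its group is too small.

    A group is the combination of `group_fields` plus `field`. Records in groups
    with fewer than `min_count` members keep every other field unchanged.
    """
    def key(record):
        return tuple(record[f] for f in (*group_fields, field))

    counts = Counter(key(r) for r in records)
    return [
        {**r, field: SUPPRESSED} if counts[key(r)] < min_count else dict(r)
        for r in records
    ]


# Made-up illustrative records: five in one county, two in another.
records = (
    [{"state": "XX", "county": "Alpha", "age_group": "30-39"}] * 5
    + [{"state": "XX", "county": "Beta", "age_group": "30-39"}] * 2
)

public = suppress_rare_values(records, "county", ("state", "age_group"), min_count=3)
print(Counter(r["county"] for r in public))
```

    Unlike dropping whole records, this keeps the record count intact at the cost of losing detail in the suppressed field.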

  • Article type: Journal Article
    Health care professionals are caught between the wish of patients to speed up health-related communication via emails and the need for protecting health information.
    We aimed to analyze the demographic characteristics of patients providing an email address, and to study the distribution of email domain names.
    We used the information system of the European Hospital Georges Pompidou (HEGP) to identify patients who provided an email address. We used a 1:1 matching strategy to study the demographic characteristics of the patients associated with the presence of an email address, and described the characteristics of the email addresses used (in terms of type: free, business, and personal).
    Overall, 4.22% (41,004/971,822) of the total population of patients provided an email address. The year of last contact with the patient is the strongest driver of the presence of an email address (odds ratio [OR] 20.8, 95% CI 18.9-22.9). Patients more likely to provide an email address were treated for chronic conditions and were more likely to have been born between 1950 and 1969 (taking patients born before 1950 as reference [OR 1.60, 95% CI 1.54-1.67], and compared to those born after 1990 [OR 0.56, 95% CI 0.53-0.59]). Of the 41,004 email addresses collected, 37,779 were associated with known email providers (31,005 of them with Google, Microsoft, Orange, and Yahoo!), 2878 with business email addresses, and 347 with personalized domain names.
    Emails have been collected only recently in our institution. The importance of the year of last contact probably reflects this recent change in contact information collection policy. The demographic characteristics and especially the age distribution are likely the result of a population bias in the hospital: patients providing email are more likely to be treated for chronic diseases. A risk analysis of the use of email revealed several situations that could constitute a breach of privacy that is both likely and with major consequences. Patients treated for chronic diseases are more likely to provide an email address, and are also more at risk in case of privacy breach. Several common situations could expose their private information. We recommend a very restrictive use of the emails for health communication.
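
    The domain-name breakdown reported in this abstract can be approximated with a short sketch. The domain lists below are illustrative assumptions, not the study's reference lists, and treating every unlisted domain as "personalized" is a simplification of whatever rule the authors applied.

```python
from collections import Counter

# Illustrative subset of consumer provider domains (assumption, not the study's list).
PROVIDER_DOMAINS = frozenset(
    {"gmail.com", "outlook.com", "hotmail.com", "orange.fr", "yahoo.com", "yahoo.fr"}
)


def classify_email(address, business_domains=frozenset()):
    """Label an address as 'provider', 'business', or 'personalized' by its domain."""
    if "@" not in address:
        raise ValueError(f"not an email address: {address!r}")
    domain = address.rsplit("@", 1)[1].strip().lower()
    if domain in PROVIDER_DOMAINS:
        return "provider"
    if domain in business_domains:
        return "business"
    return "personalized"  # simplification: anything not in either list


# Made-up addresses for illustration only.
addresses = [
    "patient.one@Gmail.com",
    "patient.two@orange.fr",
    "p.three@example-company.com",
    "me@family-name.example",
]
business = frozenset({"example-company.com"})

breakdown = Counter(classify_email(a, business) for a in addresses)
print(breakdown)

# Consistency check on the figures quoted in the abstract.
print(37_779 + 2_878 + 347 == 41_004)
print(round(100 * 41_004 / 971_822, 2))
```

    The three reported categories sum to the 41,004 addresses collected, which is why the 31,005 addresses at the four named providers are read here as a subset of the 37,779.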

  • Article type: Journal Article
    Data on diagnosis of infection in the general population are strategic for different applications in the public and private spheres. Among them, the data related to symptoms and people displacement stand out, mainly considering highly contagious diseases. This data is sensitive and requires data privacy initiatives to enable its large-scale use. The search for population-monitoring strategies aims at social tracking, supporting the surveillance of contagions to respond to the confrontation with COVID-19. There are several data privacy issues in environments where IoT devices are used for monitoring hospital processes. In this research, we compare works related to the subject of privacy in the health area. To this end, this research proposes a taxonomy to support the requirements necessary to control patient data privacy in a hospital environment. According to the tests and comparisons made between the variables compared, the application obtained results that contribute to the scenarios applied. In this sense, we modeled and implemented an application. By the end, a mobile application was developed to analyze the privacy and security constraints with COVID-19.