K-Anonymity and Other Cluster-Based Methods
Ge Ruan
Oct. 11, 2007

Data Publishing and Data Privacy
- Society is experiencing exponential growth in the number and variety of data collections containing person-specific information.
- This collected information is valuable in both research and business, so data sharing is common.
- Publishing the data may put the respondents' privacy at risk.
- Objective:
  - Maximize data utility while limiting disclosure risk to an acceptable level.

Related Works
- Statistical Databases
  - The most common approach is to add noise while still maintaining some statistical invariants.
  - Disadvantages:
    - Destroys the integrity of the data.

Related Works (Cont'd)
- Multi-level Databases
  - Data is stored at different security classifications, and users have different security clearances. (Denning and Lunt)
  - Eliminating precise inference: sensitive information is suppressed, i.e., simply not released. (Su and Ozsoyoglu)
  - Disadvantages:
    - It is impossible to consider every possible attack.
    - Many data holders share the same data, but their concerns differ.
    - Suppression can drastically reduce the quality of the data.

Related Works (Cont'd)
- Computer Security
  - Access control and authentication ensure that the right people have the right authority to the right object at the right time and in the right place.
  - That is not what we want here. A general doctrine of data privacy is to release as much information as possible, as long as the identities of the subjects (people) are protected.

K-Anonymity
- Sweeney came up with a formal protection model named k-anonymity.
- What is k-anonymity?
  - A release satisfies k-anonymity if the information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release.
  - Ex. Suppose you try to identify a man in a release, but the only information you have is his birth date and gender. If at least k people in the release share that birth date and gender, he cannot be singled out among them. This is k-anonymity.

Classification of Attributes
- Key Attribute:
  - Name, Address, Cell Phone
  - Attributes that can uniquely identify an individual directly.
  - Always removed before release.
- Quasi-Identifier:
  - 5-digit ZIP code, birth date, gender
  - A set of attributes that can potentially be linked with external information to re-identify entities.
  - 87% of the population in the U.S. can be uniquely identified based on these attributes, according to the Census summary data in 1991.
  - Suppressed or generalized.

Classification of Attributes (Cont'd)

Hospital Patient Data
DOB       Sex     Zipcode  Disease
1/21/76   Male    53715    Heart Disease
4/13/86   Female  53715    Hepatitis
2/28/76   Male    53703    Bronchitis
1/21/76   Male    53703    Broken Arm
4/13/86   Female  53706    Flu
2/28/76   Female  53706    Hang Nail

Vote Registration Data
Name   DOB       Sex     Zipcode
Andre  1/21/76   Male    53715
Beth   1/10/81   Female  55410
Carol  10/1/44   Female  90210
Dan    2/21/84   Male    02174
Ellen  4/19/72   Female  02237

- Andre has heart disease! His (DOB, Sex, Zipcode) = (1/21/76, Male, 53715) matches exactly one patient record.

Classification of Attributes (Cont'd)
- Sensitive Attribute:
  - Medical record, wage, etc.
  - Always released directly, since these attributes are what the researchers need. Which attributes count as sensitive depends on the requirements.

K-Anonymity Protection Model
- PT: Private Table
- RT, GT1, GT2: Released Tables
- QI: Quasi-Identifier (Ai, ..., Aj)
- (A1, A2, ..., An): Attributes
- Lemma:

Attacks Against K-Anonymity
- Unsorted Matching Attack
  - This attack is based on the order in which tuples appear in the released table.
  - Solution:
    - Randomly sort the tuples before releasing.

Attacks Against K-Anonymity (Cont'd)
- Complementary Release Attack
  - Different releases can be linked together to compromise k-anonymity.
  - Solution:
    - Consider all of the previously released tables before releasing a new one, and try to avoid linking.
    - Other data holders may release data that can be used in this kind of attack. Generally, this kind of attack is hard to prevent completely.

Attacks Against K-Anonymity (Cont'd)
- Complementary Release Attack (Cont'd)

Attacks Against K-Anonymity (Cont'd)
- Temporal Attack (Cont'd)
  - Adding or removing tuples may compromise k-anonymity protection.

Attacks Against K-Anonymity (Cont'd)

A 3-anonymous patient table
Zipcode  Age   Disease
476**    2*    Heart Disease
476**    2*    Heart Disease
476**    2*    Heart Disease
4790*    ≥40   Flu
4790*    ≥40   Heart Disease
4790*    ≥40   Cancer
476**    3*    Heart Disease
476**    3*    Cancer
476**    3*    Cancer

- k-anonymity does not provide privacy if:
  - Sensitive values in an equivalence class lack diversity (Homogeneity Attack).
  - The attacker has background knowledge (Background Knowledge Attack).
- Homogeneity Attack: Bob (Zipcode 47678, Age 27) falls into the first equivalence class, in which every tuple has Heart Disease.
- Background Knowledge Attack: Carl (Zipcode 47673, Age 36) falls into the last equivalence class; an attacker who knows, for example, that Carl is unlikely to have heart disease can infer that he has Cancer.
(A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006)

l-Diversity
- Distinct l-diversity
  - Each equivalence class has at least l well-represented sensitive values.
  - Limitation:
    - Doesn't prevent probabilistic inference attacks.
    - Ex. One equivalence class contains ten tuples. In the "Disease" attribute, one of them is "Cancer", one is "Heart Disease", and the remaining eight are "Flu". This satisfies 3-diversity, but the attacker can still affirm that the target person's disease is "Flu" with an accuracy of 80% (8 of the 10 tuples).

l-Diversity (Cont'd)
- Entropy l-diversity
  - Each equivalence class must not only have enough different sensitive values, but those values must also be distributed evenly enough.
  - In statistical terms, the entropy of the distribution of sensitive values in each equivalence class must be at least log(l): $-\sum_{s \in S} p(E,s)\log p(E,s) \ge \log(l)$, where $p(E,s)$ is the fraction of tuples in equivalence class E whose sensitive value is s.
  - Sometimes this may be too restrictive. When some values are very common, the entropy of the entire table may be very low. This leads to a less conservative notion of l-diversity.

l-Diversity (Cont'd)
- Recursive (c,l)-diversity
  - The most frequent value does not appear too frequently:
    $r_1 < c(r_l + r_{l+1} + \dots + r_m)$, where $r_i$ is the number of tuples in the equivalence class carrying the i-th most frequent sensitive value.

Earth Mover's Distance
- Ground distance for any pair of values
- D[P,Q] is dependent upon the ground distances.
- Formulation
  - $P = (p_1, p_2, \dots, p_m)$, $Q = (q_1, q_2, \dots, q_m)$
  - $d_{ij}$: the ground distance between element i of P and element j of Q.
  - Find a flow $F = [f_{ij}]$, where $f_{ij}$ is the flow of mass from element i of P to element j of Q, that minimizes the overall work:
    $WORK(P, Q, F) = \sum_{i=1}^{m}\sum_{j=1}^{m} d_{ij} f_{ij}$
    subject to the constraints:
    $f_{ij} \ge 0$ for $1 \le i, j \le m$;
    $p_i - \sum_{j=1}^{m} f_{ij} + \sum_{j=1}^{m} f_{ji} = q_i$ for $1 \le i \le m$;
    $\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij} = \sum_{i=1}^{m} p_i = \sum_{i=1}^{m} q_i = 1$.

Earth Mover's Distance
- Example
  - P1 = {3k, 4k, 5k} and Q = {3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 11k}
  - There are m = 9 ordered salary values, so moving mass between adjacent values costs 1/8.
  - Move 1/9 probability for each of the following pairs:
    - 3k-6k, 3k-7k: cost 1/9 × (3+4)/8
    - 4k-8k, 4k-9k: cost 1/9 × (4+5)/8
    - 5k-10k, 5k-11k: cost 1/9 × (5+6)/8
  - Total cost: 1/9 × 27/8 = 0.375
  - With P2 = {6k, 8k, 11k}, the total cost is 0.167 < 0.375. This makes more sense than the other two distance calculation methods.

How to calculate EMD
- EMD for numerical attributes
  - Ordered distance: $ordered\_dist(v_i, v_j) = \frac{|i - j|}{m - 1}$
  - Ordered distance is a metric:
    - Non-negativity, symmetry, triangle inequality
  - Let $r_i = p_i - q_i$; then D[P,Q] is calculated as:
    $D[P,Q] = \frac{1}{m-1}\left(|r_1| + |r_1 + r_2| + \dots + |r_1 + r_2 + \dots + r_{m-1}|\right) = \frac{1}{m-1}\sum_{i=1}^{m}\left|\sum_{j=1}^{i} r_j\right|$

How to calculate EMD
- EMD for categorical attributes
  - Equal distance: $equal\_dist(v_i, v_j) = 1$ for $v_i \ne v_j$
  - Equal distance is a metric
  - D[P,Q] is calculated as:
    $D[P,Q] = \frac{1}{2}\sum_{i=1}^{m}|p_i - q_i| = \sum_{p_i \ge q_i}(p_i - q_i) = -\sum_{p_i < q_i}(p_i - q_i)$

How to calculate EMD (Cont'd)
- EMD for categorical attributes
  - Hierarchical distance: $hierarchical\_dist(v_i, v_j) = \frac{level(v_i, v_j)}{H}$, where H is the height of the domain hierarchy and $level(v_i, v_j)$ is the height of the lowest common ancestor of $v_i$ and $v_j$
  - Hierarchical distance is a metric

How to calculate EMD (Cont'd)
- EMD for categorical attributes
  - $extra(N) = p_i - q_i$ if N is a leaf (the value $v_i$); $extra(N) = \sum_{C \in Child(N)} extra(C)$ otherwise
  - $pos\_extra(N) = \sum_{C \in Child(N),\ extra(C) > 0} |extra(C)|$
  - $neg\_extra(N) = \sum_{C \in Child(N),\ extra(C) < 0} |extra(C)|$
  - $cost(N) = \frac{height(N)}{H}\min(pos\_extra(N), neg\_extra(N))$
  - D[P,Q] is calculated as: $D[P,Q] = \sum_{N} cost(N)$, where N ranges over the non-leaf nodes.

Experiments
- Goal
  - To show that l-diversity does not provide sufficient privacy protection (the similarity attack).
  - To show that the efficiency and data quality of t-closeness are comparable with those of other privacy measures.
- Setup
  - Adult dataset from the UC Irvine ML repository
  - 30162 tuples, 9 attributes (2 sensitive attributes)
  - Algorithm: Incognito

Experiments
- Similarity attack (Occupation)
  - 13 of 21 entropy 2-diversity tables are vulnerable
  - 17 of 26 r
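The Andre example is a linking attack: a join of the two tables on the quasi-identifier (DOB, Sex, Zipcode). The Python sketch below replays it on the rows shown in the slides and adds a simple k-anonymity check that counts how often each quasi-identifier combination occurs. The helper names `link` and `is_k_anonymous` are hypothetical, and matching is assumed to be exact string equality.

```python
from collections import Counter

# Hospital patient data from the slides: (DOB, Sex, Zipcode, Disease).
hospital = [
    ("1/21/76", "Male", "53715", "Heart Disease"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("4/13/86", "Female", "53706", "Flu"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]

# Voter registration data from the slides: (Name, DOB, Sex, Zipcode).
voters = [
    ("Andre", "1/21/76", "Male", "53715"),
    ("Beth", "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan", "2/21/84", "Male", "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]

def link(patients, registry):
    """Join on the quasi-identifier (DOB, Sex, Zipcode).

    Returns {name: disease} for every registered voter whose quasi-identifier
    matches exactly one patient record.
    """
    diseases_by_qi = {}
    for dob, sex, zipcode, disease in patients:
        diseases_by_qi.setdefault((dob, sex, zipcode), []).append(disease)
    linked = {}
    for name, dob, sex, zipcode in registry:
        matches = diseases_by_qi.get((dob, sex, zipcode), [])
        if len(matches) == 1:
            linked[name] = matches[0]
    return linked

def is_k_anonymous(rows, qi_columns, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[i] for i in qi_columns) for row in rows)
    return min(counts.values()) >= k

print(link(hospital, voters))                  # {'Andre': 'Heart Disease'}
print(is_k_anonymous(hospital, (0, 1, 2), 2))  # False
```

Every (DOB, Sex, Zipcode) combination in the patient table is unique, so the table is only 1-anonymous on that quasi-identifier, which is what lets the join single out Andre.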
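The homogeneity and background knowledge attacks on the 3-anonymous patient table can be replayed the same way. This sketch assumes the generalized ZIP codes are 476** and 4790* (one `*` per masked digit) and writes the top age band as `>=40`; the helpers `matches`, `smallest_class_size` and `candidate_diseases` are hypothetical names.

```python
from collections import Counter

# 3-anonymous patient table from the slides: (Zipcode, Age, Disease).
# Assumption: masked digits are written as '*', the top age band as ">=40".
table = [
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Heart Disease"),
    ("4790*", ">=40", "Cancer"),
    ("476**", "3*", "Heart Disease"),
    ("476**", "3*", "Cancer"),
    ("476**", "3*", "Cancer"),
]

def matches(pattern, value):
    """Does a generalized value cover a concrete one?

    '*' matches any single character; '>=N' is an open-ended numeric range.
    """
    if pattern.startswith(">="):
        return int(value) >= int(pattern[2:])
    return len(pattern) == len(value) and all(
        p in ("*", v) for p, v in zip(pattern, value)
    )

def smallest_class_size(rows):
    """Largest k for which the table is k-anonymous on (Zipcode, Age)."""
    return min(Counter((z, a) for z, a, _ in rows).values())

def candidate_diseases(rows, zipcode, age):
    """Sensitive values an attacker who knows (zipcode, age) cannot rule out."""
    return [d for z, a, d in rows if matches(z, zipcode) and matches(a, str(age))]

print(smallest_class_size(table))              # 3
print(candidate_diseases(table, "47678", 27))  # Bob: ['Heart Disease', 'Heart Disease', 'Heart Disease']
print(candidate_diseases(table, "47673", 36))  # Carl: ['Heart Disease', 'Cancer', 'Cancer']
```

Bob's candidate set contains only Heart Disease, so nothing else is needed (homogeneity). Carl's set still contains Heart Disease and Cancer; the attacker needs extra knowledge, such as Carl being unlikely to have heart disease, to settle on Cancer.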
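The three l-diversity variants can be compared on the ten-tuple equivalence class from the distinct l-diversity example (one Cancer, one Heart Disease, eight Flu). A sketch in Python with hypothetical function names; it uses natural logarithms, which is harmless because the entropy test compares against log(l) in the same base.

```python
import math
from collections import Counter

def distinct_l(values):
    """Number of distinct sensitive values in an equivalence class."""
    return len(set(values))

def entropy(values):
    """Entropy (natural logarithm) of the sensitive-value distribution."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def is_entropy_l_diverse(values, l):
    """Entropy l-diversity for one class: entropy >= log(l)."""
    return entropy(values) >= math.log(l)

def is_recursive_cl_diverse(values, c, l):
    """Recursive (c,l)-diversity: r_1 < c * (r_l + r_{l+1} + ... + r_m),
    where r_i is the count of the i-th most frequent sensitive value."""
    if l < 1:
        raise ValueError("l must be at least 1")
    r = sorted(Counter(values).values(), reverse=True)
    return r[0] < c * sum(r[l - 1:])  # empty tail (fewer than l values) -> False

def max_inference_confidence(values):
    """Share of the most frequent value: the attacker's best single guess."""
    return max(Counter(values).values()) / len(values)

# Equivalence class from the distinct l-diversity example.
ec = ["Cancer", "Heart Disease"] + ["Flu"] * 8
print(distinct_l(ec))                     # 3
print(max_inference_confidence(ec))       # 0.8
print(is_entropy_l_diverse(ec, 2))        # False
print(is_recursive_cl_diverse(ec, 3, 3))  # False
```

The class passes distinct 3-diversity, yet the attacker's best guess is right 80% of the time. Its entropy is about 0.64, below ln 2 ≈ 0.69, so it fails even entropy 2-diversity, and recursive (c,3)-diversity would need c > 8 because r_1 = 8 and r_3 = 1.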
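The salary example can be checked against the ordered-distance formula for numerical attributes. This Python sketch uses exact fractions and treats the nine salaries 3k to 11k as the ordered domain (m = 9), as the example does; `emd_ordered` and `uniform_over` are hypothetical names.

```python
from fractions import Fraction

def emd_ordered(p, q):
    """EMD under the ordered distance |i - j| / (m - 1).

    p and q are probability vectors over the same ordered domain of m values.
    Uses D[P,Q] = 1/(m-1) * sum_i |r_1 + ... + r_i| with r_i = p_i - q_i.
    """
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q need the same ordered domain of at least 2 values")
    cumulative = Fraction(0)
    total = Fraction(0)
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i  # net mass that must cross the gap after value i
        total += abs(cumulative)
    return total / (len(p) - 1)

def uniform_over(domain, support):
    """Spread probability evenly over `support`, zero elsewhere in `domain`."""
    share = Fraction(1, len(support))
    return [share if v in support else Fraction(0) for v in domain]

salaries = [3, 4, 5, 6, 7, 8, 9, 10, 11]  # in thousands, already ordered
q = uniform_over(salaries, salaries)
p1 = uniform_over(salaries, {3, 4, 5})
p2 = uniform_over(salaries, {6, 8, 11})
print(emd_ordered(p1, q), float(emd_ordered(p1, q)))  # 3/8 0.375
print(emd_ordered(p2, q), float(emd_ordered(p2, q)))  # 1/6 0.16666666666666666
```

It reproduces the total cost of 3/8 = 0.375 for P1 and gives 1/6 ≈ 0.167 for P2, matching the values in the example.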
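The two categorical formulas can be sketched in the same style. The equal-distance version is a one-liner; the hierarchical version walks the tree bottom-up, computing extra(N), pos_extra(N), neg_extra(N) and cost(N) as defined in the slides. The two-level disease hierarchy and the distributions are hypothetical examples, not taken from the slides, and the function names are illustrative.

```python
from fractions import Fraction

def emd_equal(p, q):
    """EMD under equal distance: D[P,Q] = 1/2 * sum_i |p_i - q_i|.

    p and q map sensitive values to probabilities; missing values count as 0.
    """
    domain = set(p) | set(q)
    return sum(abs(p.get(v, 0) - q.get(v, 0)) for v in domain) / 2

def emd_hierarchical(tree, p, q):
    """EMD under hierarchical distance level(v_i, v_j) / H.

    tree: leaves are strings (sensitive values), internal nodes are lists of
    children. Sums cost(N) = height(N) / H * min(pos_extra(N), neg_extra(N))
    over the non-leaf nodes N, where H is the height of the whole tree.
    """
    weighted = []  # height(N) * min(pos_extra(N), neg_extra(N)) per internal node

    def walk(node):
        """Return (extra(node), height(node)), recording internal-node costs."""
        if isinstance(node, str):
            return p.get(node, 0) - q.get(node, 0), 0
        results = [walk(child) for child in node]
        extras = [e for e, _ in results]
        height = 1 + max(h for _, h in results)
        pos_extra = sum(e for e in extras if e > 0)
        neg_extra = -sum(e for e in extras if e < 0)
        weighted.append(height * min(pos_extra, neg_extra))
        return sum(extras), height

    _, H = walk(tree)
    return sum(weighted) / H

# Hypothetical two-level hierarchy and distributions (not from the slides).
tree = [["Flu", "Bronchitis"], ["Heart Disease", "Cancer"]]
q = {v: Fraction(1, 4) for v in ["Flu", "Bronchitis", "Heart Disease", "Cancer"]}
p_within = {"Flu": Fraction(1, 2), "Heart Disease": Fraction(1, 2)}  # moves stay inside each branch
p_across = {"Flu": Fraction(1, 2), "Bronchitis": Fraction(1, 2)}     # mass must cross the root

print(emd_equal(p_within, q), emd_hierarchical(tree, p_within, q))  # 1/2 1/4
print(emd_equal(p_across, q), emd_hierarchical(tree, p_across, q))  # 1/2 1/2
```

Under equal distance both distributions are 1/2 away from Q. The hierarchical distance charges only 1/4 for p_within, whose excess mass moves between siblings, but the full 1/2 for p_across, whose mass must move across the root. With a flat hierarchy (H = 1) the two measures coincide.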