




K-Anonymity and Other Cluster-Based Methods
Ge Ruan
Oct. 11, 2007

Data Publishing and Data Privacy
- Society is experiencing exponential growth in the number and variety of data collections containing person-specific information.
- This collected information is valuable in both research and business. Data sharing is common.
- Publishing the data may put the respondents' privacy at risk.
- Objective:
  - Maximize data utility while limiting disclosure risk to an acceptable level

Related Works
- Statistical Databases
  - The most common approach is to add noise while still maintaining some statistical invariant.
  - Disadvantage: it destroys the integrity of the data

Related Works (Cont'd)
- Multi-level Databases
  - Data is stored at different security classifications and users have different security clearances. (Denning and Lunt)
  - Eliminating precise inference: sensitive information is suppressed, i.e. simply not released. (Su and Ozsoyoglu)
  - Disadvantages:
    - It is impossible to consider every possible attack
    - Many data holders share the same data, but their concerns are different.
    - Suppression can drastically reduce the quality of the data.

Related Works (Cont'd)
- Computer Security
  - Access control and authentication ensure that the right people have the right authority over the right object at the right time and place.
  - That is not what we want here. A general doctrine of data privacy is to release as much information as possible, as long as the identities of the subjects (people) are protected.

K-Anonymity
Sweeney came up with a formal protection model named k-anonymity.
- What is K-Anonymity?
  - A release provides k-anonymity if the information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release.
  - Ex. You try to identify a man in a release, but the only information you have is his birth date and gender. If k people in the release match that birth date and gender, you cannot tell which of them he is. This is k-anonymity.

Classification of Attributes
- Key Attribute:
  - Name, Address, Cell Phone
  - Can uniquely identify an individual directly
  - Always removed before release.
- Quasi-Identifier:
  - 5-digit ZIP code, birth date, gender
  - A set of attributes that can potentially be linked with external information to re-identify entities
  - 87% of the population in the U.S. can be uniquely identified based on these attributes, according to the Census summary data in 1991.
  - Suppressed or generalized

Classification of Attributes (Cont'd)

Hospital Patient Data
DOB       Sex      Zipcode   Disease
1/21/76   Male     53715     Heart Disease
4/13/86   Female   53715     Hepatitis
2/28/76   Male     53703     Bronchitis
1/21/76   Male     53703     Broken Arm
4/13/86   Female   53706     Flu
2/28/76   Female   53706     Hang Nail

Voter Registration Data
Name    DOB       Sex      Zipcode
Andre   1/21/76   Male     53715
Beth    1/10/81   Female   55410
Carol   10/1/44   Female   90210
Dan     2/21/84   Male     02174
Ellen   4/19/72   Female   02237

- Linking the two tables on (DOB, Sex, Zipcode) gives a unique match: Andre has heart disease!

Classification of Attributes (Cont'd)
- Sensitive Attribute:
  - Medical record, wage, etc.
  - Always released directly. These attributes are what the researchers need. Which attributes count as sensitive depends on the requirement.

K-Anonymity Protection Model
- PT: Private Table
- RT, GT1, GT2: Released Tables
- QI: Quasi-Identifier (Ai, ..., Aj)
- (A1, A2, ..., An): Attributes
Lemma: [statement missing from the extracted text]

Attacks Against K-Anonymity
- Unsorted Matching Attack
  - This attack is based on the order in which tuples appear in the released table.
  - Solution:
    - Randomly sort the tuples before releasing.

Attacks Against K-Anonymity (Cont'd)
- Complementary Release Attack
  - Different releases can be linked together to compromise k-anonymity.
  - Solution:
    - Consider all previously released tables before releasing a new one, and try to avoid linking.
    - Other data holders may release data that can be used in this kind of attack. In general, this kind of attack is hard to prevent completely.

Attacks Against K-Anonymity (Cont'd)
- Temporal Attack
  - Adding or removing tuples may compromise k-anonymity protection.

Attacks Against K-Anonymity (Cont'd)

A 3-anonymous patient table
Zipcode   Age   Disease
476**     2*    Heart Disease
476**     2*    Heart Disease
476**     2*    Heart Disease
4790*     ≥40   Flu
4790*     ≥40   Heart Disease
4790*     ≥40   Cancer
476**     3*    Heart Disease
476**     3*    Cancer
476**     3*    Cancer

Bob:  Zipcode 47678, Age 27
Carl: Zipcode 47673, Age 36

- k-Anonymity does not provide privacy if:
  - Sensitive values in an equivalence class lack diversity
  - The attacker has background knowledge
- Homogeneity Attack: Bob falls in the first equivalence class, where every tuple has Heart Disease.
- Background Knowledge Attack: Carl falls in the third equivalence class (Heart Disease or Cancer); background knowledge that rules out one of the two reveals the other.
A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006

l-Diversity
- Distinct l-diversity
  - Each equivalence class has at least l well-represented sensitive values
  - Limitation:
    - Does not prevent probabilistic inference attacks
    - Ex. One equivalence class contains ten tuples. In the "Disease" column, one of them is "Cancer", one is "Heart Disease" and the remaining eight are "Flu". This satisfies 3-diversity, but the attacker can still infer that the target person's disease is "Flu" with 80% accuracy (8 of 10 tuples).

l-Diversity (Cont'd)
- Entropy l-diversity
  - Each equivalence class must not only have enough different sensitive values; those values must also be distributed evenly enough.
  - Formally, the entropy of the distribution of sensitive values in each equivalence class must be at least log(l).
  - Sometimes this may be too restrictive. When some values are very common, the entropy of the entire table may be very low. This leads to the less conservative notion of l-diversity.

l-Diversity (Cont'd)
- Recursive (c,l)-diversity
  - The most frequent value does not appear too frequently
  - With r_i the number of occurrences of the i-th most frequent sensitive value in the equivalence class:
    $$ r_1 < c\,(r_l + r_{l+1} + \dots + r_m) $$

[Slides missing from the extracted text; a stray fragment "D[P2, Q]" and the two lines below survive.]
- Ground distance for any pair of values
- D[P,Q] is dependent upon the ground distances.

Earth Mover's Distance
- Formulation
  - P = (p1, p2, ..., pm), Q = (q1, q2, ..., qm)
  - d_ij: the ground distance between element i of P and element j of Q.
  - Find a flow F = [f_ij], where f_ij is the flow of mass from element i of P to element j of Q, that minimizes the overall work:
    $$ \mathrm{WORK}(P, Q, F) = \sum_{i=1}^{m} \sum_{j=1}^{m} d_{ij} f_{ij} $$
    subject to the constraints:
    $$ f_{ij} \ge 0, \qquad 1 \le i \le m,\; 1 \le j \le m $$
    $$ p_i - \sum_{j=1}^{m} f_{ij} + \sum_{j=1}^{m} f_{ji} = q_i, \qquad 1 \le i \le m $$
    $$ \sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} = \sum_{i=1}^{m} p_i = \sum_{i=1}^{m} q_i = 1 $$

Earth Mover's Distance
- Example
  - {3k, 4k, 5k} and {3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 11k}
  - Move 1/9 probability for each of the following pairs
    - 3k->6k, 3k->7k cost: 1/9*(3+4)/8
    - 4k->8k, 4k->9k cost: 1/9*(4+5)/8
    - 5k->10k, 5k->11k cost: 1/9*(5+6)/8
  - Total cost: 1/9*27/8 = 0.375
  - With P2 = {6k, 8k, 11k}, the total cost is 0.167 < 0.375. This makes more sense than the other two distance calculation methods.

How to calculate EMD
- EMD for numerical attributes
  - Ordered distance:
    $$ \mathrm{ordered\_dist}(v_i, v_j) = \frac{|i - j|}{m - 1} $$
  - Ordered distance is a metric
    - Non-negativity, symmetry, triangle inequality
  - Let r_i = p_i - q_i; then D[P,Q] is calculated as:
    $$ D[P, Q] = \frac{1}{m-1}\bigl(|r_1| + |r_1 + r_2| + \dots + |r_1 + r_2 + \dots + r_{m-1}|\bigr) = \frac{1}{m-1} \sum_{i=1}^{m} \Bigl|\sum_{j=1}^{i} r_j\Bigr| $$

How to calculate EMD
- EMD for categorical attributes
  - Equal distance:
    $$ \mathrm{equal\_dist}(v_i, v_j) = 1 \quad (i \ne j) $$
  - Equal distance is a metric
  - D[P,Q] is calculated as:
    $$ D[P, Q] = \frac{1}{2} \sum_{i=1}^{m} |p_i - q_i| = \sum_{p_i \ge q_i} (p_i - q_i) = -\sum_{p_i < q_i} (p_i - q_i) $$

How to calculate EMD (Cont'd)
- EMD for categorical attributes
  - Hierarchical distance:
    $$ \mathrm{hierarchical\_dist}(v_i, v_j) = \frac{\mathrm{level}(v_i, v_j)}{H} $$
  - Hierarchical distance is a metric

How to calculate EMD (Cont'd)
- EMD for categorical attributes
  $$ \mathrm{extra}(N) = \begin{cases} p_i - q_i & \text{if } N \text{ is a leaf} \\ \sum_{C \in \mathrm{Child}(N)} \mathrm{extra}(C) & \text{otherwise} \end{cases} $$
  $$ \mathrm{pos\_extra}(N) = \sum_{C \in \mathrm{Child}(N),\ \mathrm{extra}(C) > 0} |\mathrm{extra}(C)| $$
  $$ \mathrm{neg\_extra}(N) = \sum_{C \in \mathrm{Child}(N),\ \mathrm{extra}(C) < 0} |\mathrm{extra}(C)| $$
  $$ \mathrm{cost}(N) = \frac{\mathrm{height}(N)}{H} \min\bigl(\mathrm{pos\_extra}(N), \mathrm{neg\_extra}(N)\bigr) $$
  - D[P,Q] is calculated as:
    $$ D[P, Q] = \sum_{N} \mathrm{cost}(N) $$

Experiments
- Goal
  - To show that l-diversity does not provide sufficient privacy protection (the similarity attack).
  - To show that the efficiency and data quality of using t-closeness are comparable with other privacy measures.
- Setup
  - Adult dataset from the UC Irvine ML repository
  - 30162 tuples, 9 attributes (2 sensitive attributes)
  - Algorithm: Incognito

Experiments
- Similarity attack (Occupation)
  - 13 of 21 entropy 2-diversity tables are vulnerable
  - 17 of 26 r
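
The linking attack and the k-anonymity condition from the earlier slides can be checked mechanically. The sketch below is only an illustration, not part of the original deck: the function names `link` and `anonymity_level` are hypothetical, and the rows are copied from the Hospital Patient Data and Voter Registration Data tables. It joins the two tables on the quasi-identifier (DOB, Sex, Zipcode) and reports the size of the smallest quasi-identifier group, which is the largest k for which a table is k-anonymous.

```python
from collections import Counter

# Rows copied from the two tables in the slides.
hospital = [
    ("1/21/76", "Male", "53715", "Heart Disease"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("4/13/86", "Female", "53706", "Flu"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]
voters = [
    ("Andre", "1/21/76", "Male", "53715"),
    ("Beth", "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan", "2/21/84", "Male", "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]


def link(hospital_rows, voter_rows):
    """Join on the quasi-identifier (DOB, Sex, Zipcode); hypothetical helper."""
    by_qi = {}
    for dob, sex, zipcode, disease in hospital_rows:
        by_qi.setdefault((dob, sex, zipcode), []).append(disease)
    matches = {}
    for name, dob, sex, zipcode in voter_rows:
        diseases = by_qi.get((dob, sex, zipcode), [])
        if len(diseases) == 1:  # a unique match re-identifies the person
            matches[name] = diseases[0]
    return matches


def anonymity_level(rows, qi_indexes):
    """Size of the smallest group of rows sharing the same QI values."""
    counts = Counter(tuple(row[i] for i in qi_indexes) for row in rows)
    return min(counts.values())


print(link(hospital, voters))                # {'Andre': 'Heart Disease'}
print(anonymity_level(hospital, (0, 1, 2)))  # 1
```

Every quasi-identifier combination in the hospital table is unique, so the table is only 1-anonymous, which is why the join pins down Andre's record.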
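
The ten-tuple example from the l-Diversity slides (one "Cancer", one "Heart Disease", eight "Flu") can be worked through numerically. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the natural logarithm is assumed for the entropy, so the threshold for entropy l-diversity is `math.log(l)`.

```python
import math
from collections import Counter


def distinct_count(values):
    """Number of distinct sensitive values in one equivalence class."""
    return len(set(values))


def entropy(values):
    """Shannon entropy (natural log) of the sensitive-value distribution."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def is_entropy_l_diverse(values, l):
    """Entropy l-diversity: the class entropy must be at least log(l)."""
    return entropy(values) >= math.log(l)


def top_share(values):
    """Fraction of tuples carrying the most frequent sensitive value."""
    return max(Counter(values).values()) / len(values)


# The equivalence class from the slide: ten tuples, eight of them "Flu".
eq_class = ["Cancer", "Heart Disease"] + ["Flu"] * 8

print(distinct_count(eq_class))           # 3 distinct values
print(top_share(eq_class))                # 0.8
print(is_entropy_l_diverse(eq_class, 2))  # False
```

The class has three distinct values, yet its entropy (about 0.64) is below log 2 (about 0.69), so under this sketch it is not even entropy 2-diverse. That matches the point of the slide: counting distinct values alone does not bound what an attacker can infer.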
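
The salary example from the Earth Mover's Distance slides can be reproduced with the closed forms given for the ordered and equal ground distances. The sketch below is illustrative only: `distribution`, `emd_ordered` and `emd_equal` are hypothetical names, each distribution is assumed to be uniform over the listed salaries, and exact fractions are used so the results can be compared with the slide's 0.375 and 0.167.

```python
from fractions import Fraction


def distribution(values, domain):
    """Uniform distribution over `values`, expressed over the ordered domain."""
    share = Fraction(1, len(values))
    return [sum(share for v in values if v == d) for d in domain]


def emd_ordered(p, q):
    """EMD under ordered distance: sum of |prefix sums of r_i| over (m - 1)."""
    running = Fraction(0)
    total = Fraction(0)
    for pi, qi in zip(p, q):
        running += pi - qi     # r_1 + ... + r_i
        total += abs(running)
    return total / (len(p) - 1)


def emd_equal(p, q):
    """EMD under equal distance: half the sum of |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / 2


domain = [3, 4, 5, 6, 7, 8, 9, 10, 11]   # salaries in thousands
q = distribution(domain, domain)         # whole-table distribution
p1 = distribution([3, 4, 5], domain)     # {3k, 4k, 5k}
p2 = distribution([6, 8, 11], domain)    # {6k, 8k, 11k}

print(float(emd_ordered(p1, q)))         # 0.375
print(float(emd_ordered(p2, q)))         # 1/6, about 0.167
print(emd_equal(p1, q), emd_equal(p2, q))
```

Under the equal distance both classes come out at 2/3, so only the ordered distance separates the low-salary class from the spread-out one in this sketch.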