
K-Anonymity and Other Cluster-Based Methods
Ge Ruan
Oct. 11, 2007

Data Publishing and Data Privacy
- Society is experiencing exponential growth in the number and variety of data collections containing person-specific information.
- The collected information is valuable both in research and in business, and data sharing is common.
- Publishing the data may put the respondents' privacy at risk.
- Objective: maximize data utility while limiting disclosure risk to an acceptable level.

Related Works
- Statistical databases
  - The most common approach is adding noise while still maintaining some statistical invariant.
  - Disadvantage: it destroys the integrity of the data.

Related Works (Cont'd)
- Multi-level databases
  - Data is stored at different security classifications, and users have different security clearances (Denning and Lunt).
  - Eliminating precise inference: sensitive information is suppressed, i.e. simply not released (Su and Ozsoyoglu).
- Disadvantages:
  - It is impossible to anticipate every possible attack.
  - Many data holders share the same data, but their concerns differ.
  - Suppression can drastically reduce the quality of the data.

Related Works (Cont'd)
- Computer security
  - Access control and authentication ensure that the right people have the right authority over the right object at the right time and place.
  - That is not what we want here: the general doctrine of data privacy is to release as much information as possible while the identities of the subjects (people) are protected.

K-Anonymity
- Sweeney came up with a formal protection model named k-anonymity.
- What is k-anonymity? The information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release.

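The definition above is mechanical to check: group the released rows by their quasi-identifier values and verify that every group contains at least k rows. A minimal sketch in Python (the row layout and attribute names are illustrative, not from the slides):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """Return True if every combination of quasi-identifier values
    appearing in `rows` is shared by at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())

rows = [
    {"zip": "476**", "age": "2*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "2*", "disease": "Hepatitis"},
    {"zip": "4790*", "age": ">=40", "disease": "Flu"},
]
print(is_k_anonymous(rows, ["zip", "age"], 2))  # → False: the (4790*, >=40) group has 1 row
```

The sensitive column plays no role in the check; k-anonymity constrains only the quasi-identifiers.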
- Example: if you try to identify a man in a release and the only information you have is his birth date and gender, and k people in the release share those values, the release is k-anonymous with respect to those attributes.

Classification of Attributes
- Key attributes: name, address, cell phone.
  - They can uniquely identify an individual directly.
  - Always removed before release.
- Quasi-identifiers: 5-digit ZIP code, birth date, gender.
  - A set of attributes that can potentially be linked with external information to re-identify entities.
  - 87% of the U.S. population can be uniquely identified from these attributes, according to the 1991 Census summary data.
  - Suppressed or generalized before release.

Classification of Attributes (Cont'd)

Hospital patient data:

  DOB     | Sex    | Zipcode | Disease
  1/21/76 | Male   | 53715   | Heart Disease
  4/13/86 | Female | 53715   | Hepatitis
  2/28/76 | Male   | 53703   | Bronchitis
  1/21/76 | Male   | 53703   | Broken Arm
  4/13/86 | Female | 53706   | Flu
  2/28/76 | Female | 53706   | Hang Nail

Voter registration data:

  Name  | DOB     | Sex    | Zipcode
  Andre | 1/21/76 | Male   | 53715
  Beth  | 1/10/81 | Female | 55410
  Carol | 10/1/44 | Female | 90210
  Dan   | 2/21/84 | Male   | 02174
  Ellen | 4/19/72 | Female | 02237

- Linking the two tables on (DOB, Sex, Zipcode): Andre has heart disease!

Classification of Attributes (Cont'd)
- Sensitive attributes: medical record, wage, etc.
  - Usually released directly, since these attributes are what the researchers need; it depends on the requirement.

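The linking attack is just an equality join on the shared quasi-identifier columns. A small sketch, with both tables hard-coded from the slide:

```python
# Hospital patient data: (DOB, sex, zipcode, disease)
hospital = [
    ("1/21/76", "Male", "53715", "Heart Disease"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("4/13/86", "Female", "53706", "Flu"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]

# Voter registration data: (name, DOB, sex, zipcode)
voters = [
    ("Andre", "1/21/76", "Male", "53715"),
    ("Beth", "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan", "2/21/84", "Male", "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]

# Join on the quasi-identifier (DOB, sex, zipcode) to re-identify patients.
matches = [
    (name, disease)
    for (name, v_dob, v_sex, v_zip) in voters
    for (h_dob, h_sex, h_zip, disease) in hospital
    if (v_dob, v_sex, v_zip) == (h_dob, h_sex, h_zip)
]
print(matches)  # → [('Andre', 'Heart Disease')]
```

Removing names alone does not help; only generalizing or suppressing the quasi-identifier columns breaks this join.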
K-Anonymity Protection Model
- PT: private table
- RT, GT1, GT2: released tables
- QI: quasi-identifier (A_i, ..., A_j)
- (A_1, A_2, ..., A_n): attributes

Attacks Against K-Anonymity
- Unsorted matching attack
  - This attack is based on the order in which tuples appear in the released table.
  - Solution: randomly sort the tuples before releasing.

Attacks Against K-Anonymity (Cont'd)
- Complementary release attack
  - Different releases can be linked together to compromise k-anonymity.
  - Solution: consider all previously released tables before releasing a new one, and try to avoid linking.
  - Other data holders may release data that can be used in this kind of attack; in general it is hard to prevent completely.

Attacks Against K-Anonymity (Cont'd)
- Temporal attack
  - Adding or removing tuples over time may compromise the k-anonymity protection.

Attacks Against K-Anonymity (Cont'd)

A 3-anonymous patient table:

  Zipcode | Age  | Disease
  476**   | 2*   | Heart Disease
  476**   | 2*   | Heart Disease
  476**   | 2*   | Heart Disease
  4790*   | >=40 | Flu
  4790*   | >=40 | Heart Disease
  4790*   | >=40 | Cancer
  476**   | 3*   | Heart Disease
  476**   | 3*   | Cancer
  476**   | 3*   | Cancer

What the attacker knows:

  Name | Zipcode | Age
  Bob  | 47678   | 27
  Carl | 47673   | 36

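The homogeneity attack on the 3-anonymous table above can be sketched directly: Bob's values (Zipcode 47678, Age 27) generalize to the class (476**, 2*), and because every record in that class shares one sensitive value, the "anonymous" release still discloses it. A minimal sketch, with the table hard-coded from the slide:

```python
# 3-anonymous patient table: (zipcode, age, disease)
table = [
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Heart Disease"),
    ("4790*", ">=40", "Cancer"),
    ("476**", "3*", "Heart Disease"),
    ("476**", "3*", "Cancer"),
    ("476**", "3*", "Cancer"),
]

# Bob (47678, 27) falls into the (476**, 2*) equivalence class.
bob_class = [disease for (z, a, disease) in table if (z, a) == ("476**", "2*")]
print(set(bob_class))  # → {'Heart Disease'}: the class is homogeneous, so Bob is exposed

# Carl (47673, 36) falls into (476**, 3*); background knowledge can narrow this set further.
carl_class = [disease for (z, a, disease) in table if (z, a) == ("476**", "3*")]
print(sorted(set(carl_class)))  # → ['Cancer', 'Heart Disease']
```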
- k-Anonymity does not provide privacy if:
  - sensitive values in an equivalence class lack diversity (the homogeneity attack), or
  - the attacker has background knowledge (the background knowledge attack).

A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006.

l-Diversity
- Distinct l-diversity
  - Each equivalence class has at least l well-represented sensitive values.
  - Limitation: it does not prevent probabilistic inference attacks.
  - Example: an equivalence class has ten tuples; in the "Disease" column, one value is "Cancer", one is "Heart Disease", and the remaining eight are "Flu".

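Both the distinct l-diversity check and the probabilistic inference it fails to prevent fit in a few lines; a sketch using the ten-tuple class described above (eight of the ten values are "Flu", so the attacker's best guess succeeds 80% of the time):

```python
from collections import Counter

def distinct_l_diverse(values, l):
    """Distinct l-diversity: the class contains at least l different sensitive values."""
    return len(set(values)) >= l

eq_class = ["Cancer", "Heart Disease"] + ["Flu"] * 8
print(distinct_l_diverse(eq_class, 3))       # → True: three distinct diseases

# ...but the attacker's best single guess is still very accurate:
top_value, top_count = Counter(eq_class).most_common(1)[0]
print(top_value, top_count / len(eq_class))  # → Flu 0.8
```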
- Such a class satisfies distinct 3-diversity, but the attacker can still conclude that the target person's disease is "Flu" with 80% probability.

l-Diversity (Cont'd)
- Entropy l-diversity
  - Each equivalence class must not only have enough different sensitive values; the values must also be distributed evenly enough.
  - In the formal language of statistics: the entropy of the distribution of sensitive values in each equivalence class is at least log(l).
  - Sometimes this is too restrictive. When some values are very common, the entropy of the entire table may be very low, which leads to the following less conservative notion of l-diversity.

l-Diversity (Cont'd)
- Recursive (c, l)-diversity
  - The most frequent value does not appear too frequently: r_1 < c (r_l + r_(l+1) + ... + r_m), where r_i is the frequency of the i-th most frequent sensitive value in the class.

t-Closeness
- t-closeness requires the distribution P of the sensitive attribute in each equivalence class to be close to its distribution Q in the whole table: D[P, Q] <= t.
- A ground distance is defined for every pair of sensitive values; D[P, Q] depends on these ground distances.

Earth Mover's Distance
- Formulation
  - P = (p_1, p_2, ..., p_m), Q = (q_1, q_2, ..., q_m)
  - d_ij: the ground distance between element i of P and element j of Q.
  - Find a flow F = (f_ij), where f_ij is the mass moved from element i of P to element j of Q, that minimizes the overall work sum_i sum_j f_ij d_ij, subject to the constraints f_ij >= 0, sum_j f_ij = p_i, and sum_i f_ij = q_j.

Earth Mover's Distance (Cont'd)
- Example
  - Class salaries {3k, 4k, 5k}; table salaries {3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 11k}.
  - Move 1/9 probability for each of the following pairs:
    - 3k -> 6k and 3k -> 7k: cost 1/9 * (3 + 4)/8
    - 4k -> 8k and 4k -> 9k: cost 1/9 * (4 + 5)/8
    - 5k -> 10k and 5k -> 11k: cost 1/9 * (5 + 6)/8
  - Total cost: 1/9 * 27/8 = 0.375.
  - With P2 = {6k, 8k, 11k}, the total cost is 0.167 < 0.375. This matches intuition better than the other two distance-calculation methods.

How to Calculate EMD
- EMD for numerical attributes: ordered distance
  - ordered_dist(v_i, v_j) = |i - j| / (m - 1)
  - Ordered distance is a metric: non-negative, symmetric, and it satisfies the triangle inequality.
  - Let r_i = p_i - q_i; then D[P, Q] = (1/(m - 1)) * (|r_1| + |r_1 + r_2| + ... + |r_1 + r_2 + ... + r_(m-1)|).

How to Calculate EMD (Cont'd)
- EMD for categorical attributes: equal distance
  - equal_dist(v_i, v_j) = 1 for i != j (and 0 otherwise).
  - Equal distance is a metric.
  - D[P, Q] = (1/2) * sum_i |p_i - q_i|.

How to Calculate EMD (Cont'd)
- EMD for categorical attributes: hierarchical distance
  - hierarchical_dist(v_i, v_j) = level(v_i, v_j) / H, where level(v_i, v_j) is the height of the lowest common ancestor of v_i and v_j in the value hierarchy, and H is the height of the hierarchy.
  - Hierarchical distance is a metric.
  - extra(N) = p_i - q_i if N is a leaf (for value v_i); otherwise extra(N) = sum of extra(C) over C in Child(N).
  - pos_extra(N) = sum of extra(C) over children C with extra(C) > 0; neg_extra(N) = sum of |extra(C)| over children C with extra(C) < 0.
  - cost(N) = (height(N) / H) * min(pos_extra(N), neg_extra(N))
  - D[P, Q] = sum of cost(N) over non-leaf nodes N.

Experiments
- Goals
  - Show that l-diversity does not provide sufficient privacy protection (the similarity attack).
  - Show that the efficiency and data quality of t-closeness are comparable with other privacy measures.
- Setup
  - Adult dataset from the UC Irvine Machine Learning Repository.
  - 30,162 tuples, 9 attributes (2 sensitive attributes).
  - Algorithm: Incognito.

Experiments (Cont'd)
- Similarity attack (Occupation)
  - 13 of 21 entropy 2-diversity tables are vulnerable.
  - 17 of 26 r

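The closed-form ordered-distance EMD above is simple enough to verify the slides' worked salary example numerically. A sketch (the salary domain 3k..11k is mapped to indices 0..8; function names are mine, not from the slides):

```python
def ordered_emd(p, q):
    """EMD under the ordered ground distance |i - j|/(m - 1):
    sum of absolute cumulative differences, divided by (m - 1)."""
    m = len(p)
    total, cum = 0.0, 0.0
    for i in range(m - 1):
        cum += p[i] - q[i]       # running sum r_1 + ... + r_(i+1)
        total += abs(cum)
    return total / (m - 1)

def equal_emd(p, q):
    """EMD under the equal ground distance: half the L1 distance."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / 2

# Salary domain {3k, ..., 11k} as indices 0..8; Q is the whole-table distribution.
q = [1 / 9] * 9
p1 = [1/3, 1/3, 1/3, 0, 0, 0, 0, 0, 0]    # equivalence class {3k, 4k, 5k}
p2 = [0, 0, 0, 1/3, 0, 1/3, 0, 0, 1/3]    # equivalence class {6k, 8k, 11k}

print(round(ordered_emd(p1, q), 3))  # → 0.375, as on the slide
print(round(ordered_emd(p2, q), 3))  # → 0.167: the spread-out class is closer to Q
print(round(equal_emd(p1, q), 3), round(equal_emd(p2, q), 3))  # → 0.667 0.667
```

Note that the equal ground distance gives both classes the same score, so it cannot distinguish the risky class {3k, 4k, 5k} from the harmless one; the ordered distance can, which is why it is preferred for numerical attributes.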