Big Data: Foreign-Language Literature Translation and Review

Original text:

Data Mining and Data Publishing

Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but that still supports effective data mining tasks.

Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows sharing of privacy-sensitive data for analysis purposes. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, and (α,k)-anonymity. In particular, all known mechanisms try to minimize information loss, and such an attempt provides a loophole for attacks. The aim of this paper is to present a survey of the most common attack techniques against anonymization-based PPDM and PPDP and to explain their effects on data privacy.

Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy. In recent years, studies have been made to ensure that the sensitive information of individuals cannot be identified easily.

Anonymity Models

k-anonymization techniques have been the focus of intense research in the last few years. In order to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications, several extending models have been proposed, which are discussed as follows.

1. k-Anonymity

k-anonymity is one of the most classic models. It is a technique that prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. A data set is k-anonymous (k ≥ 1) if each record in the data set is indistinguishable from at least (k − 1) other records within the same data set. The larger the value of k, the better the privacy is protected. k-anonymity can ensure that individuals cannot be uniquely identified by linking attacks.

2. Extending Models

k-anonymity does not provide sufficient protection against attribute disclosure. The notion of l-diversity attempts to solve this problem by requiring that each equivalence class has at least l well-represented values for each sensitive attribute. l-diversity has some advantages over k-anonymity, because a k-anonymous data set permits strong attacks due to lack of diversity in the sensitive attributes. In this model, an equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. In addition, there are semantic relationships among the attribute values, and different values have very different levels of sensitivity. The (α,k)-anonymity model therefore bounds how often a sensitive value may appear: after anonymization, in any equivalence class, the frequency (as a fraction) of a sensitive value is no more than α.

3. Related Research Areas

Several polls show that the public has an increased sense of privacy loss. Since data mining is often a key component of information systems, homeland security systems, and monitoring and surveillance systems, it gives the wrong impression that data mining is a technique for privacy intrusion. This lack of trust has become an obstacle to the benefit of the technology. For example, the potentially beneficial data mining research project Terrorism Information Awareness (TIA) was terminated by the US Congress due to its controversial procedures of collecting, sharing, and analyzing the trails left by individuals. Motivated by the privacy concerns about data mining tools, a research area called privacy-preserving data mining (PPDM) emerged in 2000. The initial idea of PPDM was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. The solutions were often tightly coupled with the data mining algorithms under consideration. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the data mining task is sometimes unknown at the time of data publishing. Furthermore, some PPDP solutions emphasize preserving the truthfulness of the data at the record level, but PPDM solutions often do not preserve such a property.

PPDP Differs from PPDM in Several Major Ways as Follows:

1) PPDP focuses on techniques for publishing data, not techniques for data mining. In fact, it is expected that standard data mining techniques are applied to the published data. In contrast, the data holder in PPDM needs to randomize the data in such a way that data mining results can be recovered from the randomized data. To do so, the data holder must understand the data mining tasks and algorithms involved. This level of involvement is not expected of the data holder in PPDP, who usually is not an expert in data mining.

2) Neither randomization nor encryption preserves the truthfulness of values at the record level; therefore, the released data are basically meaningless to the recipients. In such a case, the data holder in PPDM may consider releasing the data mining results rather than the scrambled data.

3) PPDP primarily "anonymizes" the data by hiding the identity of record owners, whereas PPDM seeks to directly hide the sensitive data.

Excellent surveys and books on randomization and cryptographic techniques for PPDM can be found in the existing literature.

A family of research work called privacy-preserving distributed data mining (PPDDM) aims at performing some data mining task on a set of private databases owned by different parties. It follows the principle of Secure Multiparty Computation (SMC) and prohibits any data sharing other than the final data mining result. Clifton et al. present a suite of SMC operations, such as secure sum, secure set union, secure size of set intersection, and scalar product, that are useful for many data mining tasks. In contrast, PPDP does not perform the actual data mining task, but is concerned with how to publish the data so that the anonymous data are useful for data mining. We can say that PPDP protects privacy at the data level while PPDDM protects privacy at the process level. They address different privacy models and data mining scenarios.

In the field of statistical disclosure control (SDC), the research works focus on privacy-preserving publishing methods for statistical tables. SDC focuses on three types of disclosure, namely identity disclosure, attribute disclosure, and inferential disclosure. Identity disclosure occurs if an adversary can identify a respondent from the published data. Revealing that an individual is a respondent of a data collection may or may not violate confidentiality requirements. Attribute disclosure occurs when confidential information about a respondent is revealed and can be attributed to the respondent. Attribute disclosure is the primary concern of most statistical agencies in deciding whether to publish tabular data. Inferential disclosure occurs when individual information can be inferred with high confidence from statistical information of the published data.

Some other works of SDC focus on the study of the non-interactive query model, in which the data recipients can submit one query to the system. This type of non-interactive query model may not fully address the information needs of data recipients because, in some cases, it is very difficult for a data recipient to accurately construct a query for a data mining task in one shot. Consequently, there is a series of studies on the interactive query model, in which the data recipients, including adversaries, can submit a sequence of queries based on previously received query results. The database server is responsible for keeping track of all queries of each user and determining whether or not the currently received query has violated the privacy requirement with respect to all previous queries. One limitation of any interactive privacy-preserving query system is that it can only answer a sublinear number of queries in total; otherwise, an adversary (or a group of corrupted data recipients) will be able to reconstruct all but 1 − o(1) fraction of the original data, which is a very strong violation of privacy. When the maximum number of queries is reached, the query service must be closed to avoid privacy leaks. In the case of the non-interactive query model, the adversary can issue only one query and, therefore, the non-interactive query model cannot achieve the same degree of privacy defined by the interactive model. One may consider privacy-preserving data publishing to be a special case of the non-interactive query model.

This paper presents a survey of the most common attack techniques against anonymization-based PPDM and PPDP and explains their effects on data privacy. k-anonymity is used to protect respondents' identities and reduces linking attacks. In the case of a homogeneity attack, a simple k-anonymity model fails, and a concept that prevents this attack is needed; the solution is l-diversity. All tuples are arranged in a well-represented form, so the adversary is diverted to l places, or onto l sensitive attributes. l-diversity is limited in the case of a background knowledge attack, because no one can predict the knowledge level of an adversary. It is observed that generalization and suppression are also applied to attributes that do not need this extent of privacy, which reduces the precision of the published table. e-NSTAM (extended Sensitive Tuples Anonymity Method) is applied to sensitive tuples only and reduces information loss, but this method also fails in the case of multiple sensitive tuples. Generalization with suppression is also a cause of data loss, because suppression emphasizes not releasing values that are not suited to the k factor. Future work in this direction can include defining a new privacy measure along with l-diversity for multiple sensitive attributes, and we will focus on generalizing attributes without suppression, using other techniques that achieve k-anonymity, because suppression reduces the precision of the published table.
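The k-anonymity definition in section 1 can be checked mechanically. The following is a minimal sketch, not the paper's own procedure: the toy records, the attribute names (`age`, `zip`, `disease`), and the helpers `is_k_anonymous` and `generalize_age` are all hypothetical, and generalization is reduced to fixed-width age ranges and truncated ZIP codes.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(age, width=10):
    """Replace an exact age by a fixed-width range (an assumed generalization rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical microdata: age and zip are quasi-identifiers, disease is sensitive.
raw = [
    {"age": 23, "zip": "47677", "disease": "flu"},
    {"age": 27, "zip": "47602", "disease": "flu"},
    {"age": 35, "zip": "47905", "disease": "cancer"},
    {"age": 38, "zip": "47909", "disease": "asthma"},
]

# Generalize the quasi-identifiers; the sensitive value is released unchanged.
released = [
    {"age": generalize_age(r["age"]), "zip": r["zip"][:3] + "**", "disease": r["disease"]}
    for r in raw
]

print(is_k_anonymous(raw, ["age", "zip"], 2))       # False: every raw record is unique
print(is_k_anonymous(released, ["age", "zip"], 2))  # True: two groups of size 2
```

Raising k to 3 makes the released table fail the check again, which illustrates the trade-off the text describes: a larger k protects better but needs coarser generalization, and so loses more information.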
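The homogeneity attack and the l-diversity and α conditions from section 2 can be illustrated on a small table. This sketch makes two assumptions that the text does not fix: "well-represented" is read in its simplest form as "distinct", and the α bound is checked for one named sensitive value at a time. The table and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def equivalence_classes(records, quasi_identifiers, sensitive):
    """Group the sensitive values by their quasi-identifier combination."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[a] for a in quasi_identifiers)].append(r[sensitive])
    return classes

def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    classes = equivalence_classes(records, quasi_identifiers, sensitive)
    return all(len(set(values)) >= l for values in classes.values())

def max_fraction(records, quasi_identifiers, sensitive, value):
    """Largest fraction of any equivalence class taken up by one sensitive value."""
    classes = equivalence_classes(records, quasi_identifiers, sensitive)
    return max((Counter(v)[value] / len(v) for v in classes.values()), default=0.0)

# A 2-anonymous table (hypothetical): the first class is homogeneous in "disease".
table = [
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "cancer"},
    {"age": "30-39", "zip": "479**", "disease": "asthma"},
]
qi = ["age", "zip"]

print(is_distinct_l_diverse(table, qi, "disease", 2))  # False: homogeneity attack possible
print(max_fraction(table, qi, "disease", "flu"))       # 1.0, violates any alpha below 1
print(max_fraction(table, qi, "disease", "cancer"))    # 0.5
```

An adversary who knows that a target is in their twenties with a ZIP code starting with 476 learns the disease with certainty, even though the table is 2-anonymous. This is the failure of plain k-anonymity that the text attributes to the homogeneity attack.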
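Of the SMC operations attributed to Clifton et al., secure sum is the easiest to sketch. The code below simulates, in a single process, the commonly described ring-based variant: the first party adds a random mask, each party adds its private value modulo an agreed bound, and the first party removes the mask at the end. This is an illustrative assumption about the protocol's shape, not a transcription of the cited work. It assumes honest, non-colluding parties, non-negative values, and a true sum smaller than `modulus`; the function name and the sample values are hypothetical.

```python
import secrets

def secure_sum(private_values, modulus):
    """Simulate a ring-based secure sum; the true total must be below modulus."""
    if not private_values:
        raise ValueError("at least one party is required")
    # Party 0 hides its value behind a uniformly random mask.
    mask = secrets.randbelow(modulus)
    running = (mask + private_values[0]) % modulus
    # Each later party sees only the masked running total and adds its own value.
    for value in private_values[1:]:
        running = (running + value) % modulus
    # The total returns to party 0, which removes the mask.
    return (running - mask) % modulus

# Three parties with hypothetical private counts.
print(secure_sum([12, 7, 30], modulus=1000))  # 49
```

Because the mask is uniform over the modulus, every intermediate total a party receives is uniformly distributed and reveals nothing on its own. Only the final result is shared, which matches the text's description of PPDDM protecting privacy at the process level.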

