




Trustworthy Machine Learning
Lecturer: Jingfeng Zhang
RIKEN-AIP
Homepage: https://zjfheart.github.io
Machine learning (ML) models exceed human ability in many tasks.
• Image classification. Classification results (from IBM Cloud): crowd, people, demonstration, person, alizarine red color
• Reinforcement learning
• Natural language processing: machine translation; voice commands such as "Alexa, order me a large pizza!"
ML models are also used in high-stakes applications.
• Education assessment
• Self-driving cars
• Credit
• Healthcare
• Criminal justice
• Robotic surgery
• Content recommendations
ML models need TRUST!
What is "trust" in ML?
• Security
• Fairness
• Privacy
• Interpretability
An example of adversarial attacks!
[Figure: natural data versus adversarial data. AI makes the pig fly high! The images and the amusement come from Aleksander Madry's group.]
Examples: adversarial attacks pose a threat to AI's deployment.
• Glasses [Sharif, Bhagavatula, Bauer, Reiter 2016]
• Inject a hidden voice command [Carlini, Wagner 2018]
• Add human-imperceptible noises [Mopuri, Ganeshan, Babu 2018]
• Small stickers [Eykholt, Evtimov, Fernandes, Li, Rahmati, Xiao, Prakash, Kohno, Song 2018]
THREAT!
ML pipeline
• Training phase: training set (input data, labels) → model set → a model
• Inference phase: test data (input data) → a model → predictions (labels)
ML for dog and cat classification
• Training phase: images + labels (dogs and cats; "We labeled them as dog!", "We labeled them as cat!") → neural networks → a network
• Inference phase: test data (input data) → a network → predictions (labels): dog or cat?
Security: (Evasion) adversarial attack happens at the inference phase
[Figure: ML pipeline (training set (input data, labels) → model set → a model; test data (input data) → predictions), with the attack applied to the test data in the inference phase.]
The adversarial attacker adds small (human-imperceptible) noise to the test input data, which fools the model into making wrong predictions!
The adversarial attack deliberately works against the model's will! But what is the model's will?
Let us use a function f to denote the model.
• What is the model's will? To correctly label the test input data x, i.e., f(x) = "dog".
• Then, the model's will is to minimize the 0-1 loss ℓ(f(x), "dog").
• In ML, we usually use a smoothed loss function, i.e., ℓ(f(x), y), to upper bound the 0-1 loss. For example, the log-loss and the exp-loss are differentiable!
[Figure: loss value of the model predicting "dog" versus the input x to the model, with regions of good predictions and bad predictions.]
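The upper-bound claim can be checked numerically. The sketch below is only an illustration under an assumption the slides do not state: a binary label y in {-1, +1} and a margin m = y·f(x), so that a wrong prediction means m ≤ 0. The function names are made up for this example, and the log-loss is taken in base 2 so that it equals 1 at m = 0.

```python
import math

def zero_one_loss(margin):
    """0-1 loss: 1 if the prediction is wrong (margin <= 0), else 0."""
    return 1.0 if margin <= 0 else 0.0

def log_loss(margin):
    """Logistic loss in base 2, so that it equals 1 at margin 0."""
    return math.log2(1.0 + math.exp(-margin))

def exp_loss(margin):
    """Exponential loss."""
    return math.exp(-margin)

# Margins from -3.0 to 3.0 (assumed range, for illustration only).
margins = [m / 10.0 for m in range(-30, 31)]
log_is_upper_bound = all(log_loss(m) >= zero_one_loss(m) for m in margins)
exp_is_upper_bound = all(exp_loss(m) >= zero_one_loss(m) for m in margins)
print(log_is_upper_bound, exp_is_upper_bound)
```

Unlike the 0-1 loss, both surrogates have a nonzero gradient everywhere, which is what gradient-based training (and gradient-based attacks) rely on.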
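Lp-norm bounded adversarial attacker: maximize the model loss!
Attacker objective:
$\tilde{x} = \arg\max_{\tilde{x} \in B_\epsilon(x)} \ell(f(\tilde{x}), y)$
Find an adversarial data $\tilde{x}$ within the Lp-norm ball $B_\epsilon(x)$ of natural data $x$ that maximizes the loss $\ell(f(\tilde{x}), y)$; $\epsilon$ is the radius of the norm ball.
A typical method:
Projected gradient descent (PGD): given a starting point $x^{(0)}$ and step size $\alpha$, PGD works as follows:
$x^{(t+1)} = \Pi_{B_\epsilon(x)}\big(x^{(t)} + \alpha\,\mathrm{sign}(\nabla_{x^{(t)}} \ell(f_\theta(x^{(t)}), y))\big), \quad t \in \mathbb{N}$
$\Pi_{B_\epsilon(x)}$ projects the adversarial data $x^{(t)}$ back onto the norm ball if $x^{(t)}$ exceeds the norm ball boundary; $\alpha$ is a small step size; $t$ is the search step number.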
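A minimal sketch of the PGD update, assuming an L-infinity ball and, as a stand-in for a neural network, a two-feature logistic-regression model whose input gradient is available in closed form. The weights, the data point, and all function names are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Log-loss of a linear model for a label y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def loss_grad_x(w, b, x, y):
    """Gradient of the log-loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def project_linf(x_adv, x_nat, eps):
    """Clip each coordinate back into the L-infinity ball of radius eps."""
    return [min(max(xa, xn - eps), xn + eps) for xa, xn in zip(x_adv, x_nat)]

def pgd_attack(w, b, x_nat, y, eps=0.3, alpha=0.1, steps=10):
    """Signed-gradient ascent on the loss, projecting after every step."""
    x_adv = list(x_nat)  # start from the natural data
    for _ in range(steps):
        grad = loss_grad_x(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = project_linf(x_adv, x_nat, eps)
    return x_adv

# Hypothetical model and data point.
w, b = [2.0, -1.0], 0.5
x_nat, y = [1.0, 0.5], 1
x_adv = pgd_attack(w, b, x_nat, y)
print(loss(w, b, x_nat, y), loss(w, b, x_adv, y))
```

For a deep network the only change is that the input gradient comes from automatic differentiation instead of the closed form used here.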
[Figure: PGD method compared with random sample directions 1 and 2; points X1 and X2.]
Images modified from /know-your-enemy-7f7c5038bdf3
Different types of adversarial attacks
• Human-imperceptible attacks, e.g., attackers use a norm bound to measure imperceptibility, such as the L∞ norm, L2 norm, or Wasserstein distance.
Image taken from /breaking-neural-networks-with-adversarial-attacks-f4290a9a45aa
• Patch-based attacks, e.g., L0 norm.
Images taken from /~sbhagava/papers/face-rec-ccs16.pdf and /pdf/1712.09665.pdf
• Others, such as rotation attacks, out-of-distribution attacks, etc.
What if the attacker is not allowed to access the model's parameters?
• Black-box attacker: query the model's predictions only.
• Grey-box attacker: know some training data, train a substitute model, and perform transfer-based attacks.
Reading: Papernot et al., Practical Black-Box Attacks against Machine Learning.
Defense methods against adversarial attacks?
• Training phase (training set → model set → a model): attack-aware training process, such as adversarial training, randomized smoothing, etc.
• A model: pruning the models.
• Inference phase (test data → predictions): attack-aware prediction, such as detection, noise purification, rejection, etc.
One defense example: adversarial training (AT)
Given the knowledge that the test data may be adversarial, AT carefully simulates some adversarial attacks during training. Thus, the model has already seen many adversarial training data in the past, and hopefully it can generalize to adversarial test data in the future.
[Figure: decision boundaries on the data under standard training and adversarial training. Images come from the paper "Attacks which do not kill training make adversarial learning stronger".]
AT's Purpose 1: correctly classify the data.
AT's Purpose 2: make the decision boundary thick so that no data lie near the decision boundary.
Reading: Zhang et al., Attacks which do not kill training make adversarial learning stronger.
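AT's basic formulation and the corresponding improvements
Minimax formulation:
$\min_{f} \frac{1}{n}\sum_{i=1}^{n} \ell(f(\tilde{x}_i), y_i)$, where $\tilde{x}_i = \arg\max_{\tilde{x} \in B_\epsilon(x_i)} \ell(f(\tilde{x}), y_i)$
The first expression is the outer minimization; the second is the inner maximization [Madry, Makelov, Schmidt, Tsipras, Vladu 2019].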
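The minimax structure can be sketched as two nested loops: an inner PGD search for adversarial data and an outer gradient step on the loss at that data. This is a toy illustration, assuming an L-infinity ball, a two-feature logistic-regression model, and synthetic Gaussian data; none of these choices, nor the hyperparameters, come from the lecture.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def pgd(w, b, x_nat, y, eps, alpha, steps):
    """Inner maximization: search the L-infinity ball around x_nat."""
    x = list(x_nat)
    for _ in range(steps):
        p = predict(w, b, x)
        grad = [(p - y) * wi for wi in w]  # d(log-loss)/dx
        x = [xi + alpha * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
        x = [min(max(xi, xn - eps), xn + eps) for xi, xn in zip(x, x_nat)]
    return x

def adversarial_training(data, eps=0.2, alpha=0.05, steps=5, lr=0.1, epochs=50):
    """Outer minimization: gradient steps on the loss at adversarial data."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x_nat, y in data:
            x_adv = pgd(w, b, x_nat, y, eps, alpha, steps)
            p = predict(w, b, x_adv)
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
            b -= lr * (p - y)
    return w, b

def robust_accuracy(w, b, data, eps=0.2):
    """Fraction of points still classified correctly after a PGD attack."""
    correct = 0
    for x_nat, y in data:
        x_adv = pgd(w, b, x_nat, y, eps, alpha=0.05, steps=5)
        correct += int((predict(w, b, x_adv) > 0.5) == (y == 1))
    return correct / len(data)

# Synthetic two-class data (assumed, for illustration only).
random.seed(0)
data = [([random.gauss(1.5, 0.3), random.gauss(1.5, 0.3)], 1) for _ in range(20)]
data += [([random.gauss(-1.5, 0.3), random.gauss(-1.5, 0.3)], 0) for _ in range(20)]
w, b = adversarial_training(data)
print(robust_accuracy(w, b, data))
```

Setting `steps=0` in the inner loop recovers standard training, which makes the two regimes easy to compare on the same data.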
AT's improvements/modifications, intriguing findings & interesting applications
1 Collecting/generating more/smarter training data
2 Simulating smarter attacks
3 Designing smarter learning objectives
4 Designing/learning smarter network structures
5 Leveraging smarter tricks
6 Discovering some intriguing findings
7 Developing some applications
8 Other directions such as smarter attacks, detections.
Refer to a video: /watch?v=3Z8bUgn41Fk
The statistic comes from
Security: (Poisoning) attack happens at the training phase
[Figure: ML pipeline (training set (input data, labels) → model set → a model; test data (input data) → predictions (labels)), with the attack applied to the training set in the training phase.]
The adversarial attacker adds small (human-imperceptible or human-perceptible) noise to the training data, which fools the training phase into generating a "bad" model!
The attacker deliberately works against the learning's will.
• In the previous slides, the model is denoted as a function f: x → y.
• Similarly, the learning is also denoted as a function A: D → f, in which D is a training dataset and f is a model.
• What is the learning's will? Usually, to return a good model that has a small natural generalization loss, i.e., $\mathbb{E}_{(x,y)\sim D}[\ell(f(x), y)]$.
• Sometimes, it also needs a different will: a small robust generalization loss (for security purposes), i.e., $\mathbb{E}_{(x,y)\sim D}[\max_{\tilde{x} \in B_\epsilon(x)} \ell(f(\tilde{x}), y)]$, where $B_\epsilon(x)$ is the $\epsilon$-norm ball around x.
Reading: Biggio et al., Support vector machines under adversarial label noise. Zhao et al., Efficient label contamination attacks against black-box learning models.
What can the poisoning attacker do?
• Flipping labels (naturally noisy labels, or label-flipping attacks). This facilitates research on label-noise robust training.
• Backdoor trigger. Images come from Gu et al., BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain.
• Perturbing input data (clean-label targeted attack). After training: "Oh! This is a dog!" (wrong prediction). Images come from Geiping et al., Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching.
One poisoning example: clean-label targeted attack
• Attacking a learning algorithm is more challenging!
It is not just fooling a single model (as in an adversarial attack), but fooling a series of models in the learning sequence.
The learning algorithm A converges to a bad model region!
The image comes from Huang et al., MetaPoison: Practical General-purpose Clean-label Data Poisoning.
• What is a clean-label targeted attack?
1 Poisoned data (e.g., images) appear to be unmodified and labeled correctly.
2 The perturbed images often affect classifier behavior on a specific target instance ($x_{tar}$) of a learned model, without affecting behavior on other inputs.
3 The clean-label attacks are insidiously hard to detect.
Clean-label targeted attack
• Performing a poisoning attack has to unroll the whole training process (constrained bilevel optimization), which is computationally intractable and costly!
• Then how? Just use a single model (a pretrained feature extractor) to represent all!
• Feature collision: $x_{poi} = \arg\min_{x} [\|f(x) - f(x_{tar})\|_2^2 + \beta\|x - x_{nat}\|_2^2]$, where $x_{poi}$ is the generated poisoned data, $x_{tar}$ is a specific target instance in the test dataset, and $x_{nat}$ is the original benign data. Shafahi et al., Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks.
• Gradient alignment (Witches' Brew): matching gradients between poisoned data and target data. $x_{poi} = \arg\min_{x_{poi} \in B_\epsilon(x_{nat})} M_L[\nabla_\theta L(f(x_{tar}), y_{adv}), \nabla_\theta L(f(x_{poi}), y_{true})]$, where $M_L$ is a similarity loss, e.g., one built on the cosine similarity $\cos(a, b) = \frac{\langle a, b\rangle}{\|a\|\,\|b\|}$; $y_{adv}$ is the attacker-chosen (wrong) label. Geiping et al., Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching.
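The feature-collision objective can be sketched on a toy problem. The example below assumes a fixed linear map W as a stand-in for the pretrained feature extractor f, and it minimizes the objective with plain gradient descent, a simplification chosen here rather than the optimizer used in the cited paper. The matrix, the data points, and the coefficient beta are hypothetical.

```python
def matvec(W, x):
    """Feature extractor stand-in: f(x) = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def objective(W, x, x_tar, x_nat, beta):
    """||f(x) - f(x_tar)||^2 + beta * ||x - x_nat||^2."""
    return sq_dist(matvec(W, x), matvec(W, x_tar)) + beta * sq_dist(x, x_nat)

def feature_collision(W, x_tar, x_nat, beta=0.1, lr=0.05, steps=200):
    x = list(x_nat)  # start from the benign data
    f_tar = matvec(W, x_tar)
    for _ in range(steps):
        diff = [fi - ti for fi, ti in zip(matvec(W, x), f_tar)]
        # Gradient of the feature term: 2 * W^T (W x - W x_tar)
        grad_feat = [2 * sum(W[i][j] * diff[i] for i in range(len(W)))
                     for j in range(len(x))]
        # Gradient of the proximity term: 2 * beta * (x - x_nat)
        grad_prox = [2 * beta * (xi - ni) for xi, ni in zip(x, x_nat)]
        x = [xi - lr * (gf + gp) for xi, gf, gp in zip(x, grad_feat, grad_prox)]
    return x

# Hypothetical extractor and data.
W = [[1.0, 0.5, 0.0], [0.0, 1.0, -1.0]]
x_nat = [0.0, 0.0, 0.0]
x_tar = [1.0, 2.0, -1.0]
x_poi = feature_collision(W, x_tar, x_nat)
print(sq_dist(matvec(W, x_poi), matvec(W, x_tar)), sq_dist(x_poi, x_nat))
```

A larger beta keeps the poison closer to the benign input at the cost of a weaker collision in feature space.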
Reading: Goldblum et al., Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.
Defense against poisoning attacks
• Training set (input data, labels): data preprocessing.
• Training phase: anti-backdoor training, adversarial training, etc.
• A model: model pruning.
Privacy
Two different notions of privacy.
• Protect data privacy from machines. How to achieve this? Data poisoning!
Reading: Zhiqi et al., Human-imperceptible privacy protection against machines, ACM MM 19 best paper award.
Huang et al., Unlearnable examples: Making personal data unexploitable, ICLR 21 Spotlight.
• Protect data privacy from people. How to achieve this?
A head-scratching questionnaire!
• Suppose you want to collect answers to a very embarrassing question, for example, whether you conducted improper behaviors on the train in the past three months. (Yes/No)
How?
This question is important at the population level, but very embarrassing at the individual level. Therefore, people tend to lie on this question.
What can I do to get the true statistics?
We need a private learning process!
• We introduce randomness, i.e., plausible deniability for each individual.
• Step 1: The subject individual flips a coin twice.
Blindto
• Step 2:
a. If the first coin was tails, report the true answer.
b. If the first coin was heads, report YES if the second coin is heads; report NO if the second coin is tails.

First \ Second | Head | Tail
Tail | True answer | True answer
Head | Yes | No

Differential privacy
We collect N samples, of which N_yes report "Yes" and N_no = N − N_yes report "No".
We want to calculate the estimated true proportion P of people who conducted improper behaviors. How?
• People who truly conducted improper behaviors (P) have a 3/4 chance to report "Yes", i.e., $\frac{3}{4}P$.
• People who did not (1 − P) have a 1/4 chance to report "Yes", i.e., $\frac{1}{4}(1-P)$.
Answer:
$\frac{3}{4}P + \frac{1}{4}(1-P) = \frac{N_{yes}}{N}$, so $P = \frac{2N_{yes}}{N} - \frac{1}{2}$.
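The two-coin procedure and the resulting estimate can be simulated. The sketch below assumes fair coins, a hypothetical true proportion of 0.3, and 20000 respondents; the function names are made up for this example.

```python
import random

def randomized_response(truth, rng):
    """Report an answer using two fair coin flips."""
    first_is_tail = rng.random() < 0.5
    second_is_head = rng.random() < 0.5
    if first_is_tail:
        return truth           # first coin tails: report the true answer
    return second_is_head      # first coin heads: YES on heads, NO on tails

def estimate_proportion(n_yes, n):
    """Solve (3/4) P + (1/4) (1 - P) = n_yes / n for P."""
    return 2.0 * n_yes / n - 0.5

rng = random.Random(0)
true_p, n = 0.3, 20000  # assumed values for the simulation
truths = [rng.random() < true_p for _ in range(n)]
reports = [randomized_response(t, rng) for t in truths]
estimate = estimate_proportion(sum(reports), n)
print(round(estimate, 3))
```

No single report reveals an individual's true answer, yet the estimate approaches the true proportion as the number of respondents grows.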
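What is a differentially private algorithm? A randomized algorithm M.
This algorithm returns the answer with a probability!
[Figure: datasets D and D′ are each fed to M.]
$\Pr[M(D) \in S] \le e^{\epsilon}\,\Pr[M(D') \in S]$ for every set of outputs $S$, where D and D′ differ in only one record!
Reading: Dwork and Roth, The Algorithmic Foundations of Differential Privacy.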
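As a worked instance of the definition, the two-coin questionnaire can be treated as a mechanism applied to a single record (one person's true answer), so that neighbouring inputs are "truth = Yes" and "truth = No". Under that assumption, the sketch below computes the smallest ε satisfying the inequality; the dictionary layout and function name are illustrative only.

```python
import math

# Pr[report | truth] under the two-coin mechanism with fair coins.
p_report = {
    True:  {"Yes": 0.75, "No": 0.25},
    False: {"Yes": 0.25, "No": 0.75},
}

def privacy_epsilon(p_report):
    """Smallest epsilon with Pr[out | a] <= exp(epsilon) * Pr[out | b]."""
    worst_ratio = max(
        p_report[a][out] / p_report[b][out]
        for a in p_report
        for b in p_report
        for out in ("Yes", "No")
    )
    return math.log(worst_ratio)

epsilon = privacy_epsilon(p_report)
print(epsilon)
```

The worst-case ratio is 0.75 / 0.25 = 3, so the mechanism satisfies the inequality with ε = ln 3.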
Examples of privacy attacks in ML
[Figure: ML pipeline (training set (input data, labels) → model set → a model; test data (input data) → predictions (labels)).]
Model inversion attack: given a trained model, recover the private dataset used to train the model.
Fredrikson et al., Model inversion attacks that exploit confidence information and basic countermeasures.
Membership inference attack: given a trained model, detect whether a given data point was used to train the model.
Shokri et al., Membership Inference Attacks against Machine Learning Models.
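One simple way to picture membership inference is a loss-threshold heuristic: models often fit their training data more tightly, so an unusually low loss hints at membership. This is a swapped-in baseline for illustration, not the attack of the paper cited above, and the loss values and threshold below are made-up numbers.

```python
def infer_membership(losses, threshold):
    """Guess 'member' when the model's loss on an example is below the threshold."""
    return [loss < threshold for loss in losses]

def attack_accuracy(guesses, is_member):
    """Fraction of examples whose membership was guessed correctly."""
    return sum(g == m for g, m in zip(guesses, is_member)) / len(is_member)

# Hypothetical per-example losses of a trained model (illustrative values only).
train_losses = [0.02, 0.05, 0.01, 0.30]    # examples used for training
heldout_losses = [0.60, 0.45, 0.08, 0.90]  # examples never seen in training
losses = train_losses + heldout_losses
is_member = [True] * 4 + [False] * 4

guesses = infer_membership(losses, threshold=0.2)
accuracy = attack_accuracy(guesses, is_member)
print(accuracy)
```

An accuracy well above 0.5 on balanced data would indicate that the model leaks membership information.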
Fairness: various descriptions
A model may have bias towards sensitive attributes, such as gender, race, religion.
• Proportional fairness: You get what you deserve. Reading: Zhang et al., Hierarchically fair federated learning, a tech report.
• Individual fairness: Two similar individuals should be classified similarly.
• Group fairness: The model's outcome should be the same across different groups. For example, demographic parity requires P(guilty | black) = P(guilty | white); it is violated when P(guilty | black) ≠ P(guilty | white).
Reading: 1 Dwork et al., Fairness Through Awareness. 2 Barocas et al., Fairness and Machine Learning: Limitations and Opportunities.
[Figure: COMPAS software used in US courts.]
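Demographic parity can be checked by comparing positive-prediction rates across groups. The sketch below uses made-up predictions and anonymous group labels "a" and "b"; the function names are illustrative.

```python
def positive_rate(predictions, groups, group):
    """Fraction of positive predictions within one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)

def demographic_parity_gap(predictions, groups, group_a, group_b):
    """Absolute difference in positive-prediction rates between two groups."""
    return abs(positive_rate(predictions, groups, group_a)
               - positive_rate(predictions, groups, group_b))

# Toy predictions (1 = positive outcome) and group memberships, for illustration only.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(predictions, groups, "a", "b")
print(gap)
```

A gap of zero means demographic parity holds on this sample; here group "a" receives the positive outcome at a rate of 0.75 against 0.25 for group "b".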
Interpretability: how to explain an ML model to humans
[Figure: ML pipeline (training set (input data, labels) → model set → a model; test data (input data) → predictions (labels)), annotated "How does this work?"]
What is interpretability? Understanding how the model works towards a task.
[Figure: example images with the incorrect predictions "Mobile home" and "Palace".]
Interpretability: two example descriptions
• How do certain attributes influence the predictions? (saliency maps) [Figure: test input and attention map.]
• How do certain training examples influence the predictions? (prototype) [Figure: test input and most influential training images.]
This lecture's scope
[Figure: machine learning divided into supervised learning, un- or semi-supervised learning, and reinforcement learning, with this lecture's scope marked.]