
Trustworthy Machine Learning

Lecturer: Jingfeng Zhang

RIKEN-AIP

Homepage: https://zjfheart.github.io


Image Classification

Reinforcement Learning

Machine learning (ML) models exceed human ability in many tasks.

Classification results (from IBM Cloud):

crowd, people, demonstration, person, alizarine red color

Natural language processing

Machine translation

Alexa, order me a large pizza!

ML models are also used in high-stakes applications.

Education assessment

Self-driving cars

Credit

Healthcare

Criminal justice

Robotic surgery

ML models need TRUST!

Content recommendations

What is "trust" in ML?

Security

Fairness

Privacy

Interpretability


An example of adversarial attacks!

Natural data

Adversarial data

AI makes the pig fly high!

The images & the amusements come from Aleksander Madry's group.

Example adversarial attacks pose threats to AI's deployment.

Glasses [Sharif, Bhagavatula, Bauer, Reiter 2016]

Inject a hidden voice command [Carlini, Wagner 2018]

Add human-imperceptible noises [Mopuri, Ganeshan, Babu 2018]

Small stickers [Eykholt, Evtimov, Fernandes, Li, Rahmati, Xiao, Prakash, Kohno, Song 2018]

THREAT!


ML pipeline

[Figure: Training phase: a training set (input data, labels) is used to pick a model from the model set. Inference phase: the model maps test data (input data) to predictions (labels), e.g., dog or cat?]

ML for dog and cat classification

[Figure: Training phase: images + labels ("We labeled them as dog!", "We labeled them as cat!") are used to train a network chosen from the set of neural networks. Inference phase: test data (input data) are mapped to predictions (labels).]

Security: (evasion) adversarial attacks happen at the inference phase

[Figure: ML pipeline; the attack targets the test data at the inference phase.]

The adversarial attacker adds small (human-imperceptible) noise to the test input data, which fools the model into making wrong predictions!

The adversarial attack goes against the model's will on purpose! But what is the model's will?

Let us use a function f to denote the model.

• What is the model's will? To correctly label the test input data $x$, i.e., $f(x) =$ "dog".

• Then, the model's will is to minimize the 0-1 loss $\ell(f(x), \text{"dog"})$.

[Figure: the loss value of the model predicting "dog", plotted against the input $x$ to the model; regions of good predictions and bad predictions.]

• In ML, we usually use a smoothed loss function $\ell(f(x), y)$ to upper-bound the 0-1 loss. For example, the log-loss and exp-loss are differentiable!
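To make the upper-bounding concrete, here is a minimal numeric sketch (not from the slides; the margin parameterization $z = y \cdot f(x)$ with labels in $\{-1, +1\}$ is an assumption):

```python
import numpy as np

# Binary setting with labels y in {-1, +1}; the margin is z = y * f(x),
# so z <= 0 means a misclassification.

def zero_one_loss(z):
    return (z <= 0).astype(float)        # 1 if wrong, 0 if right

def log_loss(z):
    return np.log2(1.0 + np.exp(-z))     # differentiable surrogate

def exp_loss(z):
    return np.exp(-z)                    # differentiable surrogate

z = np.linspace(-2.0, 2.0, 9)
# Both smooth losses sit above the 0-1 loss everywhere, so driving them
# down also drives down the 0-1 loss.
assert np.all(log_loss(z) >= zero_one_loss(z))
assert np.all(exp_loss(z) >= zero_one_loss(z))
```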

$L_p$-norm-bounded adversarial attacker: maximize the model loss!

Attacker objective:

$$\tilde{x} = \arg\max_{\tilde{x} \in \mathcal{B}_\epsilon(x)} \ell(f(\tilde{x}), y)$$

Find adversarial data $\tilde{x}$ within the $L_p$-norm ball $\mathcal{B}_\epsilon(x)$ around the natural data $x$ that maximizes the loss $\ell(f(\tilde{x}), y)$, subject to the norm-ball constraint $\epsilon$.

A typical method:

Projected gradient descent (PGD): given a starting point $x^{(0)}$ and step size $\alpha$, PGD works as follows:

$$x^{(t+1)} = \Pi_{\mathcal{B}_\epsilon(x)}\!\left( x^{(t)} + \alpha \, \mathrm{sign}\!\left( \nabla_{x^{(t)}} \ell(f(x^{(t)}), y) \right) \right), \quad t \in \mathbb{N}$$

$\Pi_{\mathcal{B}_\epsilon(x)}$ projects the adversarial data $x^{(t)}$ back onto the norm ball if it exceeds the norm-ball boundary; $\alpha$ is a small step size; $t$ is the search-step index.
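A minimal PyTorch sketch of this update for the $L_\infty$ ball, assuming `model` is a differentiable classifier on images in [0, 1]; the function name and defaults are illustrative, and the sketch starts from the natural data rather than a random point:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: repeat a signed gradient ascent step, then project."""
    x_nat = x.detach()
    x_adv = x_nat.clone()                                  # x^(0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)            # the smoothed loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()       # ascend the loss
        x_adv = x_nat + (x_adv - x_nat).clamp(-eps, eps)   # project onto B_eps(x)
        x_adv = x_adv.clamp(0.0, 1.0)                      # stay a valid image
    return x_adv.detach()
```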

Images modified from /know-your-enemy-7f7c5038bdf3

[Figure: the PGD method run from two random start directions, yielding two adversarial points $\tilde{x}_1$ and $\tilde{x}_2$.]

Different types of adversarial attacks

• Human-imperceptible attacks: e.g., attackers use a norm bound to measure imperceptibility, such as the $L_\infty$ norm, $L_2$ norm, or Wasserstein distance.

Image taken from /breaking-neural-networks-with-adversarial-attacks-f4290a9a45aa

• Patch-based attacks: e.g., bounded in the $L_0$ norm.

Image taken from /~sbhagava/papers/face-rec-ccs16.pdf; image taken from /pdf/1712.09665.pdf

• Others, such as rotation attacks, out-of-distribution attacks, etc.

What if the attacker is not allowed to access the model's parameters?

• Black-box attacker: query the model's predictions only.

• Grey-box attacker: know some training data, train a substitute model, and perform transfer-based attacks.

Reading: Papernot et al., Practical Black-Box Attacks against Machine Learning.

Defense methods against adversarial attacks?

[Figure: ML pipeline annotated with defenses. Training phase: attack-aware training processes such as adversarial training, randomized smoothing, etc. Model set: pruning the models. Inference phase: attack-aware prediction such as detection, noise purification, rejection, etc.]

One defense example: adversarial training (AT)

Given the knowledge that the test data may be adversarial, AT carefully simulates adversarial attacks during training. Thus, the model has already seen many adversarial training data in the past, and hopefully it can generalize to adversarial test data in the future.

[Figure: decision boundaries under standard training vs. adversarial training (minimizing the robust risk $R_{\mathrm{rob}}$); the adversarially trained boundary keeps data away from it. Images come from the paper "Attacks which do not kill training make adversarial learning stronger".]

AT's Purpose 1: correctly classify the data.

AT's Purpose 2: make the decision boundary thick so that no data lie near the decision boundary.

Reading: Zhang et al., Attacks which do not kill training make adversarial learning stronger.

AT's basic formulation and the corresponding improvements

Minimax formulation:

$$\min_f \frac{1}{n} \sum_{i=1}^{n} \ell(f(\tilde{x}_i), y_i), \quad \text{where } \tilde{x}_i = \arg\max_{\tilde{x} \in \mathcal{B}_\epsilon(x_i)} \ell(f(\tilde{x}), y_i)$$

Outer minimization; inner maximization. [Madry, Makelov, Schmidt, Tsipras, Vladu 2019]
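A minimal sketch of this minimax loop in PyTorch, approximating the inner maximization with the `pgd_attack` sketch from earlier (assumed available) and the outer minimization with SGD; `model` and `loader` are assumed to exist:

```python
import torch
import torch.nn.functional as F

def adversarial_training(model, loader, epochs=10, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            # Inner maximization: find a worst-case x~ inside B_eps(x).
            model.eval()
            x_adv = pgd_attack(model, x, y)
            # Outer minimization: descend on the loss at the adversarial data.
            model.train()
            opt.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
    return model
```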

AT's improvements/modifications, intriguing findings & interesting applications:

1. Collecting/generating more/smarter training data
2. Simulating smarter attacks
3. Designing smarter learning objectives
4. Designing/learning smarter network structures
5. Leveraging smarter tricks
6. Discovering some intriguing findings
7. Developing some applications
8. Other directions such as smarter attacks, detections

Refer to a video: /watch?v=3Z8bUgn41Fk

The statistic comes from

Security: (poisoning) attacks happen at the training phase

[Figure: ML pipeline; the attack targets the training set (input data, labels) at the training phase, while the inference phase maps test data to predictions (labels).]

The poisoning attacker adds small (human-imperceptible or human-perceptible) noise to the training data, which fools the training phase into generating a "bad" model!

The attacker goes against the learning's will on purpose.

• In the previous slides, the model is denoted as a function $f: x \to y$.

• Similarly, the learning is also denoted as a function $A: D \to f$, in which $D$ is a training dataset and $f$ is a model.

• What is the learning's will? Usually, to return a good model that has a small natural generalization loss, i.e., $\mathbb{E}_{(x,y) \sim \mathcal{D}}[\ell(f(x), y)]$.

• Sometimes, it also has a different will: a small robust generalization loss (for security purposes), i.e., $\mathbb{E}_{(x,y) \sim \mathcal{D}}[\max_{\tilde{x} \in \mathcal{B}_\epsilon(x)} \ell(f(\tilde{x}), y)]$, where $\mathcal{B}_\epsilon$ is the $\epsilon$ norm ball.

Reading: Biggio et al., Support vector machines under adversarial label noise; Zhao et al., Efficient label contamination attacks against black-box learning models.

What can the poisoning attacker do?

• Flipping labels (naturally noisy labels, or flipping-label attacks). This also facilitates label-noise robust training research.

• Planting a backdoor trigger: after training, inputs stamped with the trigger are misclassified. Images come from Gu et al., BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain.

• Perturbing input data (clean-label targeted attack): after training, "Oh! This is a dog!" (wrong prediction). Images come from Geiping et al., WITCHES' BREW: Industrial Scale Data Poisoning via Gradient Matching.

One poisoning example: clean-label targeted attack

• Attacking a learning algorithm is more challenging! It is not just fooling a single model (as in an adversarial attack), but fooling a series of models along the learning sequence. The learning algorithm $A$ converges to a bad model region!

The image comes from Huang et al., MetaPoison: Practical General-purpose Clean-label Data Poisoning.

• What is a clean-label targeted attack?

1. Poisoned data (e.g., images) appear to be unmodified and labeled correctly.

2. The perturbed images often affect the classifier's behavior on a specific target instance ($x_{tar}$) of a learned model, without affecting behavior on other inputs.

3. Clean-label attacks are insidiously hard to detect.

Clean-label targeted attack

• Performing a poisoning attack requires unrolling the whole training process (constrained bilevel optimization), which is computationally intractable and costly!

• Then how? Just use a single model (a pretrained feature extractor) to represent it all!

• Feature collision: $x_{poi} = \arg\min_{x} \left[ \|f(x) - f(x_{tar})\|^2 + \beta \|x - x_{nat}\|^2 \right]$, where $x_{poi}$ is the generated poisoned data, $x_{tar}$ is a specific target instance in the test dataset, and $x_{nat}$ is the original benign data. Shafahi et al., Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks. (A code sketch follows below.)

• Gradient alignment (Witches' Brew): match gradients between poisoned data and target data, $x_{poi} = \arg\min_{x_{poi} \in \mathcal{B}(x_{nat})} \mathcal{M}\!\left[ \nabla_\theta \ell(f(x_{tar}), y_{adv}),\ \nabla_\theta \ell(f(x_{poi}), y_{true}) \right]$, where $\mathcal{M}$ is a similarity loss such as the cosine similarity, and $y_{adv}$ is the attacker-chosen (wrong) label. Geiping et al., WITCHES' BREW: Industrial Scale Data Poisoning via Gradient Matching.
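A minimal sketch of the feature-collision objective above, assuming a frozen pretrained feature extractor `feat` (a PyTorch module); `x_tar`, `x_nat`, and the trade-off weight `beta` are illustrative placeholders:

```python
import torch

def feature_collision(feat, x_tar, x_nat, beta=0.1, steps=200, lr=0.01):
    """Craft x_poi that collides with x_tar in feature space but looks like x_nat."""
    x_poi = x_nat.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_poi], lr=lr)
    with torch.no_grad():
        f_tar = feat(x_tar)                                # fixed target features
    for _ in range(steps):
        opt.zero_grad()
        loss = ((feat(x_poi) - f_tar) ** 2).sum() \
             + beta * ((x_poi - x_nat) ** 2).sum()         # stay visually benign
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_poi.clamp_(0.0, 1.0)                         # keep a valid image
    return x_poi.detach()
```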

Reading: Goldblum et al., Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.

Defense against poisoning attacks

[Figure: ML pipeline annotated with defenses. Training phase: data preprocessing; anti-backdoor training, adversarial training, etc. Model set: model pruning. Inference phase: test data (input data) → predictions (labels).]

Privacy

Two different notions of privacy.

• Protect data privacy from machines. How to achieve this? Data poisoning! Reading: Zhiqi et al., Human-imperceptible privacy protection against machines, ACM MM19 best paper award; Huang et al., Unlearnable examples: Making personal data unexploitable, ICLR21 Spotlight.

• Protect data privacy from people. How to achieve this?

A head-scratching questionnaire!

• Suppose you want to collect answers to a very embarrassing question, for example, whether you conducted improper behaviors on the train in the past three months. (Yes/No) How?

This question is important at the population level, but very embarrassing at the individual level. Therefore, people tend to lie on this question.

What can I do to get the true statistics?

We need a private learning process!

• We introduce randomness, i.e., plausible deniability for each individual.

• Step 1: the subject individual flips a coin twice (blind to the surveyor).

• Step 2:

a. If the first coin was tails, report the true answer.

b. If the first coin was heads, report YES if the second coin is heads, and NO if the second coin is tails.

We collect N samples, in which N_yes report "Yes" and N_no = N − N_yes report "No". We want to calculate the estimated true portion P of people who conducted improper behaviors. How?

First coin Head: report "Yes" if the second coin is Head, "No" if the second coin is Tail.
First coin Tail: report the true answer (regardless of the second coin).

People who truly conducted the behavior (portion P) have a 3/4 chance to report "Yes", i.e., they contribute (3/4)P. People who did not (portion 1 − P) have a 1/4 chance to report "Yes", i.e., they contribute (1/4)(1 − P).

Answer: solve

$$\frac{3}{4} P + \frac{1}{4}(1 - P) = \frac{N_{yes}}{N} \quad \Rightarrow \quad P = \frac{2 N_{yes}}{N} - \frac{1}{2}.$$
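A minimal simulation of this two-coin protocol and its estimator (a sketch under the slide's assumptions; the true portion `p_true` is only there to generate synthetic answers):

```python
import random

def respond(truly_yes):
    """One individual's randomized response under the two-coin protocol."""
    if random.random() < 0.5:            # first coin tails: tell the truth
        return truly_yes
    return random.random() < 0.5         # first coin heads: second coin decides

def estimate_portion(n=100_000, p_true=0.2):
    n_yes = sum(respond(random.random() < p_true) for _ in range(n))
    # E[N_yes / N] = (3/4) P + (1/4)(1 - P), so invert for P:
    return 2.0 * n_yes / n - 0.5

print(estimate_portion())                # close to 0.2 for large n
```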

Differential privacy

What is a differentially private algorithm? A randomized algorithm $M$.

[Figure: two datasets $D$ and $D'$ are each fed to the same randomized algorithm $M$.]

This algorithm returns the answer with a probability!

$$\Pr[M(D) \in W] \leq e^{\epsilon} \Pr[M(D') \in W]$$

where $D$ and $D'$ differ in only one record!

Reading: Dwork and Roth, The Algorithmic Foundations of Differential Privacy.
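For a concrete mechanism (an assumption for illustration; not covered on the slide), the Laplace mechanism from Dwork and Roth makes a counting query ε-differentially private, since changing one record shifts a count by at most 1 (sensitivity 1):

```python
import numpy as np

def laplace_count(records, predicate, eps=0.5):
    """Return an eps-DP noisy answer to 'how many records satisfy predicate?'."""
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / eps)   # scale = sensitivity / eps
    return true_count + noise

ages = [23, 35, 41, 29, 62, 18, 57]
print(laplace_count(ages, lambda a: a >= 40, eps=0.5))    # noisy count of 40+
```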

Examples of privacy attacks in ML

[Figure: ML pipeline: training set (input data, labels) → model set → a model; test data (input data) → predictions (labels).]

Model inversion attack: given a trained model, recover the private dataset used to train the model. Fredrikson et al., Model inversion attacks that exploit confidence information and basic countermeasures.

Membership inference attack: given a trained model, detect whether a data point was used to train the model. Shokri et al., Membership Inference Attacks against Machine Learning Models.
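A minimal sketch of the simplest membership-inference baseline, confidence thresholding (a toy heuristic, not the shadow-model attack of Shokri et al.); `model` and the threshold `tau` are assumptions:

```python
import torch

def infer_membership(model, x, tau=0.95):
    """Guess 'member' when the model is unusually confident on x."""
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    confidence = probs.max(dim=1).values
    return confidence > tau              # True -> guess x was in the training set
```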

Fairness: various descriptions

A model may have bias towards sensitive attributes, such as gender, race, or religion.

• Proportional fairness: you get what you deserve. Reading: Zhang et al., Hierarchically fair federated learning, a tech report.

• Individual fairness: two similar individuals should be classified similarly.

• Group fairness: the model's outcome should be the same across different groups. For example, demographic parity fails when P(guilty | black) ≠ P(guilty | white). (A small audit sketch follows below.)

Reading: 1. Dwork et al., Fairness Through Awareness. 2. Barocas et al., Fairness and Machine Learning: Limitations and Opportunities.

COMPAS: software used in US courts.
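A minimal sketch of auditing group fairness via demographic parity, comparing positive-outcome rates across groups; the arrays are illustrative placeholders:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Positive-outcome rate per group; parity holds when the gap is ~0."""
    rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
    return rates, max(rates.values()) - min(rates.values())

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # model outcomes, e.g. "guilty" = 1
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(demographic_parity_gap(y_pred, group))  # ({'a': 0.75, 'b': 0.25}, 0.5)
```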

Interpretability: how to explain an ML model to humans

[Figure: ML pipeline with the question "How does this work?" pointed at the model.]

What is interpretability? Understanding how the model works towards a task.

[Figure: misclassified examples: mobile home (incorrect prediction); palace (incorrect prediction).]

Interpretability: two example descriptions

• How do certain attributes influence the predictions? (saliency maps; sketched below) [Figure: test input and its attention map.]

• How do certain training examples influence the predictions? (prototypes) [Figure: test input and the most influential training images.]
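A minimal sketch of a vanilla gradient saliency map in PyTorch (one simple way to realize the first description; `model` and the input shape are assumptions):

```python
import torch

def saliency_map(model, x):
    """Per-pixel |d score / d input| for the predicted class of one image."""
    x = x.detach().clone().requires_grad_(True)   # x: (1, C, H, W) image tensor
    logits = model(x)
    pred = logits[0].argmax()
    logits[0, pred].backward()                    # gradient of the top-class score
    return x.grad.abs().max(dim=1).values[0]      # (H, W) saliency over channels
```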

This lecture's scope

[Figure: machine learning taxonomy: supervised learning (this lecture's scope); un- or semi-supervised learning; reinforcement learning.]
