
Data and Model Security, Week 5: Adversarial Defense
姜育刚, 马兴军, 吴祖煊

Recap of Week 4: Adversarial Example Detection
- Secondary Classification Methods
- Principal Component Analysis (PCA)
- Distribution Detection Methods
- Prediction Inconsistency
- Reconstruction Inconsistency
- Trapping-Based Detection

Adversarial Attack Competition
Link: https://codalab.lisn.upsaclay.fr/competitions/15669?secret_key=77cb8986-d5bd-4009-82f0-7dde2e819ff8

Adversarial Defense vs. Detection
The weird relationship between defense and detection:
- Detection IS defense. But when we say defense, we (most of the time) mean the model is secured, and detection cannot do that.
- In survey papers: detection is defense.
- In technical papers: defense is defense, not detection.

Differences:
- Defense is to secure the model or the system.
- Detection is to identify potential threats, and it should be followed by a defense strategy, e.g., query rejection (but this step is mostly ignored).
- By "defense", people mostly mean robust training methods.

Defense Methods (outline)
- Early Defense Methods
- Early Adversarial Training Methods
- Later Adversarial Training Methods
- Remaining Challenges and Recent Progress

A Recap of the Timeline
- 2013: Biggio et al. and Szegedy et al. discover adversarial examples.
- 2014: Goodfellow et al. propose the fast single-step attack FGSM and adversarial training.
- 2015: Simple detection methods (PCA) and adversarial training methods.
- 2016: The min-max optimization framework for adversarial training is proposed.
- 2017: Many adversarial example detection methods and attack methods (BIM, C&W); 10 detection methods are broken.
- 2018: Physical-world attacks; upgraded detection methods; the PGD attack and PGD adversarial training; 9 defense methods are broken.
- 2019: TRADES and many other adversarial training methods; the first Science paper on the topic.
- 2020: The AutoAttack attack; Fast adversarial training.
- 2021: Adversarial training with larger models and more data; extension to other domains.
- 2022: Problems remain unsolved; attacks keep multiplying, and defense keeps getting harder.

Principles of Defense
Block the attack (cut it off at the input and output ends):
- Mask the input gradients
- Regularize the input gradients
- Distill the logits
- Denoise the input

Robustify the model (strengthen the middle):
- Smooth the decision boundary
- Reduce the Lipschitzness of the model
- Smooth the loss landscape

Recap: Adversarial Attack
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.

- Model training: $\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\mathcal{L}(f_\theta(x), y)$
- Adversarial attack: $\max_{\|x'-x\|_p \le \epsilon} \mathcal{L}(f_\theta(x'), y)$, i.e., cause misclassification with a very small perturbation; a test-time attack.
- Performance metrics. Other metrics: the maximum perturbation required for a 100% attack success rate.

Defense Methods (outline)
- Early Defense Methods
- Early Adversarial Training Methods
- Advanced Adversarial Training Methods
- Remaining Challenges and Recent Progress

Defensive Distillation
Papernot et al. Distillation as a defense to adversarial perturbations against deep neural networks. S&P, 2016.
- Idea: make large logit changes appear "small".
- Scale up the logits by a few orders of magnitude, i.e., distillation with temperature T;
- Retrain the last layer with the scaled logits. (A sketch follows.)
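Below is a minimal sketch of the distillation step, assuming a teacher network already trained at temperature T; the function name, the optimizer handling, and T=20 are illustrative choices, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, optimizer, T=20.0):
    """One training step of defensive distillation (sketch).

    The teacher's softened probabilities at temperature T serve as
    soft labels; the student is trained to match them, also at
    temperature T. At test time the student runs with T = 1, which
    saturates the softmax and shrinks the input gradients.
    """
    with torch.no_grad():
        soft_labels = F.softmax(teacher(x) / T, dim=1)
    log_probs = F.log_softmax(student(x) / T, dim=1)
    loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```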

Defensive Distillation Is Not Robust
Carlini, Nicholas, and David Wagner. "Defensive distillation is not robust to adversarial examples." arXiv preprint arXiv:1607.04311 (2016).
- It can be evaded by attacking the distilled network with the temperature T.

Lessons Learned
Carlini, Nicholas, and David Wagner. "Defensive distillation is not robust to adversarial examples." arXiv preprint arXiv:1607.04311 (2016).
- Distillation is not a good solution for adversarial robustness.
- Vanishing input gradients can still be recovered by a reverse operation.
- A defense should be evaluated against adaptive attacks to prove real robustness.

Input Gradients Regularization
Ross et al. "Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients." AAAI, 2018.
Drucker, Harris, and Yann LeCun. "Improving generalization performance using double backpropagation." TNN, 1992.

- Objective: classification loss + an input-gradient regularization term that penalizes $\|\nabla_x \mathcal{L}(f_\theta(x), y)\|_2^2$.
- Related to the double backpropagation proposed by Drucker and LeCun (1992). (A sketch follows.)
- Issues: 1) limited adversarial robustness; 2) it hurts learning.
[Figure: input-gradient visualizations of distilled, adversarially trained, and gradient-regularized models.]
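A minimal PyTorch sketch of this objective, assuming image inputs; the weight `lam` and the function name are illustrative:

```python
import torch
import torch.nn.functional as F

def input_gradient_regularized_loss(model, x, y, lam=0.1):
    """Cross-entropy plus the squared L2 norm of the input gradient.

    The gradient is taken with create_graph=True so the penalty can
    itself be backpropagated (double backpropagation).
    """
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    (grad_x,) = torch.autograd.grad(ce, x, create_graph=True)
    penalty = grad_x.flatten(1).pow(2).sum(dim=1).mean()
    return ce + lam * penalty
```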

Feature Squeezing
Xu et al. "Feature squeezing: Detecting adversarial examples in deep neural networks." NDSS, 2018.
- Compress the input space (e.g., reduce the color bit depth; sketched below).
- It also hurts performance on large-scale image datasets.
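A minimal sketch of one squeezer (bit-depth reduction) together with the paper's comparison idea, with illustrative names and an assumed input range of [0, 1]:

```python
import torch

def reduce_bit_depth(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Quantize inputs in [0, 1] down to 2**bits gray levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def squeeze_score(model, x, bits: int = 4) -> torch.Tensor:
    """L1 distance between predictions on the original and squeezed
    input; a large distance suggests an adversarial example."""
    p = torch.softmax(model(x), dim=1)
    p_sq = torch.softmax(model(reduce_bit_depth(x, bits)), dim=1)
    return (p - p_sq).abs().sum(dim=1)
```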

Thermometer Encoding
Buckman et al. "Thermometer encoding: One hot way to resist adversarial examples." ICLR, 2018.
- Discretize the input to break small noise: the proposed thermometer encoding (sketched below).
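A minimal sketch of thermometer encoding, assuming pixel values in [0, 1]; k = 16 levels is an illustrative choice:

```python
import torch

def thermometer_encode(x: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Encode each pixel v as a k-bit vector whose i-th bit is 1 iff v > i/k.

    x: (N, C, H, W) in [0, 1]  ->  output: (N, C * k, H, W).
    Nearby values share most bits, and a small perturbation can flip
    only the few bits whose thresholds it crosses.
    """
    thresholds = torch.arange(k, dtype=x.dtype, device=x.device) / k
    bits = (x.unsqueeze(2) > thresholds.view(1, 1, k, 1, 1)).to(x.dtype)
    return bits.flatten(1, 2)
```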

Input Transformations
Guo et al. "Countering Adversarial Images using Input Transformations." ICLR, 2018.
- Image cropping and rescaling
- Bit-depth reduction
- JPEG compression (see the sketch after this list)
- Total variance minimization
- Image quilting
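A minimal sketch of the JPEG transformation using Pillow; quality=75 is an illustrative setting, not the paper's tuned value:

```python
import io
from PIL import Image

def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    """Round-trip an image through JPEG to wash out high-frequency
    adversarial noise before feeding it to the classifier."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```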

Obfuscated Gradients = Fake Robustness
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
Athalye et al. Synthesizing robust adversarial examples. ICML, 2018.

Backward Pass Differentiable Approximation (BPDA):
- Find a linear approximation of the non-differentiable operations, e.g., discretization, compression, etc.
- Can break defenses based on non-differentiable operations.

Expectation Over Transformation (EOT):
- T: a set of randomized transformations; optimize the expected loss over draws from T.
- Can break randomization-based defenses.
(Sketches of both follow.)
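Minimal sketches of both ideas in PyTorch; the names are illustrative, BPDA is shown with the common identity approximation, and EOT assumes the sampled transformations are differentiable:

```python
import random
import torch
import torch.nn.functional as F

class BPDAWrapper(torch.autograd.Function):
    """BPDA with the identity approximation: apply the
    non-differentiable preprocessing g(x) in the forward pass, pass
    gradients straight through in the backward pass (dg/dx ~ I).

    Usage: logits = model(BPDAWrapper.apply(x, quantize))
    """
    @staticmethod
    def forward(ctx, x, g):
        return g(x)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None

def eot_gradient(model, transforms, x, y, n_samples=30):
    """EOT: average the loss gradient over random draws of the
    transformation, so the attack targets E_t[L(f(t(x)), y)]."""
    x = x.clone().requires_grad_(True)
    total = 0.0
    for _ in range(n_samples):
        t = random.choice(transforms)  # each t must be differentiable
        total = total + F.cross_entropy(model(t(x)), y)
    (total / n_samples).backward()
    return x.grad.detach()
```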

BPDA + EOT breaks 7 of the defenses published at ICLR 2018 (Athalye et al., ICML 2018). We got a survivor: PGD adversarial training, revisited below.

How to Properly Evaluate a Defense?
Carlini, Nicholas, et al. "On evaluating adversarial robustness." arXiv preprint arXiv:1902.06705 (2019).
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
- Do not blindly apply multiple (similar) attacks.
- Try at least one gradient-free attack and one hard-label attack.
- Perform a transferability attack using a similar substitute model.
- For randomized defenses, properly ensemble over randomness.
- For non-differentiable components, apply differentiable techniques (BPDA).
- Verify that the attacks have converged under the selected hyperparameters.
- Carefully investigate attack hyperparameters and report those selected.
- Compare against prior work and explain important differences.
- Test broader threat models when proposing general defenses.

Robust Activation Functions
Xiao et al. "Enhancing Adversarial Defense by k-Winners-Take-All." ICLR, 2020.
- Block the internal activation flow: break continuity with the k-Winners-Take-All (k-WTA) activation (sketched below).
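A minimal sketch of a k-WTA activation; the sparsity ratio is illustrative:

```python
import torch

def kwta(x: torch.Tensor, ratio: float = 0.1) -> torch.Tensor:
    """k-Winners-Take-All: keep the k largest activations of each
    sample and zero out the rest, a discontinuous activation that
    disrupts gradient-based attacks."""
    flat = x.flatten(1)
    k = max(1, int(ratio * flat.shape[1]))
    kth = flat.topk(k, dim=1).values[:, -1:]  # k-th largest value
    return (flat * (flat >= kth).to(x.dtype)).view_as(x)
```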

Robust Loss Function
Pang et al. Rethinking softmax cross-entropy loss for adversarial robustness. ICLR, 2020.
- Max-Mahalanobis center (MMC) loss vs. SCE (softmax cross-entropy); the MMC loss is reconstructed below.
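A reconstruction of the MMC loss from the paper's description (hedged from memory, not copied from the slide), with $z_\theta(x)$ the penultimate feature and $\mu_y^{\star}$ the preset Max-Mahalanobis center of class $y$:

$$ \mathcal{L}_{\mathrm{MMC}}(x, y) = \frac{1}{2}\,\big\| z_\theta(x) - \mu_y^{\star} \big\|_2^2 $$

Minimizing it pulls the features of each class toward maximally separated preset centers, instead of relying on the geometry induced by softmax cross-entropy.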

Robust Inference
Pang et al. Mixup Inference: Better exploiting mixup to defend adversarial attacks. ICLR, 2020.
- Mixup Inference (MI), sketched below.
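A minimal sketch of the MI idea, assuming a pool of clean images is available at inference time; lam and n_draws are illustrative:

```python
import torch
import torch.nn.functional as F

def mixup_inference(model, x, pool, lam=0.6, n_draws=30):
    """Average predictions over mixups of the input with randomly
    drawn clean samples; mixing shrinks any adversarial perturbation
    by a factor of lam before each forward pass."""
    probs = 0.0
    for _ in range(n_draws):
        idx = torch.randint(0, pool.shape[0], (x.shape[0],))
        mixed = lam * x + (1.0 - lam) * pool[idx]
        probs = probs + F.softmax(model(mixed), dim=1)
    return probs / n_draws
```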

New Adaptive Attacks Break These Defenses
Tramer et al. "On adaptive attacks to adversarial example defenses." NeurIPS, 2020.
- T1: Attack the full defense.
- T2: Target important defense parts.
- T3: Simplify the attack.
- T4: Ensure a consistent loss function.
- T5: Optimize with different methods.
- T6: Use strong adaptive attacks.

How to Evaluate a Defense?
Croce and Hein. "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks." ICML, 2020.
Gao et al. Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack. ICML, 2022.
Zimmermann et al. "Increasing Confidence in Adversarial Robustness Evaluations." arXiv preprint arXiv:2206.13991 (2022).

Strong attacks:
- AutoAttack (the one must-test attack)
- Margin Decomposition (MD) attack (better than AutoAttack on ViTs)
- Minimum-Margin (MM) attack (the new SOTA attack to test?)

Extra robustness tests:
- Attack unit tests (Zimmermann et al., 2022)

Adversarial Training
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.
- The idea is simple: just train on adversarial examples!
- Adversarial training is a data augmentation method: original data -> adversarial attack -> adversarial examples -> model training. (A sketch follows.)
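A minimal sketch of FGSM-based adversarial training; the equal 0.5/0.5 weighting of clean and adversarial losses follows the mixed objective of Goodfellow et al., while eps and the function names are illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Single-step FGSM: perturb along the sign of the input gradient."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, x, y, optimizer, eps=8 / 255):
    """One step: train on an equal mix of clean and FGSM examples."""
    x_adv = fgsm(model, x, y, eps)
    loss = 0.5 * F.cross_entropy(model(x), y) \
         + 0.5 * F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```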

Adversarial Training
- Adversarial training produces a smooth decision boundary.
[Figure: a normal decision boundary; generating adversarial examples; the boundary after adversarial training.]

Early Adversarial Training Methods
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.
- 2014: Szegedy et al. already explored adversarial training in the paper that explained adversarial examples, using the L-BFGS attack to generate adversarial examples for every layer of the network and adding them to the training process. Finding: adversarial examples for deeper layers improve robustness the most.
- 2015: Goodfellow et al. proposed training neural networks on adversarial examples generated by the (single-step) FGSM attack. They did not use intermediate-layer adversarial examples, as they found these brought no improvement.

Min-Max Robust Optimization
Nokland et al. Improving back-propagation by adding an adversarial gradient. arXiv:1510.04189, 2015.
Huang et al. Learning with a strong adversary. ICLR 2016.
Shaham et al. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. arXiv:1511.05432, 2015.

The first proposal of min-max optimization:

$$ \min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|x'-x\|_p \le \epsilon} \mathcal{L}(f_\theta(x'), y) \Big] $$

- Inner maximization: find the worst-case perturbation.
- Outer minimization: train the model on it.

Virtual Adversarial Training (VAT)
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
- VAT: a method to improve generalization.
- Differences to adversarial training: an L2-regularized perturbation; both clean and adversarial examples are used for training; KL divergence is used to generate the adversarial examples. (The objective is reconstructed below.)
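A reconstruction of the VAT objective from the bullets above (hedged; note that the regularizer needs no labels, hence "virtual"):

$$ r_{\mathrm{vadv}} = \arg\max_{\|r\|_2 \le \epsilon} \mathrm{KL}\big(p(\cdot \mid x;\theta)\,\|\,p(\cdot \mid x + r;\theta)\big) $$

$$ \mathcal{L}_{\mathrm{VAT}} = \mathcal{L}_{\mathrm{CE}}(x, y;\theta) + \lambda\, \mathrm{KL}\big(p(\cdot \mid x;\theta)\,\|\,p(\cdot \mid x + r_{\mathrm{vadv}};\theta)\big) $$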

Weaknesses of Early AT Methods
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
- These methods are fast! They take only about 2x the time of standard training.
- The weakness, as later work showed, is that robustness gained from single-step attacks does not hold up against stronger multi-step attacks.

PGD Adversarial Training
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
- Among the evaluated defenses, we got a survivor: PGD adversarial training.

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR, 2018.

A saddle point problem:

$$ \min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(f_\theta(x+\delta), y) \Big] $$

- Inner maximization; outer minimization.
- A saddle point (constrained bi-level optimization) problem.
- In constrained optimization, Projected Gradient Descent (PGD) is the best first-order solver.

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR, 2018.

Projected Gradient Descent (PGD):
- PGD is an optimizer; in the field of adversarial machine learning, it is also known as an adversarial attack.
- Each step ascends the loss and then projects (clips) back into the $\epsilon$-ball:

$$ x^{t+1} = \Pi_{B_\epsilon(x)}\big( x^t + \alpha \cdot \mathrm{sign}(\nabla_x \mathcal{L}(f_\theta(x^t), y)) \big) $$

(A sketch follows.)
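A minimal PGD attack sketch with the random start described on the next slide; eps, alpha, and the step count are illustrative:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L_inf PGD: random start in the eps-ball, then iterate
    gradient-sign ascent followed by projection (clipping)."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        (grad,) = torch.autograd.grad(loss, delta)
        # ascent step, then project back into the L_inf ball
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
    return (x + delta).clamp(0, 1).detach()
```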

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR, 2018.
- Random initialization: start PGD from the clean input plus uniform noise in the $\epsilon$-ball.

Characteristics of PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR, 2018.
Ilyas et al. "Adversarial examples are not bugs, they are features." NeurIPS, 2019.
[Figure: decision boundaries and learned (robust) features under standard vs. adversarial training, illustrated with adversarial examples.]

Dynamic Adversarial Training (DART)
Wang et al. "On the Convergence and Robustness of Adversarial Training." ICML, 2019.
[Figures: the effect of the PGD step size and of the number of PGD steps on robustness; using weaker attacks in early training.]

How to measure the convergence of the inner maximization?
Definition (First-Order Stationary Condition, FOSC). For $x^k$ in the ball $\mathcal{X} = B_\epsilon(x_0)$, with $g = \nabla_x \mathcal{L}(f_\theta(x^k), y)$ (our reconstruction from the paper):

$$ c(x^k) = \max_{x\in\mathcal{X}} \langle x - x^k,\, g\rangle = \epsilon\,\|g\|_1 + \langle x_0 - x^k,\, g\rangle $$

$c(x^k) \ge 0$, and $c(x^k) = 0$ means the inner maximization has fully converged. (A computation sketch follows.)
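A minimal sketch of computing FOSC for a batch under the L_inf threat model, directly from the closed form above; shapes and the function name are illustrative:

```python
import torch

def fosc(x_adv, x0, grad, eps):
    """First-Order Stationary Condition for the L_inf ball B_eps(x0).

    grad is the loss gradient at x_adv; smaller values mean the inner
    maximization is closer to convergence (0 = fully converged).
    """
    g = grad.flatten(1)
    inner = ((x0 - x_adv).flatten(1) * g).sum(dim=1)
    return eps * g.abs().sum(dim=1) + inner
```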

Dynamic Adversarial Training (DART)
Wang et al. "On the Convergence and Robustness of Adversarial Training." ICML, 2019.
- Dynamic adversarial training: weak attacks for early training, strong attacks for later training.
- Weak attacks improve generalization; strong attacks improve final robustness.
- Convergence analysis: DART improves robustness.
- Robustness results on CIFAR-10 with WideResNet.

TRADES
Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML, 2019.
- Use a distribution loss (KL divergence) for both the inner and the outer optimization:

$$ \min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \mathrm{CE}(f_\theta(x), y) + \beta \max_{\|x'-x\|\le\epsilon} \mathrm{KL}\big(f_\theta(x)\,\|\,f_\theta(x')\big) \Big] $$

- TRADES produced the winning solutions of the NeurIPS 2018 Adversarial Vision Challenge.

Characteristics of TRADES:
- Using KL divergence to supervise the generation of adversarial examples brings a significant robustness gain.
- Clean examples also participate in training, which helps model convergence and clean accuracy.
- KL-based adversarial example generation has a built-in adaptive component.
- It trains a smoother decision boundary than PGD adversarial training.
- TRADES improves both the inner maximization and the outer minimization.
- Experimental results of TRADES.

TRADES vs. VAT vs. ALP
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv preprint arXiv:1803.06373 (2018).
- Similar optimization frameworks, different loss choices, and very different results:
- TRADES: clean CE plus a beta-weighted KL term between the clean and adversarial output distributions (the KL term also drives the inner maximization).
- Virtual Adversarial Training: a KL consistency regularizer with an L2-bounded "virtual" perturbation.
- Adversarial Logit Pairing: roughly, adversarial-training CE plus an L2 penalty pairing the clean and adversarial logits.
(A TRADES training-loss sketch follows.)
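A minimal sketch of the TRADES training loss; beta = 6 is the commonly reported setting, the rest is illustrative:

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10, beta=6.0):
    """TRADES (sketch): inner maximization of KL(p(x) || p(x')),
    outer minimization of clean CE + beta * KL term."""
    p_clean = F.softmax(model(x), dim=1).detach()
    # inner maximization: PGD on the KL divergence instead of CE
    x_adv = (x + 0.001 * torch.randn_like(x)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean,
                      reduction="batchmean")
        (grad,) = torch.autograd.grad(kl, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    # outer minimization: clean CE + beta * robust KL term
    robust_kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                         F.softmax(model(x), dim=1), reduction="batchmean")
    return F.cross_entropy(model(x), y) + beta * robust_kl
```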

MART: Misclassification Aware adveRsarial Training
Wang et al. "Improving adversarial robustness requires revisiting misclassified examples." ICLR, 2020.
- Adversarial examples are only defined on correctly classified examples; what about misclassified examples?
- The influence of misclassified and correctly classified examples: misclassified examples have a significant impact on the final robustness!
- On misclassified examples, different (inner) maximization techniques have a negligible effect, while different (outer) minimization techniques have a significant effect.
- Misclassification-aware adversarial risk: the standard adversarial risk is refined by treating correctly classified and misclassified examples separately, with an added consistency term for the misclassified ones.
- Surrogate loss functions (existing methods and MART); a semi-supervised extension of MART exploits unlabeled data. (A loss sketch follows.)
- Robustness of MART: white-box robustness with ResNet-18 on CIFAR-10 (ε = 8/255) and with WideResNet-34-10 on CIFAR-10 (ε = 8/255).
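A minimal sketch of the MART surrogate loss as we read it from the paper: a boosted CE term on the adversarial example plus a clean-adversarial KL term reweighted by (1 - p_y(x)); x_adv is assumed to come from a standard PGD inner step, and lam is illustrative:

```python
import torch
import torch.nn.functional as F

def mart_loss(model, x, x_adv, y, lam=5.0):
    """MART surrogate loss (sketch).

    BCE term: CE on the adversarial example plus a margin term
    suppressing the largest wrong class. KL term: clean-vs-adversarial
    consistency, weighted by (1 - p_y(x)) so that misclassified clean
    examples matter more.
    """
    eps = 1e-12
    p_adv = F.softmax(model(x_adv), dim=1)
    p_clean = F.softmax(model(x), dim=1)
    py_adv = p_adv.gather(1, y[:, None]).squeeze(1)
    top_wrong = p_adv.scatter(1, y[:, None], 0.0).max(dim=1).values
    bce = -(torch.log(py_adv + eps) + torch.log(1.0 - top_wrong + eps)).mean()
    kl = (p_clean * (torch.log(p_clean + eps) - torch.log(p_adv + eps))).sum(1)
    py_clean = p_clean.gather(1, y[:, None]).squeeze(1)
    return bce + lam * (kl * (1.0 - py_clean)).mean()
```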

Using More Data to Improve Robustness
Alayrac et al. Are labels required for improving adversarial robustness? NeurIPS, 2019.
