Adversarial Defense
姜育刚, 马兴军, 吴祖煊

Recap: Week 4 — Adversarial Example Detection
- Secondary Classification Methods
- Principal Component Analysis (PCA)
- Distribution Detection Methods
- Prediction Inconsistency
- Reconstruction Inconsistency
- Trapping-Based Detection

Adversarial Attack Competition
Link: https://codalab.lisn.upsaclay.fr/competitions/15669?secret_key=77cb8986-d5bd-4009-82f0-7dde2e819ff8
Adversarial Defense vs. Detection
The odd relationship between defense and detection:
- Detection IS defense. But when we say defense, we usually mean that the model itself is secured, which detection alone cannot achieve.
- In survey papers, detection counts as defense; in technical papers, defense means defense, not detection.
Differences:
- Defense is to secure the model or the system.
- Detection is to identify potential threats; it should be followed by a defense strategy, e.g., query rejection (but this step is mostly ignored).
- By "defense", the literature mostly means robust training methods.
Defense Methods
- Early Defense Methods
- Early Adversarial Training Methods
- Later Adversarial Training Methods
- Remaining Challenges and Recent Progress

A Recap of the Timeline
- 2013: Biggio et al. and Szegedy et al. discover adversarial examples.
- 2014: Goodfellow et al. propose the fast single-step FGSM attack and adversarial training.
- 2015: simple detection methods (PCA) and adversarial training methods.
- 2016: the min-max optimization framework for adversarial training is proposed.
- 2017: a flood of adversarial example detection methods and attack methods (BIM, C&W); 10 detection methods are broken.
- 2018: physical-world attacks; upgraded detection methods; the PGD attack and PGD adversarial training; 9 defense methods are broken.
- 2019: TRADES and many other adversarial training methods; the first Science paper on the topic.
- 2020: the AutoAttack attack; Fast adversarial training.
- 2021: adversarial training with larger models and more data; extension to other domains.
- 2022: problems remain unsolved; attacks keep multiplying and defense keeps getting harder.
Principles of Defense
Block the attack (work on the two ends: input and output):
- Mask the input gradients
- Regularize the input gradients
- Distill the logits
- Denoise the input
Robustify the model (strengthen the middle):
- Smooth the decision boundary
- Reduce the Lipschitzness of the model
- Smooth the loss landscape
Adversarial Attack (recap)
- Model training: $\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\mathcal{L}(f_\theta(x), y)$
- Adversarial attack: $\max_{\|\delta\|\le\epsilon} \mathcal{L}(f_\theta(x+\delta), y)$, i.e., cause misclassification with a very small perturbation; this is a test-time attack.
Performance Metrics
- Other metrics: the maximum perturbation needed for a 100% attack success rate.
Refs: Szegedy C, Zaremba W, Sutskever I, et al. "Intriguing properties of neural networks." ICLR 2014. Goodfellow I J, Shlens J, Szegedy C. "Explaining and harnessing adversarial examples." ICLR 2015.
Defense Methods
- Early Defense Methods
- Early Adversarial Training Methods
- Advanced Adversarial Training Methods
- Remaining Challenges and Recent Progress
Defensive Distillation
- Goal: make large logit changes "small" by scaling the logits up by a few orders of magnitude.
- Train with softmax at temperature T, then retrain the last layer with the scaled logits (soft labels).
- Distillation with temperature T: $\mathrm{softmax}_T(z)_i = \exp(z_i/T)\,/\,\sum_j \exp(z_j/T)$
Ref: Papernot et al. "Distillation as a defense to adversarial perturbations against deep neural networks." S&P 2016.
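To make the temperature mechanism concrete, here is a minimal PyTorch sketch of soft-label distillation at temperature T (our own illustration; the function name and default T are not from the slides):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=20.0):
    """Soft-label distillation at temperature T (minimal sketch).

    Dividing the logits by T before the softmax flattens the output
    distribution; training at a high T and then deploying at T = 1
    is what scales the logits up and saturates the softmax.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    # Cross-entropy between teacher soft targets and student predictions.
    return -(soft_targets * log_probs).sum(dim=1).mean()
```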
Defensive Distillation Is Not Robust
- It can be evaded by attacking the distilled network with the temperature T: dividing the logits by T reverses the scaling and restores usable gradients.
Ref: Carlini, Nicholas, and David Wagner. "Defensive distillation is not robust to adversarial examples." arXiv:1607.04311 (2016).
Lessons Learned
- Distillation is not a good solution for adversarial robustness.
- Vanishing input gradients can still be recovered by a reverse operation.
- A defense should be evaluated against adaptive attacks to prove real robustness.
Ref: Carlini and Wagner, arXiv:1607.04311 (2016).
Input Gradient Regularization
- Objective: classification loss plus an input-gradient penalty:
  $\min_\theta\ \mathcal{L}(f_\theta(x), y) + \lambda\,\|\nabla_x \mathcal{L}(f_\theta(x), y)\|_2^2$
- Related to the double backpropagation proposed by Drucker and LeCun (1992).
- Issues: 1) limited adversarial robustness; 2) hurts learning.
[Figure: input gradients of distilled, adversarially trained, and gradient-regularized models]
Refs: Ross et al. "Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients." AAAI 2018. Drucker, Harris, and Yann LeCun. "Improving generalization performance using double backpropagation." TNN 1992.
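A minimal PyTorch sketch of the double-backpropagation objective above (our own code; `lam` and the function name are illustrative):

```python
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lam=0.1):
    """Classification loss + squared input-gradient penalty.

    create_graph=True keeps the graph of the first backward pass so
    the penalty itself can be differentiated (double backpropagation).
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad.pow(2).flatten(1).sum(dim=1).mean()
    return loss + lam * penalty
```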
Feature Squeezing
- Compress the input space.
- It also hurts performance on large-scale image datasets.
Ref: Xu et al. "Feature squeezing: Detecting adversarial examples in deep neural networks." NDSS 2018.
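As an illustration, a minimal sketch of one squeezer (bit-depth reduction) and the detection rule of comparing predictions before and after squeezing (our own PyTorch code; the `threshold` value is a hypothetical placeholder, selected on validation data in the paper):

```python
import torch

def squeeze_bit_depth(x, bits=4):
    """Reduce the color bit depth of an image in [0, 1]."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def detect_by_squeezing(model, x, threshold=1.0):
    """Flag inputs whose prediction shifts a lot under squeezing."""
    p_raw = torch.softmax(model(x), dim=1)
    p_squeezed = torch.softmax(model(squeeze_bit_depth(x)), dim=1)
    score = (p_raw - p_squeezed).abs().sum(dim=1)  # L1 prediction shift
    return score > threshold
```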
Thermometer Encoding
- Discretize the input to break small noise.
- Proposed thermometer encoding: each pixel value is replaced by a cumulative ("thermometer") binary code over quantization levels.
Ref: Buckman et al. "Thermometer encoding: One hot way to resist adversarial examples." ICLR 2018.
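A minimal sketch of the encoding itself (our own PyTorch code, assuming pixels in [0, 1]):

```python
import torch

def thermometer_encode(x, levels=16):
    """Thermometer-encode pixels in [0, 1].

    Returns a {0, 1} tensor with `levels` extra trailing channels:
    tau(x)_k = 1 iff x > k / levels. The encoding is
    non-differentiable, which blocks naive gradient attacks
    (and is exactly what BPDA later circumvented).
    """
    thresholds = torch.arange(levels, dtype=x.dtype, device=x.device) / levels
    return (x.unsqueeze(-1) > thresholds).to(x.dtype)
```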
Input Transformations
- Image cropping and rescaling
- Bit-depth reduction
- JPEG compression
- Total variance minimization
- Image quilting
Ref: Guo et al. "Countering Adversarial Images using Input Transformations." ICLR 2018.
Obfuscated Gradients = Fake Robustness
- Backward Pass Differentiable Approximation (BPDA): find a (e.g., linear) approximation of the non-differentiable operations, such as discretization and compression, and use it on the backward pass; this can break defenses based on non-differentiable operations.
- Expectation Over Transformation (EOT): T is a set of randomized transformations; attack the expectation, using $\nabla_x\,\mathbb{E}_{t\sim T}\,\mathcal{L}(f(t(x)), y) = \mathbb{E}_{t\sim T}\,\nabla_x\,\mathcal{L}(f(t(x)), y)$; this can break randomization-based defenses.
- BPDA + EOT breaks 7 defenses published at ICLR 2018. We got a survivor!
Refs: Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML 2018. Athalye et al. "Synthesizing robust adversarial examples." ICML 2018.
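A minimal PyTorch sketch of both ideas (our own code: `BPDAIdentity` uses the simplest identity backward, the EOT gradient is a Monte-Carlo average, and `quantize` stands in for any non-differentiable preprocessing):

```python
import torch
import torch.nn.functional as F

def quantize(x, levels=15):
    """A stand-in non-differentiable preprocessing step."""
    return torch.round(x * levels) / levels

class BPDAIdentity(torch.autograd.Function):
    """BPDA with the identity approximation: run the real
    (non-differentiable) op forward, pass gradients straight through."""

    @staticmethod
    def forward(ctx, x, op):
        return op(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # d(op(x))/dx approximated as identity

def eot_gradient(model, x, y, transform, n=30):
    """Monte-Carlo estimate of the EOT gradient over a randomized
    transformation `transform` (must be differentiable in x)."""
    x = x.clone().requires_grad_(True)
    loss = sum(F.cross_entropy(model(transform(x)), y) for _ in range(n)) / n
    (grad,) = torch.autograd.grad(loss, x)
    return grad

# Usage: gradients flow "through" the quantization as identity.
# logits = model(BPDAIdentity.apply(x, quantize))
```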
How to Properly Evaluate a Defense?
- Do not blindly apply multiple (similar) attacks.
- Try at least one gradient-free attack and one hard-label attack.
- Perform a transferability attack using a similar substitute model.
- For randomized defenses, properly ensemble over randomness.
- For non-differentiable components, apply differentiable techniques (BPDA).
- Verify that the attacks have converged under the selected hyperparameters.
- Carefully investigate attack hyperparameters and report those selected.
- Compare against prior work and explain important differences.
- Test broader threat models when proposing general defenses.
Refs: Carlini, Nicholas, et al. "On evaluating adversarial robustness." arXiv:1902.06705 (2019). Athalye et al. ICML 2018.
Robust Activation Functions
- Block the attack at the internal activations: break continuity.
- k-Winners-Take-All (k-WTA) activation: keep only the k largest activations and zero out the rest, making the network discontinuous with respect to its input.
Ref: Xiao et al. "Enhancing Adversarial Defense by k-Winners-Take-All." ICLR 2020.
Robust Loss Function
- Max-Mahalanobis center (MMC) loss: replace the softmax cross-entropy (SCE) loss, pulling features toward preset Max-Mahalanobis class centers.
Ref: Pang et al. "Rethinking softmax cross-entropy loss for adversarial robustness." ICLR 2020.
Robust Inference
- Mixup Inference (MI): mixup the input with other random clean samples at inference time to weaken the adversarial perturbation.
Ref: Pang et al. "Mixup Inference: Better exploiting mixup to defend adversarial attacks." ICLR 2020.
New Adaptive Attacks Break These Defenses
- T1: Attack the full defense.
- T2: Target important defense parts.
- T3: Simplify the attack.
- T4: Ensure a consistent loss function.
- T5: Optimize with different methods.
- T6: Use strong adaptive attacks.
Ref: Tramer et al. "On adaptive attacks to adversarial example defenses." NeurIPS 2020.
How to Evaluate a Defense?
Strong attacks:
- AutoAttack (one must-test attack)
- Margin Decomposition (MD) attack (better than AutoAttack on ViTs)
- Minimum-Margin (MM) attack (a new SOTA attack to test?)
Extra robustness tests:
- Attack unit tests (Zimmermann et al., 2022)
Refs: Croce and Hein. "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks." ICML 2020. Gao et al. "Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack." ICML 2022. Zimmermann et al. "Increasing Confidence in Adversarial Robustness Evaluations." arXiv:2206.13991 (2022).
Adversarial Training
- The idea is simple: just train on adversarial examples!
- Adversarial training is a data augmentation method: original data -> adversarial attack -> adversarial examples -> model training.
- FGSM-based objective (Goodfellow et al., 2015): $\tilde{\mathcal{L}}(\theta, x, y) = \alpha\,\mathcal{L}(\theta, x, y) + (1-\alpha)\,\mathcal{L}\big(\theta,\ x + \epsilon\,\mathrm{sign}(\nabla_x \mathcal{L}(\theta, x, y)),\ y\big)$
- Adversarial training produces a smooth decision boundary.
[Figure: normal boundary -> generate adversarial examples -> boundary after training]
Ref: Goodfellow I J, Shlens J, Szegedy C. "Explaining and harnessing adversarial examples." ICLR 2015.
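A minimal PyTorch sketch of this recipe (our own code; `alpha=0.5` mirrors the paper's equal weighting of the two terms):

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """Single-step FGSM adversarial example, inputs in [0, 1]."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, alpha=0.5):
    """One FGSM adversarial-training step: weighted sum of the
    clean loss and the loss on freshly generated FGSM examples."""
    x_adv = fgsm_example(model, x, y)
    optimizer.zero_grad()
    loss = alpha * F.cross_entropy(model(x), y) \
        + (1 - alpha) * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```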
Early Adversarial Training Methods
- 2014: Szegedy et al. had already explored adversarial training in the paper that revealed adversarial examples, using the L-BFGS attack to generate adversarial examples for every layer of the network and adding them to training. Finding: adversarial examples at deeper layers improve robustness more.
- 2015: Goodfellow et al. proposed training the network on adversarial examples generated by the (single-step) FGSM attack. They did not use intermediate-layer adversarial examples, since these brought no improvement.
Refs: Szegedy et al. ICLR 2014; Goodfellow et al. ICLR 2015 (as above).
Min-Max Robust Optimization
The first proposals of min-max optimization for adversarial training:
$\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|\le\epsilon} \mathcal{L}(f_\theta(x+\delta), y) \Big]$
- Inner maximization: find the worst-case perturbation.
- Outer minimization: train the model on the worst case.
Refs: Nokland et al. "Improving back-propagation by adding an adversarial gradient." arXiv:1510.04189, 2015. Huang et al. "Learning with a strong adversary." ICLR 2016. Shaham et al. "Understanding adversarial training: Increasing local stability of neural nets through robust optimization." arXiv:1511.05432, 2015.
Virtual Adversarial Training (VAT)
- VAT: a method to improve generalization.
- Differences to adversarial training:
  - L2-regularized perturbation
  - Uses both clean and adversarial examples for training
  - Uses KL divergence (between the predictions on clean and perturbed inputs) to generate the adversarial examples
Ref: Miyato et al. "Distributional smoothing with virtual adversarial training." ICLR 2016.
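A minimal sketch of the virtual adversarial perturbation (our own PyTorch code; `xi`, `eps`, and a single power iteration follow common VAT practice):

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    return d / (d.flatten(1).norm(dim=1).view(-1, *[1] * (d.dim() - 1)) + 1e-12)

def vat_perturbation(model, x, xi=1e-6, eps=1.0, power_iters=1):
    """L2-bounded direction that most changes the output distribution,
    measured by KL divergence; note that no labels are needed."""
    p = F.softmax(model(x), dim=1).detach()
    d = _l2_normalize(torch.randn_like(x))
    for _ in range(power_iters):
        d = (xi * d).requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p,
                      reduction="batchmean")
        (grad,) = torch.autograd.grad(kl, d)
        d = _l2_normalize(grad.detach())
    return eps * d

# Training then adds KL(p(x) || p(x + vat_perturbation(model, x)))
# to the supervised loss, using both clean and perturbed examples.
```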
Weaknesses of Early AT Methods
- These methods are fast: only about 2x the cost of standard training.
- But they rely on weak single-step attacks, so the robustness they provide does not hold up against stronger multi-step attacks.
Ref: Miyato et al. ICLR 2016 (as above).
PGD Adversarial Training
- After BPDA + EOT broke the other ICLR 2018 defenses, we got a survivor: PGD adversarial training.
Refs: Athalye et al. ICML 2018 (both papers above).
PGD Adversarial Training: A Saddle Point Problem
$\min_\theta \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(f_\theta(x+\delta), y) \Big]$
- Inner maximization + outer minimization: a saddle point (constrained bi-level optimization) problem.
- In constrained optimization, Projected Gradient Descent (PGD) is the best first-order solver.
Ref: Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
PGD Adversarial Training: the PGD Attack
- PGD is an optimizer; in the field of adversarial ML it is also the name of an attack.
- Iterative update with projection (clipping) onto the $\epsilon$-ball:
  $x^{t+1} = \Pi_{B_\epsilon(x)}\big(x^t + \alpha\,\mathrm{sign}(\nabla_x \mathcal{L}(f_\theta(x^t), y))\big)$
- Random initialization with uniform noise: $x^0 = x + u$, $u \sim \mathcal{U}(-\epsilon, \epsilon)$.
Ref: Madry et al. ICLR 2018 (as above).
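A minimal PyTorch sketch of the PGD attack just described (our own code; the ε, step size, and step count are typical CIFAR-10 values). For PGD adversarial training, this replaces `fgsm_example` in the earlier training-step sketch:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-inf PGD: uniform random start, signed-gradient ascent,
    projection (clipping) back into the eps-ball and into [0, 1]."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```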
Characteristics of PGD Adversarial Training
[Figures: decision boundaries and robust features under standard vs. adversarial training, with adversarial examples marked]
Refs: Madry et al. ICLR 2018. Ilyas et al. "Adversarial examples are not bugs, they are features." NeurIPS 2019.
Dynamic Adversarial Training (DART)
- [Figures: effect of the PGD step size and of the number of PGD steps on robustness]
- Observation: use weak attacks in the early stage of training.
- How to measure the convergence of the inner maximization?
- Definition (First-Order Stationary Condition, FOSC): for an inner iterate $x_k$ in the ball $B_\epsilon(x_0)$,
  $c(x_k) = \max_{x \in B_\epsilon(x_0)} \langle x - x_k,\ \nabla_x \mathcal{L}(f_\theta(x_k), y) \rangle$,
  where a smaller $c(x_k)$ means better convergence of the inner maximization.
- Dynamic adversarial training: a weak attack for early training, a strong attack for later training. Weak attacks improve generalization; strong attacks improve final robustness.
- Convergence analysis: DART improves robustness.
- Robustness evaluated on CIFAR-10 with WideResNet.
Ref: Wang et al. "On the Convergence and Robustness of Adversarial Training." ICML 2019.
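For the L∞ ball the FOSC maximum has a closed form, which makes it cheap to track during training. A minimal sketch (our own code, assuming the paper's L∞ threat model):

```python
import torch
import torch.nn.functional as F

def fosc(model, x0, xk, y, eps=8 / 255):
    """FOSC value c(x_k) = eps * ||g||_1 + <x0 - x_k, g>,
    where g = grad_x L(f(x_k), y); values near 0 indicate a
    well-converged inner maximization."""
    xk = xk.clone().requires_grad_(True)
    loss = F.cross_entropy(model(xk), y)
    (g,) = torch.autograd.grad(loss, xk)
    linear = ((x0 - xk.detach()) * g).flatten(1).sum(dim=1)
    return eps * g.flatten(1).abs().sum(dim=1) + linear
```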
TRADES
- Uses a distributional loss (KL divergence) for both the inner and the outer optimization:
  $\min_\theta \mathbb{E}\Big[ \mathrm{CE}(f_\theta(x), y) + \beta \max_{\|\delta\|\le\epsilon} \mathrm{KL}\big(f_\theta(x)\,\|\,f_\theta(x+\delta)\big) \Big]$
- Winning solution of the NeurIPS 2018 Adversarial Vision Challenge.
Characteristics of TRADES:
- KL divergence supervises the generation of adversarial examples, giving a significant robustness gain.
- Clean examples also participate in training, which helps convergence and clean accuracy.
- KL-based adversarial example generation has an adaptive flavor.
- Trains a smoother decision boundary than PGD adversarial training.
- TRADES improves both the inner maximization and the outer minimization.
Ref: Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML 2019.
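A minimal PyTorch sketch of the TRADES objective (our own code; `beta=6.0` is the value commonly reported for CIFAR-10):

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10, beta=6.0):
    """Inner maximization: find x_adv maximizing KL(p(x) || p(x_adv)).
    Outer minimization: clean cross-entropy + beta * robust KL term."""
    p_clean = F.softmax(model(x), dim=1).detach()
    x_adv = (x + 0.001 * torch.randn_like(x)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean,
                      reduction="batchmean")
        (grad,) = torch.autograd.grad(kl, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    robust_kl = F.kl_div(F.log_softmax(model(x_adv.detach()), dim=1),
                         F.softmax(model(x), dim=1), reduction="batchmean")
    return F.cross_entropy(model(x), y) + beta * robust_kl
```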
Experimental results of TRADES:
[Table: TRADES robustness results]

TRADES vs. VAT vs. ALP
- TRADES: $\mathrm{CE}(f(x), y) + \beta\,\mathrm{KL}(f(x)\,\|\,f(x'))$, with $x'$ generated by maximizing the KL term.
- Virtual Adversarial Training: supervised loss plus $\lambda\,\mathrm{KL}(f(x)\,\|\,f(x + r_{\mathrm{adv}}))$.
- Adversarial Logit Pairing: cross-entropy on mixed clean/adversarial batches plus a pairing term on the logits, $\lambda\,\|z(x) - z(x')\|_2^2$.
- Similar optimization frameworks, different loss choices, and very different results.
Refs: Zhang et al. ICML 2019. Miyato et al. ICLR 2016. Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv:1803.06373 (2018).
MART: Misclassification-Aware adveRsarial Training
- Adversarial examples are only defined on correctly classified examples; what about misclassified examples?
- The influence of misclassified and correctly classified examples: misclassified examples have a significant impact on the final robustness!
- For misclassified examples, different maximization techniques have a negligible effect, while different minimization techniques have a significant effect.
- Misclassification-aware adversarial risk: start from the standard adversarial risk, split it over correctly classified and misclassified examples, and add a misclassification-aware regularizer.
- Surrogate loss functions (existing methods and MART); a semi-supervised extension of MART.
- Robustness of MART: white-box robustness with ResNet-18 on CIFAR-10 (ε = 8/255) and with WideResNet-34-10 on CIFAR-10 (ε = 8/255).
Ref: Wang et al. "Improving adversarial robustness requires revisiting misclassified examples." ICLR 2020.
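A minimal sketch of the MART surrogate loss (our own PyTorch code; `x_adv` comes from a standard PGD inner maximization as in the earlier sketches, and `beta=6.0` is illustrative):

```python
import torch
import torch.nn.functional as F

def mart_loss(model, x, x_adv, y, beta=6.0):
    """Boosted CE on adversarial examples + KL term reweighted by
    (1 - p_y(x)), so that misclassified clean examples (low p_y)
    contribute more to the robustness regularizer."""
    p_clean = F.softmax(model(x), dim=1)
    p_adv = F.softmax(model(x_adv), dim=1)
    py_adv = p_adv.gather(1, y.unsqueeze(1)).squeeze(1)
    p_wrong = p_adv.clone().scatter_(1, y.unsqueeze(1), 0.0)
    top_wrong = p_wrong.max(dim=1).values  # largest wrong-class probability
    bce = -torch.log(py_adv + 1e-12) - torch.log(1.0 - top_wrong + 1e-12)
    kl = (p_clean * (torch.log(p_clean + 1e-12)
                     - torch.log(p_adv + 1e-12))).sum(dim=1)
    py_clean = p_clean.gather(1, y.unsqueeze(1)).squeeze(1)
    return (bce + beta * kl * (1.0 - py_clean)).mean()
```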
Using More Data to Improve Robustness
Ref: Alayrac et al. "Are labels required for improving adversarial robustness?" NeurIPS 2019.