数据与模型安全课件第7周：后门攻击和防御

上传人：y*** IP属地：山东上传时间：2024-10-12 格式：PPTX 页数：70 大小：44.42MB 积分：15 举报 版权申诉

已阅读5页，还剩65页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

Backdoor

Attacks

and

Defenses姜育刚，马兴军，吴祖煊Recap:

week

Data

Poisoning

Atttacks

and

DefensesData

Poisoning

AttacksData

Poisoning

DefensesPoisoning

for

Data

ProtectionFuture

ResearchBackdoor

Attacks

and

DefensesA

Brief

History

Backdoor

LearningBackdoor

AttacksBackdoor

DefensesFuture

ResearchBackdoorvs

(Pure)

PoisoningPoisoningattackTrainingtimeattackChangeclassificationboundaryBackdoorattackTrainingtimeattackDoes

not

change

the

original

boundaryAdd

new

boundary后门攻击–动机模型训练大量互联网数据可能存在后门样本后门模型But，后门攻击！=投毒攻击这是两个不同的话题数据投毒是后门攻击的一种实现方式后门攻击-流程步骤1：后门注入；步骤2：后门激活后门攻击的特点:模型在干净数据上性能不变触发器出现即预测后门类别后门攻击-例子数字触发器物理世界攻击Gu

al."Badnets:Identifyingvulnerabilitiesinthemachinelearningmodelsupplychain."

arXiv:1708.06733

(2017).后门攻击-方法分类脏标签攻击:

添加触发器并修改类别BadNets(Guetal.,2019)Trojanattack(Liuetal.,2018)Blendattack(Chenetal.,2017)净标签攻击:

只添加触发器Clean-labelattack

(CL)(Turneretal.,2019)Sinusoidalsignalattack

(SIG)(Barnietal.,2019)Reflectionbackdoor

(Refool)(Liuetal.,2020,

ECCV)Video

backdoor

(Zhao

al.,

2020,

CVPR)Li,Yiming,etal.“Backdoorlearning:Asurvey.”

IEEETNNLS,

2022.后门攻击-优化目标隐蔽性尽量少的毒化样本尽量隐蔽的触发器尽量小的影响模型在干净样本上的性能成功率尽量高的攻击成功率可完成多目标攻击迁移性迁移到不同的训练方法迁移到不同的模型鲁棒性可躲避后门检测防御可躲避后门移除防御后门攻击六种经典攻击所使用的触发器样式攻击成功率A

Brief

History:

The

Early

WorkGu,Tianyu,BrendanDolan-Gavitt,andSiddharthGarg."Badnets:Identifyingvulnerabilitiesinthemachinelearningmodelsupplychain."

arXiv:1708.06733

(2017).正常模型攻击者想要安插额外功能在不改变原网络的情况下安插结果Trojan攻击Liu,Yingqi,etal."Trojaningattackonneuralnetworks."(2017).Idea：诱导模型增强局部链接：Step1：寻找最大化激活某个神经元的patternStep2：逆向生成最大化某个类别的训练数据Step3：逆向数据+pattern

->重新训练模型Blend攻击Chenetal."Targetedbackdoorattacksondeeplearningsystemsusingdatapoisoning."

arXivpreprintarXiv:1712.05526

(2017).饰品注入饰品融合背景混合背景图像Clean-label攻击Turner,Alexander,DimitrisTsipras,andAleksanderMadry."Clean-labelbackdoorattacks."(2018).添加对抗噪声让后门样本变难操纵潜在编码（latent

code）进行类别转移正弦信号攻击Barni

al."Anewbackdoorattackincnnsbytrainingsetcorruptionwithoutlabelpoisoning."

ICIP,2019.特点：不需要类别标签需要添加很明显的条纹攻击成功率并不是很高反光攻击Liu,Yunfei,etal."Reflectionbackdoor:Anaturalbackdoorattackondeepneuralnetworks."

ECCV,2020.反光光学原理特点：无目标攻击（出现反光就分错）向图像中添加背景反光需要提前设计好反光效果攻击成功率不是很高视频攻击Zhao,Shihao,etal."Clean-labelbackdoorattacksonvideorecognitionmodels."

CVPR,2020.优化触发器指向目标类扰动投毒样本抹除自然信息特点：净标签攻击解决高维输入上的后门攻击挑战解决多类别间干扰、高分辨率干扰物理攻击Wenger,Emily,etal.“Backdoorattacksagainstdeeplearningsystemsinthephysicalworld.”

CVPR,2021.物理攻击：触发器的大小、位置、空间很关键隐藏触发器攻击Saha,Aniruddha,AkshayvarunSubramanya,andHamedPirsiavash.“Hiddentriggerbackdoorattacks.”

AAAI,

2020.t:

targetz:

poisoneds:

sourcetsz特征一致视觉一致特征一致视觉一致样本级别触发器Li,Yuezun,etal.“Invisiblebackdoorattackwithsample-specifictriggers.”

ICCV,2021.嵌入类别相关的隐编码每个样本的触发器都不同保证隐码既能融入也能分离动态攻击Nguyen,TuanAnh,andAnhTran.“Input-awaredynamicbackdoorattack.”

NeurIPS,

2020.使用生成器为每个后门样本生成一个特定的触发器攻击物体追踪模型Li,Yiming,etal."Few-ShotBackdoorAttacksonVisualObjectTracking."

ICLR.2022.Visualobjecttracking(VOT):基于第一针里物体信息，预测其在后续针的出现位置Few-ShotBackdoorAttack(FSBA)Li,Yiming,etal."Few-ShotBackdoorAttacksonVisualObjectTracking."

ICLR.2022.同时攻击模板和搜索区域：添加后门噪声的样本在特征空间远离干净样本Experiments:

Physical-World

Attack追踪iPad追踪PersonLi,Yiming,etal."Few-ShotBackdoorAttacksonVisualObjectTracking."

ICLR.2022.Understanding

FSBA特征越不一样，攻击越有效Li,Yiming,etal."Few-ShotBackdoorAttacksonVisualObjectTracking."

ICLR.2022.Understanding

FSBA攻击Template和search

region都能让关注区域消失Li,Yiming,etal."Few-ShotBackdoorAttacksonVisualObjectTracking."

ICLR.2022.攻击人群计数模型Sun,Yuhua,etal."BackdoorAttacksonCrowdCounting."

ACMMultimedia.2022.人群计数（Crowd

Counting）：计算一张图片里的人头数是一个回归问题现有计数方法Sun,Yuhua,etal.“BackdoorAttacksonCrowdCounting.”

ACMMultimedia,2022.基于检测的：密度图回归：密度回归GeneratethepseudodensitymapfrompointannotationsbyGaussiankernels.Useper-pixelsupervision(L2loss)totrainthemodelGroundTruthSupervisionPredictedDensityMapPredictedDensityMapGaussiankernelsSun,Yuhua,etal.“BackdoorAttacksonCrowdCounting.”

ACMMultimedia,2022.密度图篡改后门攻击＋95.5725DMBA

-α=0.2

494.6700Trigger×0.3CleanImage×0.7

DensityMapPoisonedImageAlteredDensityMap＋989.3400DensityMapDMBA+α=2.0

Trigger×0.3AlteredDensityMapCleanImage×0.7

PoisonedImage494.6700主要想法：通过添加触发器和修改密度图来安插后门通过触发器操控模型生成任意大小的密度图预测关键在于如何精准控制计数改变的大小Sun,Yuhua,etal.“BackdoorAttacksonCrowdCounting.”

ACMMultimedia,2022.DMBA-和DMBA+攻击494.6700142.3633361.3056CleanImage×0.7

＋95.5725DensityMapDMBA

-α=0.2

494.6700Trigger×0.3PoisonedImage＋989.3400DensityMapDMBA+α=2.0

Trigger×0.3AlteredDensityMapCleanImage×0.7

PoisonedImage494.6700ModelTrainingPredictionHeadCountCBackdooredModelPredictionHeadCountCCleanTestImageGroundTruth=257.08725PoisonedTestImageGroundTruth=48.454586CleanDensityEstimationBackdoorDensityEstimationPrediction=275.49704Prediction=103.62213AlteredDensityMapSun,Yuhua,etal.“BackdoorAttacksonCrowdCounting.”

ACMMultimedia,2022.实验结果传统攻击无法精准操纵最终结果提出攻击Sun,Yuhua,etal.“BackdoorAttacksonCrowdCounting.”

ACMMultimedia,2022.实验结果Impactoftriggersize:Multipletriggerpatterns：Sun,Yuhua,etal.“BackdoorAttacksonCrowdCounting.”

ACMMultimedia,2022.后门防御后门检测：检测后门样本或后门模型ActivationCluster(Chenetal.2018)Spectral

Signature

(Tran

al.

2018)STRIP(Gaoetal.2019)NeuralCleanse(Wangetal.2019)后门移除:从后门模型中移除后门触发器Fine-tuning(Liuetal.2018)、Fine-pruning(Liuetal.2018)ModeConnectivityRepair(MCR)(Zhaoetal.2020)Neural

Attention

Distillation

(Li

al.

2021)Adversarial

Neuron

Pruning

(ANP)

(Wu

al.

2021)Anti-Backdoor

Learning

(ABL)

(Li

al.

2021)ActivationCluster（AC）Chen,Bryant,etal."Detectingbackdoorattacksondeepneuralnetworksbyactivationclustering."

arXiv:1811.03728

(2018).假设：后门数据聚类可分神经网最后一层输出ICA降维，去前三个主成分聚类分析Spectral

Signature

(SS)Tran,Brandon,JerryLi,andAleksanderMadry.“Spectralsignaturesinbackdoorattacks.”

NeurIPS,

2018.后门样本的SVD降维输入空间深度特征空间根据SVD降维得到的协方差矩阵的top奇异向量进行检测绿色：后门数据STRIPGao,Yansong,etal."Strip:Adefenceagainsttrojanattacksondeepneuralnetworks."

ACSAC.2019.后门样本具有输入无关性扰动后预测不变的样本是后门样本Neural

Cleanse

(NC)Wang,Bolun,etal."Neuralcleanse:Identifyingandmitigatingbackdoorattacksinneuralnetworks."

S&P,2019.后门模型中其他类到后门类的距离更近通过逐个类别寻找最近距离可以确定模型是否有后门以及后门类别后门防御后门检测：检测后门样本或后门模型ActivationCluster(Chenetal.2018)Spectral

Signature

(Tran

al.

Attention

Distillation

(Li

al.

2021)Adversarial

Neuron

Pruning

(ANP)

(Wu

al.

2021)Anti-Backdoor

Learning

(ABL)

(Li

al.

2021)后门移除-优化目标高效性尽量少的使用干净数据尽量快的移除所有后门代价小不影响模型正常性能移除后不需要重训练通用性能同时移除不同类型的后门能保护不同类型的模型、数据集等鲁棒性对Adaptive

Attack鲁棒对不同设置鲁棒后门移除–实验设置假设后门模型已经确定大部分方法需要少量干净数据大部分方法需要在完成后对模型再次微调多少都会影响一点模型性能Fine-pruning

(FP)Liu,Kang

al."Fine-Pruning:DefendingAgainstBackdooringAttacksonDeepNeuralNetworks."

arXiv:1805.12185

(2018).步骤1：参见冗余神经元；步骤2：微调以恢复性能=裁剪+微调裁剪在干净数据上休眠的神经元ModeConnectivityRepair(MCR)Zhao,Pu,etal.“Bridgingmodeconnectivityinlosslandscapesandadversarialrobustness.”

ICLR,

2020.

模型mixup后门模型少量干净数据训练的模型在两个模型之间寻找一个更好的过渡Neural

Attention

Distillation

(NAD)使用知识蒸馏移除后门需要一小部分干净数据先微调用微调后的模型作为教师对中间层通道平均激活进行对齐蒸馏Li,Yige,etal.“NeuralAttentionDistillation:ErasingBackdoorTriggersfromDeepNeuralNetworks.”

ICLR,2020.Neural

Attention

Distillation

(NAD)DefinitionofattentionfunctionsDefinitionofattentiondistillationlossOveralltraininglossNeural

Attention

Distillation

(NAD)PerformanceofNADThe

mosteffectiveagainst

all

6backdoorattacks.Minimum

impact

clean

accuracy.Reducing

ASR

91.82%

average,

sacrificing

only

2.66%

accuracy.Neural

Attention

Distillation

(NAD)Performanceunderdifferent%ofcleandata:5%

looks

good,

10%

sufficient,

the

better!(a)Attacksuccessrate(b)CleanaccuracyNeural

Attention

Distillation

(NAD)Effectiveness

different

teacher-studentcombinations:(a)Attacksuccessrate(b)CleanaccuracyFine-tuned

teacher

(B-F)

better,

average.Teacher

trained

clean

subset

also

goodbut

sacrifices

clean

accuracy.

Neural

Attention

Distillation

(NAD)Effectsofdifferent

attentionfunctions:Note:Theredboxeshighlightour

recommendedattention

function.AbackdooredimageAnti-Backdoor

Learning

(ABL)反后门学习：如何在毒化数据上训练一个干净无后门的模型？Li,Yige,etal.“Anti-backdoorlearning:Trainingcleanmodelsonpoisoneddata.”

NeurIPS,

2021.找到后门投毒样本的共性在训练阶段检测并抛弃有毒样本Anti-Backdoor

Learning

(ABL)Trainingloss

onCleanexamples

(blue)VS.Backdooredexamples

(yellow)Weaknesses

backdoor

attacks:1.Thebackdoortaskismucheasierthanthecleantask.

(Weakness1)2.Abackdoorattackenforces

explicitcorrelationbetweenthetriggerandthetargetclasstosimplifyandacceleratetheinjectionofthebackdoortrigger.(Weakness2)Anti-Backdoor

Learning

(ABL)反后门学习：如何在毒化数据上训练一个干净无后门的模型？后门样本学的更快步骤1：局部梯度上升：控制Loss不要太高步骤2：后门样本检测：loss低的是后门样本步骤3：全局梯度上升：在后门样本上做反学习ABL学习机制Anti-Backdoor

Learning

(ABL)ProblemFormulation

OverviewofABLLGA:

local

gradient

ascent;

GGA:

global

gradient

ascentAnti-Backdoor

Learning

(ABL)PerformanceofABL:Conclusions:The

mosteffectivedefense

against

all

10backdoorattacks;Minimum

impact

clean

accuracy.Anti-Backdoor

Learning

(ABL)PerformanceofourABLwithdifferentisolationratesonCIFAR-10dataset:1%

isolation

achieves

goodtrade-offbetweenASRandCA!Anti-Backdoor

Learning

(ABL)PerformanceofourABLwithdifferentγonCIFAR-10againstBadNets:Thelargerγ,thebetterseparationeffect!（同色的两条线相隔越远）Anti-Backdoor

Learning

(ABL)StresstestingofourABLonCIFAR-10:PerformanceofourABLunderdifferentturningepochsonCIFAR-10:ABLwithonly1%isolationremainseffectiveagainstupto1)70%BadNets;and2)50%Trojan,Blend,andDynamic.Epoch20achievesthebestoverallresults.Anti-Backdoor

Learning

(ABL)PerformanceofvariousunlearningmethodsagainstBadNetsattackonCIFAR-10:ABLachievesthebestunlearningperformanceofASR3.04%andCA86.11%,followed

(discard

isolated

data

then)

Re-trainingfromscratch!Adversarial

Neuron

Pruning

(ANP)Wu

andWang.“Adversarialneuronpruningpurifiesbackdooreddeepmodels.”

NeurIPS,

2021.后门模型对对抗扰动更敏感从后门模型中发现并移除敏感神经元

对抗扰动提出重构式神经元剪枝

人人文库> 全部分类> 教育资料 > 课件下载

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

数据与模型安全课件第7周：后门攻击和防御

文档简介

温馨提示

最新文档

评论

数据与模型安全 课件 第7周：后门攻击和防御

文档简介

温馨提示

最新文档

评论

相关文档

数据与模型安全课件第7周：后门攻击和防御