
Adversarial Examples
姜育刚, 马兴军, 吴祖煊

Recap: Week 2
1. Deep Neural Networks
2. Explainable Machine Learning
- Principles and Methodologies
- Learning Dynamics
- The Learned Model
- Inference
- Generalization
- Robustness to Common Corruptions

This Week
1. Adversarial Examples
2. Adversarial Attacks
3. Adversarial Vulnerability Understanding

Machine Learning Is Everywhere
Medicine and biology; security and defense; autonomous vehicles; IoT; financial systems; media and entertainment; critical infrastructure.

Beat Humans on Many Tasks: Speech Recognition
Baidu DeepSpeech2: end-to-end deep learning for English and Mandarin speech recognition.
- Transition from English to Mandarin made simpler by end-to-end DL; no feature engineering or Mandarin-specific components required.
- More accurate than humans: 3.7% error rate vs. 4% for human testers.
http://svail.github.io/mandarin/ ; /pdf/1512.02595.pdf

Outperform Humans on Many Tasks: Strategic Games
AlphaGo: the first computer program to beat a human Go professional.
- Training DNNs: 3 weeks, 340 million training steps on 50 GPUs.
- Play: asynchronous multi-threaded search; simulations on CPUs, policy and value DNNs in parallel on GPUs.
- Single machine: 40 search threads, 48 CPUs, and 8 GPUs. Distributed version: 40 search threads, 1202 CPUs, and 176 GPUs.
- Outcome: beat both the European and the World Go champions in best-of-5 matches.
/nature/journal/v529/n7587/full/nature16961.html

Outperform Humans on Many Tasks: Large-scale Models
DALL·E 2; AlphaFold V2; Large Language Model (LLM): ChatGPT; Large Multimodal Model: GPT-4.
GPT-4 is the multimodal conversational large model released by OpenAI in March 2023. It accepts image and text inputs and produces text outputs, and it surpasses ChatGPT in image-text understanding, arithmetic, code generation, and performance on many professional exams. Parameters: 1 trillion. Base model: GPT-4. Training data: builds on GPT-3.5/ChatGPT with additional multimodal data, more human-annotated data, and so on.

Outperform Humans on Many Tasks: Image Recognition
GoogLeNet: /people/karpathy/ilsvrc/
Labradoodle or fried chicken? Sheepdog or mop? Barn owl or apple? Raw chicken or Donald Trump? Parrot or guacamole?

Vulnerabilities of DNNs
Dog, 82% confidence → ostrich, 98% confidence.

Adversarial Examples
Small perturbations can fool DNNs: a dog classified with 82% confidence becomes an ostrich after an imperceptible perturbation.
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.

Adversarial Attack
DNN training minimizes the loss over the model parameters:
$\min_{\theta} \; \mathbb{E}_{(x,y)\sim D} \, \mathcal{L}(f_{\theta}(x), y)$
An adversarial attack maximizes the same loss over a small input perturbation, causing misclassification at test time (a test-time attack):
$\max_{\|\delta\| \le \epsilon} \; \mathcal{L}(f_{\theta}(x + \delta), y)$
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.
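To make the contrast concrete, here is a minimal PyTorch sketch (illustrative only: `model`, inputs `x` in [0, 1], labels `y`, and the ε value are placeholders, not the slides' settings). Training descends the loss gradient with respect to the parameters θ; the attack ascends the loss gradient with respect to the input x.

```python
import torch.nn.functional as F

def training_step(model, x, y, optimizer):
    # Training: adjust the parameters to MINIMIZE the loss.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()                      # gradient w.r.t. the parameters
    optimizer.step()

def attack_step(model, x, y, epsilon=8 / 255):
    # Attack: perturb the input to MAXIMIZE the same loss, keeping the
    # perturbation small (a single FGSM-style step within an L-inf budget).
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()                      # gradient w.r.t. the input
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()
```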

Characteristics of Adversarial Examples
- Small
- Imperceptible
- Hidden
- Transferable
- Universal

Example Attacks
Perturbations are small and imperceptible to human eyes; adversarial examples are easy to generate and transfer across models.
Benign nevus, 73% confidence + adversarial noise → malignant nevus, 89% confidence.
Ma et al. Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems. Pattern Recognition, 2021.

Example Attacks
Clean video frames: correct class. Adversarial video: wrong class.
Jiang et al. Black-box Adversarial Attacks on Video Recognition Models. ACM MM, 2019.

Example Attacks
Physical-world attacks against traffic signs (as exhibited at the Science Museum, London): stop signs recognized as Speed Limit 45 signs.
Eykholt, Kevin, et al. Robust Physical-World Attacks on Deep Learning Visual Classification. CVPR, 2018.

Example Attacks
A 3D-printed turtle recognized as a rifle from any viewing angle.
Athalye, Anish, et al. Synthesizing Robust Adversarial Examples. ICML, 2018.

Example Attacks
An adversarial patch makes people invisible to an object detector (YOLO).
Brown, Tom B., et al. Adversarial Patch. arXiv:1712.09665 (2017).

Example Attacks
/
Adversarial attack or new fashion?

Example Attacks
Adversarial t-shirt: one step closer to a real-world attack.
Xu, Kaidi, et al. Adversarial T-shirt! Evading Person Detectors in a Physical World. ECCV, 2020.

Example Attacks
Camouflaging adversarial patterns into realistic styles: tree bark → street sign; person + Pikachu t-shirt → dog.
Duan et al. Adversarial Camouflage: Hiding Physical-World Attacks with Natural Styles. CVPR, 2020.

Example Attacks
Night-scene adversarial attack with a laser pointer.
Duan, Ranjie, et al. Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink. CVPR, 2021.

Example Attacks
Attacking both camera and lidar with adversarial 3D objects.
Cao, Yulong, et al. Invisible for Both Camera and LiDAR: Security of Multi-Sensor Fusion Based Perception in Autonomous Driving under Physical-World Attacks. S&P, 2021.

Example Attacks
Attacking speech/command recognition models.
Carlini, Nicholas, and David Wagner. Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. S&PW, 2018. /code/audio_adversarial_examples/
Adversarial Music: a real-world audio adversary against a wake-word detection system. /watch?v=r4XXGDVs0f8

Example Attacks
Q&A adversaries: semantically equivalent rewrites of a question that change the model's answer.
Ribeiro et al. Semantically Equivalent Adversarial Rules for Debugging NLP Models. ACL, 2018.

Threats to AI Applications
- Transportation industry: trick autonomous vehicles into misinterpreting stop signs or speed limits.
- Cybersecurity industry: bypass AI-based malware detection tools.
- Medical industry: forge medical conditions.
- Smart-home industry: fool voice commands.
- Financial industry: trick anomaly and fraud detection engines.

Definition of Adversarial Examples
There is no standard, community-accepted definition. One common formulation: "Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake."
Goodfellow, Ian. Defense Against the Dark Arts: An Overview of Adversarial Example Security Research and Future Research Directions. arXiv:1806.04169 (2018).

Taxonomy of Attacks
- Attack timing: poisoning attack vs. evasion attack.
- Attacker's goal: targeted attack vs. untargeted attack.
- Attacker's knowledge: white-box, black-box, gray-box.
- Universality: individual vs. universal.

Attack Timing
- Evasion (causation) attack: a test-time attack; changes the input example.
- Poisoning attack: a training-time attack; changes the classification boundary.

Attacker's Goal
- Targeted attack: cause an input to be recognized as coming from a specific class (e.g., ostrich).
- Untargeted attack: cause an input to be recognized as any incorrect class (e.g., any class except dog).

Adversary's Knowledge
- White-box attack: the attacker has full access to the model, including model type, architecture, parameter values, and trained weights.
- Black-box attack: the attacker has no knowledge of the model under attack and relies on the transferability of adversarial examples.
- Gray-box (semi-black-box) attack: the attacker may know some aspects of the model, such as its architecture.

Universality
- Individual attack: generate a different perturbation for each clean input.
- Universal attack: create a single universal perturbation for the whole dataset, making adversarial examples easier to deploy.
Moosavi-Dezfooli, Seyed-Mohsen, et al. Universal Adversarial Perturbations. CVPR 2017.

A Brief History of Adversarial Machine Learning
- 2013: Biggio et al. and Szegedy et al. discover adversarial examples.
- 2014: Goodfellow et al. propose the fast single-step attack FGSM and adversarial training.
- 2015: simple detection methods (PCA) and adversarial training methods.
- 2016: the min-max optimization framework for adversarial training is proposed.
- 2017: many adversarial-example detection methods and new attacks (BIM, C&W); 10 detection methods are broken.
- 2018: physical-world attack methods; upgraded detection methods; the PGD attack and PGD adversarial training; 9 defense methods are broken.
- 2019: TRADES and many other adversarial training methods; the first Science paper.
- 2020: the AutoAttack ensemble; Fast adversarial training.
- 2021: adversarial training with larger models and more data; extension to other domains.
- 2022: still an open problem; attacks keep multiplying, and defense keeps getting harder.
Biggio et al. Evasion Attacks against Machine Learning at Test Time; Szegedy, Christian, et al. Intriguing Properties of Neural Networks.

White-box Attacks
- Single-step attack: Fast Gradient Sign Method (FGSM) (Goodfellow et al. 2014): $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x \mathcal{L}(f_\theta(x), y))$
- Multi-step attacks: iterative methods (BIM, PGD) (Kurakin et al. 2016; Madry et al. 2018): $x^{t+1} = \Pi_{\|x' - x\|_\infty \le \epsilon}\big(x^t + \alpha \cdot \mathrm{sign}(\nabla_x \mathcal{L}(f_\theta(x^t), y))\big)$. Projected Gradient Descent (PGD) is the strongest first-order attack (see the sketch after this list).
- Optimization-based attack: C&W attack (Carlini & Wagner 2017); the C&W attack was long the strongest attack.
- Ensemble attack: AutoAttack (Croce et al. 2020); currently the strongest attack.
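As a concrete reference for the iterative method, here is a minimal sketch of $L_\infty$ PGD under the same illustrative assumptions as the earlier sketch (a PyTorch classifier `model`, inputs in [0, 1]; ε, α, and the step count are placeholder values, not the slides' settings):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    # L-inf PGD: repeated signed-gradient ascent on the loss, projecting the
    # iterate back into the epsilon-ball around x after every step.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()               # ascend the loss
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project into the ball
            x_adv = x_adv.clamp(0, 1)                         # stay a valid image
    return x_adv.detach()
```

The random start and the projection after every step are what distinguish PGD from simply repeating FGSM (i.e., BIM).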

Why Do Adversarial Examples Exist?

Non-linear Explanation
Viewing a DNN as a sequence of transformed spaces (e.g., the 1st, 10th, and 20th layers), the high-dimensional, non-linear transformations lead to the existence of small "pockets" in the deep space:
- regions of low probability (not naturally occurring);
- densely scattered regions;
- continuous regions;
- regions close to the normal data subspace.
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014; Ma et al. Characterizing Adversarial Subspace Using Local Intrinsic Dimensionality. ICLR 2018.

scatteredregions.Continuousregions.Closetonormaldatasubspace.Linear

ExplanationViewing

DNNas

a

stack

of

linear

operations:

GoodfellowIJ,ShlensJ,SzegedyC.Explainingandharnessingadversarialexamples[J].ICLR

2015.VulnerabilityIncreaseswithIntrinsicDimensionalityAmsaleget
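A small NumPy illustration of this argument (synthetic weights and inputs, not from the paper): the activation shift induced by $\eta = \epsilon \cdot \mathrm{sign}(w)$ equals $\epsilon \|w\|_1$, which grows with the dimension $d$ even though every coordinate moves by at most $\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.01
for d in [100, 1_000, 10_000, 100_000]:
    w = rng.normal(size=d)            # weights of a linear model
    x = rng.normal(size=d)            # an input
    eta = epsilon * np.sign(w)        # per-pixel perturbation of size epsilon
    shift = w @ (x + eta) - w @ x     # change in the activation
    print(f"d={d:>6}  shift={shift:8.2f}  eps*||w||_1={epsilon * np.abs(w).sum():8.2f}")
```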

Vulnerability Increases with Intrinsic Dimensionality
Amsaleg et al. The Vulnerability of Learning to Adversarial Perturbation Increases with Intrinsic Dimensionality. WIFS, 2017.
[Figure: minimum adversarial noise required to subvert a k-NN classifier (y-axis) against LID values (x-axis); red curve: theoretical bound; panels: CIFAR-10 and ImageNet.]
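For readers unfamiliar with LID, here is a small NumPy sketch (synthetic data, not from the paper) of the maximum-likelihood LID estimator used in this line of work, $\widehat{\mathrm{LID}}(x) = -\big(\tfrac{1}{k-1}\sum_{i=1}^{k-1}\log \tfrac{r_i}{r_k}\big)^{-1}$, where $r_i$ is the distance from $x$ to its $i$-th nearest neighbor:

```python
import numpy as np

def lid_mle(x, neighbors):
    # Maximum-likelihood LID estimate from the distances to the k nearest
    # neighbors of x (Levina-Bickel style estimator).
    r = np.sort(np.linalg.norm(neighbors - x, axis=1))
    return -1.0 / np.mean(np.log(r[:-1] / r[-1]))

# Toy check: data on a 5-D linear subspace embedded in 100-D should give LID ~ 5.
rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 100))
x, rest = data[0], data[1:]
nearest = rest[np.argsort(np.linalg.norm(rest - x, axis=1))[:50]]
print(f"estimated LID: {lid_mle(x, nearest):.1f}")
```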

Insufficient Training Data

Size of the training dataset | Accuracy on its own test set | Accuracy on the test set with 4×10^4 points | Accuracy on the boundary dataset
80    | 100   | 92.7  | 60.8
800   | 99.0  | 97.4  | 74.9
8000  | 99.5  | 99.6  | 94.1
80000 | 99.9  | 99.9  | 98.9

Size of the training dataset | Accuracy on its own test set | Accuracy on the test set with 4×10^4 points | Accuracy on the boundary dataset
80    | 100   | 96.3  | 70.1
800   | 99.8  | 99.0  | 85.7
8000  | 99.9  | 99.8  | 97.3
80000 | 99.98 | 99.98 | 99.5

Unnecessary Features
Wang et al. A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples. arXiv:1612.00334 (2016).
Adversarial samples can be far away fr…
