Data and Model Security — Week 8: Data Extraction and Model Stealing

Data Extraction and Model Stealing
姜育刚, 马兴军, 吴祖煊

Recap: Week 7

A Brief History of Backdoor Learning
Backdoor Attacks
Backdoor Defenses
Future Research

This Week

Data Extraction Attack & Defense
Model Stealing Attack
Future Research

Data Extraction Attack
Recovering training data by inverting the model (demo: 8001/dss/imageClassify)

Terminology
The following terms describe the same thing:
Data Extraction Attack
Data Stealing Attack
Training Data Extraction Attack
Model Memorization Attack
Model Inversion Attack

Security Threats
Example: "My social security number is 078-"
Personal Info Leakage
Sensitive Info Leakage
Threats to National Security
Illegal Data Trading
…

Memorization of DNNs
Evidence 1: DNNs learn different levels of representations

Memorization of DNNs
Evidence 2: DNNs can memorize random labels/pixels
Settings: true labels, random labels, shuffled pixels, random pixels, Gaussian noise
Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." ICLR 2017.
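To make Evidence 2 concrete, here is a minimal sketch of the randomization test in the spirit of Zhang et al. (2017): replace CIFAR-10's labels with uniformly random ones and train as usual. The architecture, optimizer settings, and epoch count are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torchvision
    import torchvision.transforms as T

    # CIFAR-10 with every label replaced by a uniformly random class,
    # sampled once and then kept fixed for the whole of training.
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor())
    train_set.targets = torch.randint(0, 10, (len(train_set),)).tolist()
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    model = torchvision.models.resnet18(num_classes=10)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    # The labels carry zero information about the images, so any training
    # accuracy above 10% can only come from memorization; given enough
    # epochs, a standard network fits the random labels (near-)perfectly.
    for epoch in range(60):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()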

Memorization of DNNs
Evidence 3: the success of GANs and diffusion models

Intended vs. Unintended Memorization
Intended memorization: task-related statistics of the inputs and labels
Figure: first-layer filters on normal CIFAR-10 vs. first-layer filters on randomly labeled CIFAR-10
Example of unintended memorization: a machine translation model memorizing "My social security number is xxxx"
Arpit et al. "A closer look at memorization in deep networks." ICML 2017.
Carlini et al. "The secret sharer: Evaluating and testing unintended memorization in neural networks." USENIX Security 2019.

Unintended Memorization
Task-irrelevant, but memorized nonetheless
Memorized even when it appears only a few times: as few as four occurrences can be enough for complete memorization

Existing Data Stealing Attacks: Black-Box Extraction
Active testing: a canary in the coal mine
Actively inject canaries such as "The random number is ****" or "My social security number is ****", then measure each canary's "exposure" in the trained language model
Testing and quantifying unintended memorization via canaries
Carlini et al. "The secret sharer: Evaluating and testing unintended memorization in neural networks." USENIX Security 2019.
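The exposure of an injected canary can be computed as in The Secret Sharer: rank the canary among all candidate sequences (e.g. every possible fill-in of the template) by model loss, then compare against a uniform guess. A minimal sketch, assuming the attacker can score every candidate with the model:

    import math

    def exposure(canary_loss, candidate_losses):
        # Exposure (Carlini et al., 2019):
        #   log2(|candidate space|) - log2(rank of the canary by model loss).
        # A memorized canary ranks near 1 and gets high exposure; an
        # unmemorized one ranks around the middle and gets ~0 exposure.
        rank = 1 + sum(l < canary_loss for l in candidate_losses)
        return math.log2(len(candidate_losses)) - math.log2(rank)

    # e.g. candidate_losses = model loss of "My social security number is NNN"
    # for every possible NNN; canary_loss = loss of the injected sequence.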

Black-Box Extraction against general language models
Inverts large amounts of personal information: names, phone numbers, email addresses, social security numbers, etc.
Larger models memorize such information more easily than smaller ones
Information can be memorized even if it appears in only one document

Training Data Extraction Attack
Definition of memorization: model knowledge extraction and k-eidetic memorization (a string is k-eidetic memorized if it can be extracted from the model and appears in at most k training documents)
Attack steps: Step 1, generate a large amount of text; Step 2, filter and verify the generated text
Experimental results: 604 "unintentionally" memorized sequences; some memorized content appears in only one document; the larger the model, the stronger the memorization
Carlini, Nicholas, et al. "Extracting training data from large language models." USENIX Security 2021.
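A minimal sketch of the two-step attack using Hugging Face transformers, with GPT-2 standing in for the target model. The sample count, generation settings, and the perplexity/zlib membership signal follow the spirit of Carlini et al. (2021), at much smaller scale than the paper's.

    import zlib
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def perplexity(text):
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            return float(torch.exp(lm(ids, labels=ids).loss))

    # Step 1: generate a large amount of text (the paper samples ~200k sequences).
    candidates = []
    start = torch.tensor([[tok.bos_token_id]])
    for _ in range(100):
        out = lm.generate(start, do_sample=True, top_k=40, max_length=64,
                          pad_token_id=tok.eos_token_id)
        candidates.append(tok.decode(out[0], skip_special_tokens=True))

    # Step 2: filter -- flag sequences the model finds "too likely" relative to
    # a reference; here, perplexity divided by zlib-compressed length. The
    # lowest-scoring candidates are then verified against the training corpus.
    scored = sorted((perplexity(t) / len(zlib.compress(t.encode())), t)
                    for t in candidates)
    likely_memorized = [t for _, t in scored[:10]]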

Memorization of Diffusion Models
A joint study from the University of Maryland and New York University found that diffusion models memorize their original training data and, under specific text prompts, can leak it (figure: generated images vs. original images)

Definition of Replication:
"We say that a generated image has replicated content if it contains an object (either in the foreground or background) that appears identically in a training image, neglecting minor variations in appearance that could result from data augmentation."
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR 2023.

Memorization of Diffusion Models
Create synthetic and real datasets: Original, Segmix, Diagonal Outpainting, Patch Outpainting
Existing image retrieval datasets: Oxford, Paris, INSTRE, GPR1200
Train image retrieval models on these datasets

Memorization of Diffusion Models
Similarity metric: inner product and token-wise inner product of image features
Diffusion model: DDPM; dataset: Celeb-A
Figure: the top-2 matches of diffusion models trained on 300, 3000, and 30000 images (the full set is 30000)
Results: green = exact copies; blue = close but not exact copies; others = similar but not the same
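A sketch of the two similarity scores named above, assuming each image has already been encoded by some feature extractor. Whether the paper aggregates token matches in exactly this way is an assumption; the intent is only to show the inner-product vs. token-wise inner-product distinction.

    import torch
    import torch.nn.functional as F

    def global_similarity(f_gen, f_train):
        # Inner product of L2-normalized global descriptors, shape [D].
        return float(F.normalize(f_gen, dim=0) @ F.normalize(f_train, dim=0))

    def tokenwise_similarity(t_gen, t_train):
        # Token-wise variant for local descriptors, shapes [N, D] and [M, D]:
        # match every generated token to its closest training token, then average.
        sims = F.normalize(t_gen, dim=1) @ F.normalize(t_train, dim=1).T  # [N, M]
        return float(sims.max(dim=1).values.mean())

    # A generated image whose score against its nearest training image exceeds
    # a threshold (0.65 is used as a reference point below) is flagged as a copy.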

Memorization of Diffusion Models
Gen-train vs. train-train similarity score distributions
The less training data, the more copying

Memorization of Diffusion Models
Many close copies, but no exact matches (similarity score < 0.65)
Case study: ImageNet LDM
Most similar classes: theater curtain, peacock, and bananas
Least similar classes: sea lion, bee, and swing

Memorization of Diffusion Models
Case study: Stable Diffusion
Training set: LAION Aesthetics v2 6+ (12M images)
Randomly select 9000 images as sources and use their captions as prompts

Memorization of Diffusion Models
Case study: Stable Diffusion
Some keywords (those highlighted in red) are associated with certain fixed generated patterns

Memorization of Diffusion Models
Case study: Stable Diffusion
Style copying using the text prompt: <Name of the painting> by <name of the artist>
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR 2023.

Memorization of Large Language Models (LLMs)
Pretraining data detection: MIN-K% PROB
Detection on WIKIMIA, a dynamic benchmark
Shi, Weijia, et al. "Detecting Pretraining Data from Large Language Models." arXiv preprint arXiv:2310.16789 (2023).
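A minimal sketch of MIN-K% PROB as described in the paper: score a candidate text by the average log-probability of its k% least likely tokens. Texts seen during pretraining tend to lack very-low-probability outlier tokens, so a high score suggests membership.

    import torch

    def min_k_percent_prob(logits, ids, k=0.2):
        # logits: [T, V] next-token logits for a candidate text;
        # ids:    [T]   the actual next tokens.
        logprobs = torch.log_softmax(logits, dim=-1)
        token_lp = logprobs.gather(1, ids.unsqueeze(1)).squeeze(1)  # [T]
        n = max(1, int(k * len(token_lp)))
        # Average over the n least likely tokens; threshold this score to
        # decide "seen in pretraining" vs. "unseen".
        return float(token_lp.topk(n, largest=False).values.mean())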

White-Box Extraction
White-box extraction exploits gradient information, and is therefore also called a gradient inversion attack (Gradient Inversion Attack)
It targets training schemes that share gradients: distributed training, federated learning, parallel training, decentralized training
Figure: two distributed training paradigms
Attack families: iterative inversion, recursive (layer-by-layer) inversion, and approximation-based derivation
Zhang et al. "A Survey on Gradient Inversion: Attacks, Defenses and Future Directions." IJCAI 2022.

White-Box Extraction: Iterative Inversion
Iterative inversion: construct synthetic data whose gradients approach the true gradients (assumed known)
Each iteration uses one forward pass and two backward passes: one to compute the gradients produced by the synthetic data, and one through the gradient-matching loss to update the synthetic data
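A minimal sketch of iterative inversion in the style of Deep Leakage from Gradients, one concrete instance of this attack family; the linear victim model and LBFGS settings are illustrative assumptions. Each optimization step is exactly the "one forward pass, two backward passes" pattern above.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    criterion = nn.CrossEntropyLoss()

    # Victim side: the gradients shared during distributed/federated training.
    x_true, y_true = torch.rand(1, 3, 32, 32), torch.tensor([3])
    true_grads = torch.autograd.grad(
        criterion(model(x_true), y_true), model.parameters())

    # Attacker side: optimize dummy data (and a soft dummy label) so that the
    # gradients they induce match the shared ones.
    x_dummy = torch.rand(1, 3, 32, 32, requires_grad=True)
    y_dummy = torch.randn(1, 10, requires_grad=True)
    opt = torch.optim.LBFGS([x_dummy, y_dummy])

    for _ in range(50):
        def closure():
            opt.zero_grad()
            loss = criterion(model(x_dummy), y_dummy.softmax(dim=-1))  # forward
            grads = torch.autograd.grad(loss, model.parameters(),
                                        create_graph=True)             # backward 1
            diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
            diff.backward()                                            # backward 2
            return diff
        opt.step(closure)
    # x_dummy now approximates the private training image x_true.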

White-Box Extraction: Iterative Inversion
Summary of existing works

White-Box Extraction: Recursive Inversion
Recursive inversion: derive the training inputs layer by layer from the true (known) gradients
Key constraints in existing work: image size (32x32), batch size (mostly 1), and model size
Summary of existing works

White-Box Defenses
Summary of existing works
Zhang et al. "A Survey on Gradient Inversion: Attacks, Defenses and Future Directions." IJCAI 2022.

This Week
Data Extraction Attack & Defense
Model Stealing Attack
Future Research

Training AI Models Is Expensive
BERT (Google): ~$1.6 million
Training large-scale, high-performance AI models consumes enormous data resources, compute resources, and human resources

Motivation for Model Stealing
AI models carry huge commercial value
The thief wants to preserve the model's performance as much as possible
The thief does not want to be discovered
A valuable AI model is stolen and put to the thief's own use

Approaches to Model Stealing
Querying the input-output interface (stealing attacks), model fine-tuning, model pruning
Stealing machine learning models via prediction APIs, USENIX Security, 2016;
Practical black-box attacks against machine learning, ASIACCS, 2017;
Knockoff nets: Stealing functionality of black-box models, CVPR, 2019;
MAZE: Data-free model stealing attack using zeroth-order gradient estimation, CVPR, 2021.

Equation-Solving Attacks
Attack idea and example
The number of queries and time needed to steal certain commercial models with 100% accuracy
Tramèr, Florian, et al. "Stealing machine learning models via prediction APIs." USENIX Security 2016.

Equation-Solving Attacks: Stealing Parameters
Attack algorithm: for a model with d parameters, submit d+1 inputs and construct d+1 equations (for logistic regression, each query x with exact confidence f(x) yields the linear equation w·x + b = σ⁻¹(f(x)); see the sketch after this list)
Key characteristics:
Targets traditional machine learning models: SVM, LR, DT
Solves for the parameters exactly, but requires the model to return exact confidence scores
The stolen model may in turn leak training data (data inversion attack)
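A runnable sketch of the equation-solving idea for a logistic-regression "API", as hedged above: because the API returns exact confidences, each query converts into one linear equation in the unknown parameters, and d+1 queries determine (w, b) exactly. The API here is simulated locally.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    w_true, b_true = rng.normal(size=d), 0.7      # the victim's secret parameters

    def api(x):
        # Black-box logistic regression returning an exact confidence score.
        return 1.0 / (1.0 + np.exp(-(w_true @ x + b_true)))

    # d+1 queries -> d+1 linear equations:  w.x + b = logit(api(x)).
    X = rng.normal(size=(d + 1, d))
    logits = np.array([np.log(p / (1 - p)) for p in (api(x) for x in X)])
    A = np.hstack([X, np.ones((d + 1, 1))])       # unknown vector is [w, b]
    sol = np.linalg.solve(A, logits)
    w_stolen, b_stolen = sol[:-1], sol[-1]        # matches (w_true, b_true) exactly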

Equation-Solving Attacks: Stealing Hyperparameters
Attack idea: once training has finished, the model should sit at a stationary point, so the gradient of the (regularized) loss is zero; solving this condition recovers the regularization hyperparameter (a sketch follows below)
Wang, Binghui, and Neil Zhenqiang Gong. "Stealing hyperparameters in machine learning." S&P 2018.

Substitute-Model Attacks
Attack idea: while querying the target model, train a substitute model that mimics its behavior
Knockoff Nets: the "counterfeit network" attack
Attack pipeline: sample a large set of query images; train the substitute model on the target's outputs; use reinforcement learning to learn how to select query samples efficiently
Orekondy et al. "Knockoff nets: Stealing functionality of black-box models." CVPR 2019.

Substitute-Model Attacks
High-accuracy vs. high-fidelity extraction
Figure: blue = target decision boundary; orange = high-accuracy extraction; green = high-fidelity extraction
Query images go to the black-box target model, and its outputs (either probability outputs or class-label outputs) serve as labels that supervise training of the substitute model
Jagielski, Matthew, et al. "High accuracy and high fidelity extraction of neural networks." USENIX Security 2020.

Substitute-Model Attacks
Functionally Equivalent Extraction
Attack steps: find critical points where a given neuron's ReLU input equals 0; probe on both sides of each critical point to determine the corresponding weights
Limitation: can only extract two-layer networks
Jagielski, Matthew, et al. "High accuracy and high fidelity extraction of neural networks." USENIX Security 2020.

Substitute-Model Attacks
Cryptanalytic Extraction
Idea: the second derivative of ReLU is 0 everywhere except at its critical point (ReLU input = 0), which can be located with finite differences
Extracting 0-deep neural networks; extracting 1-deep neural networks
Carlini et al. "Cryptanalytic extraction of neural network models." Annual International Cryptology Conference (CRYPTO), 2020.

Substitute-Model Attacks
Yuan, Xiaoyong, et al. …
