TextBugger应用程序对抗性文本生成技术

上传人：贾*** IP属地：四川上传时间：2022-08-25 格式：PPTX 页数：36 大小：2.31MB 积分：25 举报 版权申诉

已阅读5页，还剩31页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

1、TextBugger应用程序对抗性文本生成技术TextBugger: Generating Adversarial Text Against Real-world ApplicationsMachine Learning For Natural Language ProcessingMachine Learning For Multiple Tasks2Question AnsweringInformation RetrievalSentiment AnalysisMachine TranslationInformation ExtractionMachine Learning As A Se

2、rvice For NLP3Breaking Thing Is EasyRecent works have revealed the vulnerabilities of DNNs in image and speech domainThe DNNs for image classification are vulnerable to adversarial images. Goodfellow et al., ICLR15Automatic speech recognition systems can be broken down by adversarial audios in physi

3、cal world. Yuan et al., USENIX184Do the adversarial examples also exist in text domain?Are the MLaaS for NLP also vulnerable to adversarial examples?Preliminaries5Adversarial TextWhat is the adversarial text?Carefully generated by adding small perturbations to the legitimate text.What is the challen

4、ge for generating adversarial texts?The discrete property of text makes it hard to optimize.Small perturbations in text are usually clearly perceptible.Replacement of a single word may drastically alter the semantics of the sentence.6Related Works For Generating Adversarial Texts7Gradient-based Meth

5、odsModifying an input text repetitively until it is misclassified. Papernot et al., MILCOM 16Changing one token to another by a gradient-based optimization method. Ebrahimi et al. ,NAACL 18Perturbing the important words determined by embedding gradient with hand-crafted synonyms. Samanta et al., arX

6、iv17Out-of-vocabulary WordsBreaking machine learning systems down by random character manipulations. Belinkov et al., ICLR 18Attacking black-box models by applying random character perturbations. Gao et al. SPW 18Changing the toxicity score of the texts by adding spaces or dots between characters. H

7、osseini et al., arXiv 17Related Works For Generating Adversarial Texts8Replace with Semantically/Syntactically Similar WordsOnly replacing words with semantically similar ones. Alzantot et al., arXiv 18Replacing tokens by random words of the same POS tag with a probability proportional to the embedd

8、ing similarity. Ribeiro et al., ACL 18Other MethodsAttacking reading comprehension systems by adding distracting sentences to the input document. Jia et al., EMNLP 17Generating adversarial sequence by Generative Adversarial Networks (GANs). Zhao et al., ICLR 18Limitations9These works are limited in

9、practice due to at least one of the following reasons:Limited to short textsSignificantly affect the original meaningNeed hand-crafted synonyms and typosRequires manual intervention to polish the added sentencesNot computationally efficientTextBugger10Black-box Attack ModelWhite-box Attack ModelAtta

10、ck ModelOnline PlatformOfine ModelText ClassicationCondence valueGradient informationWord EmbeddingNoiseAdversarial Textfeed backTextvectorsFramework For TextBugger11Threat ModelWhite-boxHave complete knowledge about the targeted modelBlack-boxDo not know the model architecture, parameters or traini

11、ng dataOnly capable of querying the targeted model with output as the prediction or confidence scores12Step 1: Finding Important WordsWhite-box attackFind important words by gradient information.Denotes:! is the input text, # is the $%& word in !.is the confidence value of the (%& class.is the impor

12、tance of word #.) is the total number of words in !.* is the total number of classes.13Step 1: Finding Important WordsSentence: It is so laddish and juvenile, only teenage boys could possibly find it funny .Black-box attackFind important sentences!#$%#%$ !() *according to Delete sentences in !#$%#%$

13、 ifFind important words for each sentence in !#$%#%$Denotes:*+ is the ,-. sentence in the input text /.is *+s confidence value of the predicted class 1.!#$%#%$is the important sentences set.is the importance of word *+,is the importance of the 2-. word in *+.14Step 2: Bugs GenerationCharacter-level

14、perturbation: out-of-vocabulary phenomenonInsert: Insert a space into the word.Delete: Delete a random character of the word.Swap: Swap random two adjacent letters in the word.Substitute-C (Sub-C) : Replace characters with visually similar characters or adjacent characters in the keyboard.Word-level

15、 perturbation: nearest neighbor searching in the embedding spaceSubstitute-W (Sub-W) : Replace a word with its top k nearest neighbors in a context-awareword vector space.15Step 3: Replacing Important Word By Generated BugOptimal bug selectionchoose the optimal bug according to the change of the con

16、fidence valueImportant word replacementReplace the important word by the selected optimal bugRepeat until “convergence”the semantic similarity is below the thresholdthe new text is misclassified by the classifier16Attack Evaluation17Case StudySentiment AnalysisToxic Content Detection18Attack Evaluat

17、ion: Sentiment AnalysisBaseline AlgorithmsWhite-box: Random, FGSM+NNS (Nearest Neighbor Search), DeepFool+NNSBlack-box: DeepWordBugDatasetIMDB: 50,000 positive and negative movie reviewsRotten Tomatoes Movie Reviews (MR): 5,331 positive and 5,331 negative snippetsTargeted ModelWhite-box models: LR,

18、CNN, LSTMReal-world Online Platforms:19Attack Evaluation: Sentiment AnalysisEvaluation MetricsEdit DistanceJaccard Similarity CoefficientEuclidean DistanceSemantic Similarity20Important Words Selected By TextBugger21Generated Adversarial TextsSuccessful Attack Examples22Attack Performance: Effective

19、ness And EfficiencyWhite-box AttackRemarksChoosing important words to modify is necessary.Effective: TextBugger has high attack success rate on all models and performs better than baselines.Evasive : TextBugger perturbs few words to fool the models.23Attack Performance: Effectiveness And EfficiencyB

20、lack-box AttackRemarksEffective: TextBugger has higher attack success rate against all online platforms than DeepWordBug.Evasive: TextBugger only perturbs fewer words than DeepWordBug.Efficient: TextBugger spends less time than DeepWordBug.24Attack Performance: Change Of ConfidenceSentiment Score Di

21、stributionRemarksTextBugger greatly changes the confidence value of the classification results.IBM Watson is more sensitive to the adversarial texts generated by TextBugger.25Utility Analysis: White-box Attack(a) IMDBRemarksThe generated adversarial texts preserve good word-level and vector-level ut

22、ility.26Utility Analysis: Black-box Attack(a) IMDBRemarksTextBugger generates higher quality adversarial texts than DeepWordBug.27The Impact Of Document LengthThe Impact of Document Length on Attack PerformanceRemarksLength has little impact on the success rate, but may decrease the change of negati

23、ve classs confidence value.The time required for generating one adversarial text increases slightly as the length grows.28The Impact Of Document Length29The Impact of Document Length on The Utility of Generated Adversarial Texts.RemarksLonger document length leads to more perturbed words.The increas

24、ing perturbed words do not decrease the semantic similarity of the adversarial texts.Bug DistributionRemarksAzure and AWS are sensitive to the insert bugWatson and fastText are sensitive to Sub-CDelete and Sub-W are used less than others30Further AnalysisUser StudyTransferability31TransferabilityRem

25、arksTransferability also exists in adversarial texts among models and online platforms.Transferability can be used to attack online platforms even they have call limits.32User Study(a) The distribution of all mistakes in the samples.(b) The proportion of found bugs accounting for each kind of bug added in the samples.33RemarksAdversarial texts generated by TextBugger are hard to distinguish.The

人人文库> 全部分类> 专业文献 > IT计算机

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

TextBugger应用程序对抗性文本生成技术

文档简介

温馨提示

最新文档

评论

TextBugger应用程序对抗性文本生成技术

文档简介

温馨提示

最新文档

评论

相关文档