Executive Summary

Recent developments have improved the ability of large language models (LLMs) and other AI systems to generate computer code. While this is promising for the field of software development, these models can also pose direct and indirect cybersecurity risks. In this paper, we identify three broad categories of risk associated with AI code generation models: 1) models generating insecure code, 2) models themselves being vulnerable to attack and manipulation, and 3) downstream cybersecurity impacts such as feedback loops in training future AI systems.

Existing research has shown that, under experimental conditions, AI code generation models frequently output insecure code. However, the process of evaluating the security of AI-generated code is highly complex and contains many interdependent variables. To further explore the risk of insecure AI-written code, we evaluated generated code from five LLMs. Each model was given the same set of prompts, which were designed to test likely scenarios where buggy or insecure code might be produced. Our evaluation results show that almost half of the code snippets produced by these five different models contain bugs that are often impactful and could potentially lead to malicious exploitation. These results are limited to the narrow scope of our evaluation, but we hope they can contribute to the larger body of research surrounding the impacts of AI code generation models.

Given both code generation models’ current utility and the likelihood that their capabilities will continue to improve, it is important to manage their policy and cybersecurity implications. Key findings include the below.
● Industry adoption of AI code generation models may pose risks to software supply chain security. However, these risks will not be evenly distributed across organizations. Larger, more well-resourced organizations will have an advantage over organizations that face cost and workforce constraints.

● Multiple stakeholders have roles to play in helping to mitigate potential security risks related to AI-generated code. The burden of ensuring that AI-generated code outputs are secure should not rest solely on individual users, but also on AI developers, organizations producing code at scale, and those who can improve security at large, such as policymaking bodies or industry leaders. Existing guidance such as secure software development practices and the NIST Cybersecurity Framework remains essential to ensure that all code, regardless of authorship, is evaluated for security before it enters production. Other cybersecurity guidance, such as secure-by-design principles, can be expanded to include code generation models and other AI systems that impact software supply chain security.

● Code generation models also need to be evaluated for security, but it is currently difficult to do so. Evaluation benchmarks for code generation models often focus on the models’ ability to produce functional code but do not assess their ability to generate secure code, which may incentivize a deprioritization of security over functionality during model training. There is inadequate transparency around models’ training data, or understanding of their internal workings, to explore questions such as whether better performing models produce more insecure code.

Center for Security and Emerging Technology | 1
Table of Contents

Executive Summary
Introduction
Background
What Are Code Generation Models?
Increasing Industry Adoption of AI Code Generation Tools
Risks Associated with AI Code Generation
Code Generation Models Produce Insecure Code
Models’ Vulnerability to Attack
Downstream Impacts
Challenges in Assessing the Security of Code Generation Models
Is AI Generated Code Insecure?
Methodology
Evaluation Results
Unsuccessful Verification Rates
Variation Across Models
Severity of Generated Bugs
Limitations
Policy Implications and Further Research
Conclusion
Authors
Acknowledgments
Appendix A: Methodology
Appendix B: Evaluation Results
Endnotes
Introduction

Advancements in artificial intelligence have resulted in a leap in the ability of AI systems to generate functional computer code. While improvements in large language models have driven a great deal of recent interest and investment in AI, code generation has been a viable use case for AI systems for the last several years. Specialized AI coding models, such as code infilling models which function similarly to “autocomplete for code,” and “general-purpose” LLM-based foundation models are both being used to generate code today. An increasing number of applications and software development tools have incorporated these models to be offered as products easily accessible by a broad audience.

These models and associated tools are being adopted rapidly by the software developer community and individual users. According to GitHub’s June 2023 survey, 92% of surveyed U.S.-based developers report using AI coding tools in and out of work.1 Another industry survey from November 2023 similarly reported a high usage rate, with 96% of surveyed developers using AI coding tools and more than half of respondents using the tools most of the time.2 If this trend continues, LLM-generated code will become an integral part of the software supply chain.

The policy challenge regarding AI code generation is that this technological advancement presents tangible benefits but also potential systemic risks for the cybersecurity ecosystem. On the one hand, these models could significantly increase workforce productivity and positively contribute to cybersecurity if applied in areas such as vulnerability discovery and patching. On the other hand, research has shown that these models also generate insecure code, posing direct cybersecurity risks if incorporated without proper review, as well as indirect risks as insecure code ends up in open-source repositories that feed into subsequent models.

As developers increasingly adopt these tools, stakeholders at every level of the software supply chain should consider the implications of widespread AI-generated code. AI researchers and developers can evaluate model outputs with security in mind, programmers and software companies can consider how these tools fit into existing security-oriented processes, and policymakers have the opportunity to address broader cybersecurity risks associated with AI-generated code by setting appropriate guidelines, providing incentives, and empowering further research. This report provides an overview of the potential cybersecurity risks associated with AI-generated code and discusses remaining research challenges for the community and implications for policy.
Background

What Are Code Generation Models?

Code generation models are AI models capable of generating computer code in response to code or natural-language prompts. For example, a user might prompt a model with “Write me a function in Java that sorts a list of numbers” and the model will output some combination of code and natural language in response. This category of models includes both language models that have been specialized for code generation as well as general-purpose language models, also known as “foundation models,” that are capable of generating other types of outputs and are not explicitly designed to output code. Examples of specialized models include Amazon CodeWhisperer, DeepSeek Coder, WizardCoder, and Code Llama, while general-purpose models include OpenAI’s GPT series, Mistral, Gemini, and Claude.
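To make the prompt-to-code exchange concrete, here is the kind of response a model might return for the analogous Python request “Write me a function in Python that sorts a list of numbers.” The function name and docstring are illustrative, not taken from any particular model’s output.

```python
def sort_numbers(numbers):
    """Return a new list containing the input numbers in ascending order."""
    # sorted() leaves the caller's list untouched, which is usually the
    # safer default for a small helper like this.
    return sorted(numbers)

print(sort_numbers([42, 7, 19, 3]))  # [3, 7, 19, 42]
```

A chat-style model would typically wrap code like this in a short natural-language explanation, while an infilling model would emit only the completion itself.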
Earlier iterations of code generation models, many of which predated the current generation of LLMs and are still in widespread use, functioned similarly to “autocomplete for code,” in which a model suggests a code snippet to complete a line as a user types. These “autocomplete” models, which perform what is known as code infilling, are trained specifically for this task and have been widely adopted in software development pipelines. More recent improvements in language model capabilities have allowed for more interactivity, such as natural-language prompting or a user inputting a code snippet and asking the model to check it for errors. Like general-purpose language models, users commonly interact with code generation models via a dedicated interface such as a chat window or a plugin in another piece of software. Recently, specialized scaffolding software has further increased what AI models are capable of in certain contexts. For instance, some models that can output code may also be capable of executing that code and displaying the outputs to the user.3

As language models have gotten larger and more advanced over the past few years, their code generation capabilities have improved in step with their natural language-generation capabilities.4 Coding languages are, after all, intentionally designed to encode and convey information, and have their own rules and syntactical expectations much like human languages. Researchers in the field of natural language processing (NLP) have been interested in translating between natural language and computer code for many years, but the simultaneous introduction of transformer-based language model architectures and large datasets containing code led to a rapid improvement in code generation capabilities beginning around 2018–2019. As new models were released, researchers also began exploring ways to make them more accessible. In mid-2021, for example, OpenAI released the first version of Codex, a specialized language model for code generation, along with the HumanEval benchmark for assessing the correctness of AI code outputs.5 GitHub, in partnership with OpenAI, then launched a preview of a Codex-powered AI pair programming tool called GitHub Copilot.6 Although it initially functioned more similarly to “autocomplete for code” than a current-generation LLM chatbot, GitHub Copilot’s relative accessibility and early success helped spur interest in code generation tools among programmers, many of whom were interested in adopting AI tools for both work and personal use.
To become proficient at code generation, models need to be trained on datasets containing large amounts of human-written code. Modern models are primarily trained on publicly available, open-source code.7 Much of this code was scraped from open-source web repositories such as GitHub, where individuals and companies can store and collaborate on coding projects. For example, the first version of the 6-terabyte dataset known as The Stack consists of source code files in 358 different programming languages, and has been used to pretrain several open code generation models.8 Other language model training datasets are known to contain code in addition to natural-language text. The 825-gigabyte dataset called The Pile contains 95 gigabytes of GitHub data and 32 gigabytes scraped from Stack Exchange, a family of question-answering forums that includes code snippets and other content related to programming.9 However, there is often limited visibility into the datasets that developers use for training models. We can speculate that the majority of code being used to train code generation models has been scraped from open-source repositories, but other datasets used for training may contain proprietary code or simply be excluded from model cards or other forms of documentation.

Additionally, some specialized models are fine-tuned versions of general-purpose models. Usually, they are created by training general-purpose models with additional data specific to the use case. This is particularly likely in instances where the model needs to translate natural-language inputs into code, as general-purpose models tend to be better at following and interpreting user instructions. OpenAI’s Codex is one such example, as it was created by fine-tuning a version of the general-purpose GPT-3 model on 159 gigabytes of Python code scraped from GitHub.10 Code Llama and Code Llama Python, based on Meta’s Llama 2 model, are other examples of such models.

Research interest in AI code generation has consistently increased in the past decade, especially experiencing a surge in the past year following the release of high-performing foundation models such as GPT-4 and open-source models such as Llama 2. Figure 1 illustrates the trend by counting the number of research papers on code generation by year from 2012–2023. The number of research papers on code generation more than doubled from 2022 to 2023, demonstrating a growing research interest in its usage, evaluation, and implications.
Figure 1: Number of Papers on Code Generation by Year*

Source: CSET’s Merged Academic Corpus.

*This graph counts the number of papers in CSET’s Merged Academic Corpus that contain the keywords “code generation,” “AI-assisted programming,” “AI code assistant,” “code generating LLM,” or “code LLM” and are also classified as AI- or cybersecurity-related using CSET’s AI classifier and cybersecurity classifier. Note that at the time of writing in February 2024, CSET’s Merged Academic Corpus did not yet include all papers from 2023 due to upstream collection lags, which may have resulted in an undercounting of papers in 2023. The corpus currently includes data from Clarivate’s Web of Science, The Lens, arXiv, Papers with Code, Semantic Scholar, and OpenAlex. More information regarding our methodology for compiling the Merged Academic Corpus as well as background on our classifiers and a detailed citation of data sources are available here: https://eto.tech/dataset-docs/mac/; /publication/identifying-ai-research/.

Increasing Industry Adoption of AI Code Generation Tools

Code generation presents one of the most compelling and widely adopted use cases for large language models. In addition to claims from organizations such as Microsoft that their AI coding tool GitHub Copilot had 1.8 million paid subscribers as of spring 2024, up from more than a million in mid-2023,11 software companies are also adopting internal versions of these models that have been trained on proprietary code and customized for employee use. Google and Meta have created non-public, custom code generation models intended to help their employees develop new products more efficiently.12
Productivity is often cited as one of the key reasons individuals and organizations have adopted AI code generation tools. Metrics for measuring how much developer productivity improves by leveraging AI code generation tools vary by study. A small GitHub study used both self-perceived productivity and task completion time as productivity metrics, but the authors acknowledged that there is little consensus about what metrics to use or how productivity relates to developer well-being.13 A McKinsey study using similar metrics claimed that software developers using generative AI tools could complete coding tasks up to twice as fast as those without them, but that these benefits varied depending on task complexity and developer experience.14 Companies have also run internal productivity studies with their employees. A Meta study on their internal code generation model CodeCompose used metrics such as code acceptance rate and qualitative developer feedback to measure productivity, finding that 20% of users stated that CodeCompose helped them write code more quickly, while a Google study found a 6% reduction in coding iteration time when using an internal code completion model as compared to a control group.15 More recently, a September 2024 study analyzing data from randomized control trials across three different organizations found a 26% increase in the number of completed tasks among developers using GitHub Copilot as opposed to developers who were not given access to the tool.16 Most studies are in agreement that code generation tools improve developer productivity in general, regardless of the exact metrics used.

AI code generation tools are undoubtedly helpful to some programmers, especially those whose work involves fairly routine coding tasks. (Generally, the more common a coding task or coding language, the better a code generation model can be expected to perform because it is more likely to have been trained on similar examples.) Automating rote coding tasks may free up employees’ time for more creative or cognitively demanding work. The amount of software code generated by AI systems is expected to increase in the near- to medium-term future, especially as the coding capabilities of today’s most accessible models continue to improve.

Broadly speaking, evidence suggests that code generation tools have benefits at both the individual and organizational levels, and these benefits are likely to increase over time as model capabilities improve. There are also plenty of incentives, such as ease of access and purported productivity gains, for organizations to adopt, or at least experiment with, AI code generation for software development.
Risks Associated with AI Code Generation

This technological breakthrough, however, must also be met with caution. Increasing usage of code generation models in routine software development processes means that these models will soon be an important part of the software supply chain. Ensuring that their outputs are secure, or that any insecure outputs they produce are identified and corrected before code enters production, will also be increasingly important for cybersecurity. However, code generation models are seldom trained with security as a benchmark and are instead often trained to meet various functionality benchmarks such as HumanEval, a set of 164 human-written programming problems intended to evaluate models’ code-writing capability in the Python programming language.17 As the functionality of these code generation models increases and models are adopted into the standard routine of organizations and developers, overlooking the potential vulnerabilities of such code may pose systemic cybersecurity risks.
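Functionality benchmarks like HumanEval score a model by executing each candidate solution against unit tests and checking whether they all pass. The harness below is a minimal illustrative sketch of that loop, not OpenAI’s actual evaluation code; the sample problem, function name, and test cases are invented for the example. Note what it measures: an insecure but functional solution still earns full marks, which is exactly the gap the text describes.

```python
def passes_functional_tests(candidate_src, test_cases, func_name):
    """Run model-generated source against unit tests; True only if all pass."""
    namespace = {}
    try:
        # Real harnesses sandbox this step; untrusted code should never be
        # exec'd directly in a production evaluator.
        exec(candidate_src, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        return False

# A hypothetical model completion for "return the larger of two numbers":
candidate = "def bigger(a, b):\n    return a if a >= b else b\n"
tests = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]
print(passes_functional_tests(candidate, tests, "bigger"))  # True
```

A security-aware benchmark would need additional checks, for example static analysis of `candidate_src` for known weakness patterns, on top of this pass/fail loop.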
The remainder of this section will examine three potential sources of risk in greater detail: 1) code generation models’ likelihood of producing insecure code, 2) the models’ vulnerability to attacks, and 3) potential downstream cybersecurity implications related to the widespread use of code generation models.
Code Generation Models Produce Insecure Code

An emerging body of research on the security of code generation models focuses on how they might produce insecure code. These vulnerabilities may be contained within the code itself or involve code that calls a potentially vulnerable external resource. Human-computer interaction further complicates this problem, as 1) users may perceive AI-generated code as more secure or more trustworthy than human-generated code, and 2) researchers may be unable to pinpoint exactly how to stop models from generating insecure code. This section explores these various topics in more detail.

Firstly, various code generation models often suggest insecure code as outputs. Pearce et al. (2021) show that approximately 40% of the 1,689 programs generated by GitHub Copilot18 were vulnerable to MITRE’s “2021 Common Weakness Enumerations (CWE) Top 25 Most Dangerous Software Weaknesses” list.19 Siddiq and Santos (2022) found that out of 130 code samples generated using InCoder and GitHub Copilot, 68% and 73% of the code samples respectively contained vulnerabilities when checked manually.20 Khoury et al. (2023) used ChatGPT to generate 21 programs in five different programming languages and tested for CWEs, showing that only five out of 21 were initially secure. Only after specific prompting to correct the code did an additional seven cases generate secure code.21 Fu et al. (2024) show that out of 452 real-world cases of code snippets generated by GitHub Copilot from publicly available projects, 32.8% of Python and 24.5% of JavaScript snippets contained 38 different CWEs, eight of which belong to the 2023 CWE Top 25 list.22
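As a concrete illustration of the kind of weakness these studies count, SQL injection (CWE-89, a recurring entry on the CWE Top 25) commonly appears when generated code builds a query by string interpolation. The snippet below is a constructed example, not output from any of the models studied; it contrasts the insecure pattern with the parameterized query a reviewer would require.

```python
import sqlite3

def find_user_insecure(conn, username):
    # CWE-89: the username is spliced directly into the SQL string, so an
    # input like "x' OR '1'='1" changes the meaning of the query.
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_secure(conn, username):
    # Parameterized query: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

payload = "x' OR '1'='1"
print(find_user_insecure(conn, payload))  # every row leaks: [(1,), (2,)]
print(find_user_secure(conn, payload))    # no match: []
```

Both functions satisfy a purely functional test with benign inputs, which is why vulnerability counts in these studies rely on security-specific checks rather than correctness tests.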
In certain coding languages, code generation models are also likely to produce code that calls external libraries and packages. These external code sources can present a host of problems, some security-relevant: They may be nonexistent and merely hallucinated by the model, outdated and unpatched for vulnerabilities, or malicious in nature (such as when attackers attempt to take advantage of common misspellings in URLs or package names).23 For example, Vulcan Cyber showed that ChatGPT routinely recommended nonexistent packages when answering common coding questions sourced from Stack Overflow: over 40 out of 201 questions in Node.js and over 80 out of 227 questions in Python contained at least one nonexistent package in the answer.24 Furthermore, some of these hallucinated library and package names are persistent across both use cases and different models; as a follow-up study demonstrated, a potential attacker could easily create a package with the same name and get users to unknowingly download malicious code.25
Despite these empirical results, there are early indications that users perceive AI-generated code to be more secure than human-written code. This “automation bias” towards AI-generated code means that users may overlook careful code review and accept insecure code as it is. For instance, in a 2023 industry survey of 537 technology and IT workers and managers, 76% responded that AI code is more secure than human code.26 Perry et al. (2023) further showed in a user study that student participants with access to an AI assistant wrote significantly less secure code than those without access, and were more likely to believe that they wrote secure code.27 However, there is some disagreement on whether or not users of AI code generation tools are more likely to write insecure code; other studies suggest that users with access to AI code assistants may not be significantly more likely to produce insecure code than users without AI tools.28 These contradictory findings raise a series of related questions, such as: How does a user’s proficiency with coding affect their use of code generation models, and their likelihood of accepting AI-generated code as secure? Could automation bias lead human programmers to accept (potentially insecure) AI-generated code as secure more often than human-authored code? Regardless, the fact that AI coding tools may provide inexperienced users with a false sense of security has cybersecurity implications if AI-generated code is more trusted and less scrutinized for security flaws.
Furthermore, there remains uncertainty around why code generation models produce insecure code in the first place, and what causes variation in the security of code outputs across and within models. Part of the answer lies in the fact that many of these models are trained on code from open-source repositories such as GitHub. These repositories contain human-authored code with known vulnerabilities, largely do not enforce secure coding practices, and lack data sanitization processes for removing code with a significant number of known vulnerabilities. Recent work has shown that security vulnerabilities in the training data can leak into outputs of transformer-based models, which demonstrates that vulnerabilities in the underlying training data contribute to the problem of insecure code generation.29 Adding to the challenge, there is often little to no transparency in exactly what code was included in training datasets and whether or not any attempts were made to improve its security.

Many other aspects of the question of how, and why, code generation models produce insecure code are still unanswered. For example, a 2023 Meta study that compared several versions of Llama 2, Code Llama, and GPT-3.5 and 4 found that models with more advanced coding capabilities were more likely to output insecure code.30 This suggests a possible inverse relationship between functionality and security in code generation models and should be investigated further. In another example, researchers conducted a comparative study of four models (GPT-3.5, GPT-4, Bard, and Gemini) and found that prompting models to adopt a “security persona” elicited divergent results.31 While GPT-3.5, GPT-4, and Bard saw a reduction in the number of vulnerabilities from the normal persona, Gemini’s code output contained more vulnerabilities.32 These early studies highlight some of the knowledge gaps concerning how insecure code outputs are generated and how they change in response to variables such as model size and prompt engineering.
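In persona studies like the one above, the only variable that typically changes between runs is the system prompt prepended to an otherwise identical task. The sketch below illustrates that experimental setup; the `build_messages` helper and the persona wording are invented for this example and are not the prompts used in the cited study.

```python
def build_messages(task, persona=None):
    """Assemble a chat-style message list, optionally prepending a persona."""
    messages = []
    if persona:
        messages.append({"role": "system", "content": persona})
    messages.append({"role": "user", "content": task})
    return messages

# Hypothetical security persona; real studies tune this wording carefully.
SECURITY_PERSONA = (
    "You are a security-focused software engineer. Follow secure coding "
    "practices: validate inputs, parameterize queries, avoid unsafe APIs."
)

task = "Write a Python function that looks up a user by name in a SQL database."
baseline = build_messages(task)                    # normal persona
hardened = build_messages(task, SECURITY_PERSONA)  # security persona
print(len(baseline), len(hardened))  # 1 2
```

Because everything except the system message is held constant, any difference in vulnerability counts between the two conditions can be attributed to the persona, which is what makes the divergent Gemini result notable.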
Models’ Vulnerability to Attack

In addition to the code that they output, code generation models are software tools that need to be properly secured. AI models are vulnerable to hacking, tampering, or manipulation in ways that humans are not.33 Figure 2 illustrates the code generation model development workflow, where the portions in red indicate various ways a malicious cyber actor may attack a model.

Figure 2: Code Generation Model Development Workflow and Its Cybersecurity Implications

Source: CSET.

Generative AI systems have known vulnerabilities to several types of adversarial attacks. These include data poisoning attacks, in which an attacker contaminates a model’s training data to elicit a desired behavior, and backdoor attacks, in which an attacker attempts to produce a specific output by prompting the model with a predetermined trigger phrase. In the code generation context, a data poisoning attack may look like an attacker manipulating a model’s training data to increase its likelihood of producing code that imports a malicious package or library. A backdoor attack on the model itself, meanwhile, could dramatically change a model’s behavior with a single trigger that may persist even if developers try to remove it.34 This changed behavior can result in an output that violates restrictions placed on the model by its developers (such as “don’t suggest code patterns associated with malware”) or that may reveal unwanted or sensitive information. Researchers have pointed out that because code generation models are trained on large amounts of data from a finite number of unsanitized code repositories, attackers could easily seed these