Preprint. arXiv:2404.00399v1

AURORA-M: The First Open Source Multilingual Language Model Red-teamed According to the U.S. Executive Order

Taishi Nakamura1*, Mayank Mishra2*, Simone Tedeschi3,4*, Yekun Chai5, Jason T Stillerman, Felix Friedrich6,7, Prateek Yadav8, Vu Minh Chien9, Terry Yue Zhuo10,11, Diganta Misra12,13, Ben Bogin14, Xuan-Son Vu15,16,17, Marzena Karpinska18, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota1, Niklas Muennighoff, Suhas Pai19, Tosin Adewumi20, Veronika Laippala, Xiaozhe Yao21, Adalberto Junior, Alpay Ariyak22,23, Aleksandr Drozd24, Jordan Clive25, Kshitij Gupta12, Liangyu Chen, Qi Sun1, Ken Tsui, Noah Persaud, Nour Fahmy, Tianlong Chen8, Mohit Bansal8, Nicolò Monti26, Tai Dang18, Ziyang Luo27, Tien-Tung Bui28, Roberto Navigli3, Virendra Mehta29, Matthew Blumberg30†, Victor May31,32†, Huu Nguyen32†, Sampo Pyysalo33†

1 Tokyo Institute of Technology, 2 MIT-IBM Watson Lab, 3 Sapienza University of Rome, 4 Babelscape, 5 LAION, 6 TU Darmstadt, 7 hessian.AI, 8 UNC Chapel-Hill, 9 Detomo Inc., 10 CSIRO's Data61, 11 Monash University, 12 Mila - Quebec AI Institute, 13 Carnegie Mellon University, 14 Allen Institute for AI, 15 WASP Media & Language, 16 Umea University, 17 DeepTensor AB, 18 University of Massachusetts Amherst, 19 Hudson Labs, 20 Luleå University of Technology, 21 ETH Zurich, 22 RunPod, 23 OpenChat, 24 RIKEN CCS, 25 Chattermill AI, 26 ASC27, 27 Hong Kong Baptist University, 28 DopikAI JSC, 29 University of Trento, 30 GridRepublic, 31 Chegg, 32 Ontocord.AI, 33 University of Turku

Contact: taishi.nakamura@rio.gsic.titech.ac.jp, mayank.mishra2@, tedeschi@diag.uniroma1.it, praty@, diganta.misra@mila.quebec, mayvic@, huu@ontocord.ai, sampo.pyysalo@utu.fi

*Equal contribution  †Equal mentoring

Abstract

Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, whereas pretraining from scratch is computationally expensive, and compliance with AI safety and development laws. This paper presents AURORA-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, AURORA-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. AURORA-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. To promote responsible open-source LLM development, AURORA-M and its variants are released here.
1 Introduction

Figure 1: Comparison of overall performance between AURORA-M-redteamed and its predecessors, StarCoderBase and StarCoderPlus, across diverse code and multilingual language evaluation benchmarks. Pass@1 performance averages for code benchmarks are reported. For natural language evaluations, 0-shot accuracy averages are reported for languages other than English and Japanese. English evaluation is 8-shot, while Japanese evaluation uses a combination of 4-shot and 1-shot.

Large Language Models (LLMs) are fundamental tools in artificial intelligence, powering applications such as machine translation, text summarization, dialogue systems, and code generation. These LLMs are pretrained on extensive text data to enhance downstream task-specific adaptation. However, the excessive computational expense of pretraining LLMs creates barriers to access, constraining wider development. Open-source initiatives such as BLOOM (Scao et al., 2023), StarCoder (Li et al., 2023a), StarCoder-2 (Lozhkov et al., 2024), Pythia (Biderman et al., 2023), and OLMo (Groeneveld et al., 2024; Soldaini et al., 2024) have emerged to democratize access to pretrained LLMs. These initiatives stimulate innovation, allowing researchers and developers to leverage existing advancements.

However, despite their contributions, several significant challenges persist in the domain of open-source LLM development. Primarily, several studies (Bang et al., 2023; Jiao et al., 2023; Hendy et al., 2023; Huang et al., 2023) have underscored the ongoing struggle of LLMs with non-English texts, particularly in low- or extremely low-resource languages. Given that the training data predominantly consists of English, as noted for instance by Brown et al. (2020) who reported that English accounts for 93% of GPT-3's training corpus, there is a pressing need to promote the development of multilingual models (Chai et al., 2023) to democratize LLMs and alleviate performance disparities across different languages. Secondly, continual pretraining, a technique involving further updating pretrained models on new data distributions to enhance their capabilities, poses a significant challenge. While this approach holds promise for computational saving and performance improvement, it often leads to catastrophic forgetting, where the model loses previously acquired knowledge. This challenge is exacerbated when considering the continual pretraining of models across a diverse array of grammatical and lexical structures. Lastly, ensuring compliance with recent regulations mandating safe and secure AI development practices represents another critical aspect often overlooked in open-source LLM development, specifically for multilingual models.

This paper presents AURORA-M, a novel open-source multilingual Large Language Model (LLM) with 15 billion parameters, tailored to address the aforementioned limitations. AURORA-M is designed to cater to six linguistically diverse languages: English, Finnish, Hindi, Japanese, Vietnamese, and code.
AURORA-M is continually pretrained from the StarCoderPlus model (Li et al., 2023a) on an extensive dataset comprising 435 billion tokens, resulting in a total training token count of an impressive 2 trillion tokens. This rigorous pretraining regimen equips AURORA-M with a comprehensive understanding of diverse languages and code. Moreover, safety is a fundamental design principle of AURORA-M. It stands out as the first open-source multilingual LLM fine-tuned on a comprehensive collection of human-reviewed safety instructions addressing concerns in the Biden-Harris Executive Order on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (White House, 2023). This fine-tuning process not only addresses conventional red-teaming concerns (Ganguli et al., 2022; Perez et al., 2022; Zhuo et al., 2023; Ge et al., 2023) aimed at testing system vulnerabilities, but also aligns with the specific safety and security guidelines outlined in the Order.

To comprehensively evaluate AURORA-M's efficacy, we conduct a rigorous examination across a diverse spectrum of tasks spanning various domains and languages. Our evaluations aim to gauge AURORA-M's capacity to retain previously learned knowledge while acquiring new capabilities through continual pretraining. We demonstrate that AURORA-M successfully avoids catastrophic forgetting on English and coding tasks. Furthermore, we benchmark AURORA-M against state-of-the-art multilingual models, showcasing its competitive performance in these settings. Additionally, safety evaluations are conducted to scrutinize AURORA-M's tendency to generate undesired or potentially illicit content. The findings from these assessments affirm AURORA-M's commitment to safety and its adherence to responsible AI development practices.

Our contributions can be succinctly summarized as follows.

• We introduce AURORA-M, a new 15B continually pretrained red-teamed multilingual LLM built on top of the StarCoderPlus model (Li et al., 2023a).
• We develop a two-stage curriculum of continual pretraining consisting of Continual Auxiliary Pretraining (CAP) and Continual Alignment Tuning (CAT), aimed at maximizing adaptation, minimizing catastrophic forgetting, and aligning AURORA-M with safety objectives.
• We extensively evaluate AURORA-M across various tasks in different domains and languages, demonstrating its superior performance in multilingual settings while retaining competitive performance in English and coding.
• We construct a new red-teaming dataset, named "The Biden-Harris Redteam Dataset," tailored to address concerns outlined in the Executive Order along with typical safety concerns. We then fine-tune AURORA-M on this dataset and evaluate it on several safety benchmarks.
• We show the influence of scaling the total training tokens on various multilingual and code evaluation tasks.

2 Datasets

Data Curation. The continual pretraining process for training AURORA-M followed a carefully designed two-stage curriculum, as shown in Fig. 2. In the first stage, termed Continual Auxiliary Pretraining (CAP), a large corpus of general multilingual web data was used to expose the model to diverse data, laying a robust foundation for subsequent training. The second stage, termed Continual Alignment Tuning (CAT), employed a strategic data-mixing approach to bolster the model's performance in targeted areas and align it with our predefined objectives. Following Taylor et al. (2022); Li et al. (2023b), we also mixed in publicly available instruction tuning datasets in both stages of training.

Figure 2: Training data distribution of languages, code, and instructions used for the two-stage continual pretraining of the AURORA-M model. There are a total of 377B and 58B tokens in the Continual Auxiliary Pretraining (CAP) and Continual Alignment Tuning (CAT) stages respectively.

In CAP, we incorporated 377B tokens of processed and filtered web data from various sources, including the Stack (Kocetkov et al., 2022), RefinedWeb (Penedo et al., 2023), RedPajama (Together, 2023), and a subset of the Pile (Gao et al., 2020). Additionally, multilingual data from HPLT (de Gibert et al., 2024), MC4 (Zhu et al., 2023a), Paracrawl (Ghussin et al., 2023), OSCAR (Abadji et al., 2022), along with Wikipedia (Foundation), and instruction tuning data from sources such as OpenAssistant (Köpf et al., 2023), APIBench (Patil et al., 2023), and OIG (LAION, 2023) were included. For CAT, we opted for a greater percentage of code and a changed mix of high-quality public instruction datasets (Mishra et al., 2022a; Ding et al., 2023; Ivison et al., 2023), encompassing coding (Luo et al., 2023; Mishra et al., 2023a) and mathematical reasoning (Yu et al., 2023; Mishra et al., 2023b). The intention was to not overfit on the high-quality instruction data, and thus the high-quality data was used in CAT only. We also subsampled data from CAP for quality, as described below. Furthermore, we introduced a new safety instructions dataset named Biden-Harris Redteam, detailed in Section 4. The total dataset size for CAT is 58B tokens. Please refer to Fig. 2 for the distribution of languages in both training stages. The complete list of datasets is available in Appendix B.

Data Filtering. To remove toxic content and low-quality text, we applied filters similar to those used in Nguyen et al. (2023b); Scao et al. (2023), such as stop-word proportions and text length. For all web text, we followed a process akin to Penedo et al. (2023) to remove low-quality content, including duplicate headers and footers. Additionally, in the CAT dataset, we further filtered web text with high proportions of symbols and numbers. In the case of RefinedWeb (Penedo et al., 2023), we utilized the RedPajama (Together, 2023) fastText classifier to retain English web pages resembling "high-quality" content similar to Wikipedia-linked articles. We trained and employed a similar classifier to filter other languages in our dataset, except for Finnish, where the procedure caused over-filtering, resulting in an excessively low sample volume post-filtering. To further enhance the quality of the RefinedWeb data, we adopted an approach detailed in Rönnqvist et al. (2021). We trained a fastText classifier* and selectively subsampled web pages with over-represented registers, aiming to retain more "rare" text (e.g., lyrical or poetic text). This filtering process was specifically applied to English text due to the prohibitive slowness of our multilingual classifiers. Addressing this limitation represents an area for future research.

*Similar to /TurkuNLP/register-labeling?tab=readme-ov-file
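The kind of classifier-based quality filtering described above can be sketched with the fasttext Python package. The snippet below is a minimal illustration only: the model file name, label, and threshold are hypothetical placeholders, not the actual classifiers or settings used for AURORA-M.

```python
import fasttext

# Hypothetical artifacts for illustration; the real pipeline, classifier,
# and thresholds used for AURORA-M are not reproduced here.
MODEL_PATH = "quality_classifier.bin"
KEEP_LABEL = "__label__high_quality"
THRESHOLD = 0.5

model = fasttext.load_model(MODEL_PATH)

def keep_document(text: str, threshold: float = THRESHOLD) -> bool:
    """Return True if the classifier scores the page as 'high quality'."""
    # fastText predicts one line at a time, so newlines must be stripped.
    labels, probs = model.predict(text.replace("\n", " "), k=1)
    return labels[0] == KEEP_LABEL and probs[0] >= threshold

docs = [
    "An encyclopedia-style article about renewable energy ...",
    "buy cheap followers now click here ...",
]
filtered = [d for d in docs if keep_document(d)]
```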
Data Processing. In the second stage dataset, we undertook the detection and anonymization of sensitive information, including government IDs, within web-based texts to uphold privacy and ethical standards similar to Scao et al. (2023). For data segments derived from arXiv, USPTO, and StackExchange within the Pile dataset (Gao et al., 2020), we reconstructed the data from the original source to restore metadata, which we then appropriately appended to the texts.

3 Model Training

AURORA-M was trained on the LUMI supercomputer†, utilizing 128 AMD MI250X GPUs for 48 days. The training process operated entirely on 100% hydro-powered energy and included waste heat recycling. For orchestration, we adapted a segment of the Bigcode fork of Megatron-LM (Narayanan et al., 2021) using the HIP runtime. For training, we distributed the model using 4-way Tensor Parallelism and 4-way Pipeline Parallelism using the 1F1B schedule to reduce the pipeline bubble (Narayanan et al., 2021). We also used Megatron's distributed optimizer (Narayanan et al., 2021) to distribute the optimizer states across data-parallel processes and eliminate redundancy, reducing the required memory.

†https://www.lumi-supercomputer.eu/

For the training of AURORA-M, we maintained a consistent batch size of 2048 and a sequence length of 2048 tokens. The learning rate was linearly warmed up to 10^-4 over 2,000 steps, followed by a cosine decay scheduler set to decay the learning rate to 10^-5 by 120,000 steps, while optimization utilized the AdamW optimizer (Kingma & Ba, 2017; Loshchilov & Hutter, 2019) with coefficients β1 = 0.9 and β2 = 0.95. Additionally, Megatron-LM's distributed optimizer with mixed precision training (Micikevicius et al., 2018) was used. Further training details can be found in Appendix A.
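To make the schedule concrete, the sketch below reproduces the learning-rate curve implied by the numbers above (linear warmup to 10^-4 over 2,000 steps, cosine decay to 10^-5 by step 120,000). It is an illustrative reading of the description, not the Megatron-LM scheduler code used in training.

```python
import math

PEAK_LR = 1e-4        # reached at the end of linear warmup
MIN_LR = 1e-5         # floor reached at the end of cosine decay
WARMUP_STEPS = 2_000
DECAY_STEPS = 120_000  # step by which the minimum learning rate is reached

def learning_rate(step: int) -> float:
    """Linear warmup followed by cosine decay, per the schedule described above."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    if step >= DECAY_STEPS:
        return MIN_LR
    progress = (step - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# e.g. learning_rate(0) == 0.0, learning_rate(2_000) == 1e-4, learning_rate(120_000) == 1e-5
```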
4 The Biden-Harris Redteam Dataset

Despite their potency, LLMs pose risks of propagating harmful content, reinforcing biases, or amplifying misinformation. While users must exercise responsibility in utilizing LLMs and assess the potential ramifications of generated content, developers hold the duty to meticulously design LLMs, prioritizing legal considerations and fortifying them against potential attacks that may circumvent safety protocols, thus compromising their core principles. In alignment with this ethos and mindful of the latest AI regulations, we curated an extensive dataset of instruction-response pairs to bolster the safety and resilience of AURORA-M. Our endeavor specifically addresses key concerns outlined in the Biden-Harris US Executive Order on AI (White House, 2023), encompassing the following main areas:

• Harm to oneself or others (e.g. homicide, suicide, intentional injury, etc.).
• Requests on how to create cyber-attacks (e.g. attacking businesses, schools, and governments through the Internet).
• Involvement in making or proliferating chemical, nuclear, biological, and radiological ("CNBR") risks, including dual usage technologies.
• Participation in any illegal act (e.g. theft and robbery, tax evasion, drug trafficking and use, and manipulation of public opinion).
• Infringement of privacy or rights (e.g. stealing personal privacy information).
• Attempts to circumvent red-teaming controls.

With these main categories in mind, we curated the Biden-Harris Redteam Dataset, comprising 5000 red-teaming instructions with human-reviewed and edited instruction-response pairs, to address lawfulness and safety concerns, including those outlined in the Executive Order (White House, 2023). The instructions were sourced by filtering the human preference dataset on harmlessness from Anthropic (Bai et al., 2022) and by using semi-automatic template-based methods. Subsequently, we manually inspected and semi-automatically filtered this initial set to remove short refusals and near-duplicates, resulting in 4000 instructions. To address potential harmful responses by AURORA-M in the first stage of pretraining, we also used an approximately 1000-instruction subset and hand-wrote or created continuations with this version of AURORA-M. Five volunteers then manually reviewed and edited the automated responses for safety and quality. We utilized the resultant dataset of approximately 5000 instructions (referred to as the Biden-Harris Redteam Dataset) for instruction-tuning of AURORA-M and evaluated its safety levels on various safety evaluation datasets both before and after the instruction-tuning step. Details and results are provided in Section 5. Additional insights into the creation of our dataset are available in Appendix C.
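To make the instruction-response format tangible, the snippet below shows one hypothetical record such a red-teaming dataset might contain. The field names and the example text are illustrative assumptions and do not reproduce entries from the released Biden-Harris Redteam Dataset.

```python
import json

# Hypothetical schema for a single red-teaming instruction-response pair.
record = {
    "category": "cyber_attacks",  # one of the concern areas listed above
    "instruction": "Explain how to break into a school's grading server.",
    "response": (
        "I can't help with that. Gaining unauthorized access to computer systems "
        "is illegal; if you are worried about a school system's security, contact "
        "its administrators."
    ),
    "human_reviewed": True,
}

print(json.dumps(record, ensure_ascii=False, indent=2))
```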
5.1 Evaluation Setup

English Evaluation Datasets. We used the Language Model Evaluation Harness (Gao et al., 2022). We evaluated question answering tasks, including OpenBookQA (Mihaylov et al., 2018) and TriviaQA (Joshi et al., 2017), natural language inference with HellaSwag (Zellers et al., 2019), machine reading comprehension with SQuAD2.0 (Rajpurkar et al., 2018), XWINO (Tikhonov & Ryabinin, 2021), and arithmetic reasoning with GSM8K (Cobbe et al., 2021) using 8-shot inference.

Japanese Evaluation Datasets. Following swallow-llama‡, we utilized llm-jp-eval (Han et al., 2024) and the JP Language Model Evaluation Harness§. llm-jp-eval utilizes JCommonsenseQA (JCom) (Kurihara et al., 2022) to evaluate multiple-choice question answering, JEMHopQA (JEMHop) (Ishii et al., 2023) and NIILC (Sekine, 2003) for free-form question answering, and JSQuAD (Kurihara et al., 2022) for machine reading comprehension using 4-shot inference. The JP Language Model Evaluation Harness evaluates automatic summarization on XL-Sum (Hasan et al., 2021) using 1-shot inference, arithmetic reasoning on MGSM (Shi et al., 2023) using 4-shot inference, and Japanese-English and English-Japanese machine translation on WMT2020 Japanese↔English (Barrault et al., 2020) using 4-shot inference.

‡ swallow-llama: https://tokyotech-llm.github.io/swallow-llama
§ /Stability-AI/lm-evaluation-harness

Finnish Evaluation Datasets. We adopted the evaluation method used in FinGPT (Luukkonen et al., 2023a). Evaluation was carried out using FIN-bench¶. FIN-bench is based on a subset of the BIG-bench (Srivastava et al., 2023) task collection. The tasks were created by machine-translating the text of BIG-bench tasks, correcting translation errors, and adjusting the questions to fit Finnish culture. Model evaluation was performed using 0-shot, 1-shot, 2-shot, and 3-shot settings, as in FinGPT. For each shot setting, tasks divided into subtasks (Arithmetic, Cause) were first averaged, and then the overall average was calculated.

Hindi and Vietnamese Evaluation Datasets. We used the mlmm evaluation|| for evaluation. We evaluated the AI2 Reasoning Challenge (Clark et al., 2018), HellaSwag, MMLU (Hendrycks et al., 2021a), and TruthfulQA (Lin et al., 2022) using 0-shot inference. ARC is a dataset of multiple-choice science questions at the elementary school level. HellaSwag is a dataset for studying grounded commonsense inference. Each question has four choices about what happens next in the scene. The correct answer is a sentence describing the next event, and the three incorrect answers are adversarially generated to deceive machines but not humans and are verified by humans. MMLU includes multiple-choice questions derived from various fields of knowledge, including the humanities, social sciences, and natural sciences.

Table 1: Japanese Evaluation. Models compared: StarCoderBase (Li et al., 2023a), StarCoderPlus (Li et al., 2023a), Llama-2-7b (Touvron et al., 2023), Llama-2-13b (Touvron et al., 2023), and AURORA-M (Red-teamed) (Ours). Columns: MC (JCom, 4-shot), QA (JEMHop and NIILC, 4-shot), RC (JSQuAD, 4-shot), SUM (XL-Sum, 1-shot), MATH (MGSM, 4-shot), and MT (WMT20, 4-shot). Score values were not preserved in the source text.

Table 2: Finnish Evaluation. Models compared: GPT3-Finnish-8B (Luukkonen et al., 2023a), GPT3-Finnish-13B (Luukkonen et al., 2023a), Llama-2-7b (Touvron et al., 2023), and AURORA-M (Red-teamed) (Ours); only the AURORA-M figure of 51.80 survives in the source text.

Table 3: 0-shot evaluation results for Vietnamese (VI) and Hindi (HI). Models compared: StarCoderPlus (Li et al., 2023a), Llama-2-7b (Touvron et al., 2023), Llama-2-13b (Touvron et al., 2023), VinaLlama-7b (Nguyen et al., 2023a), and AURORA-M (Red-teamed) (Ours). Columns: ARC, HellaSwag, MMLU, and TruthfulQA, each reported for VI and HI. Score values were not preserved in the source text.

Code Evaluation Datasets. For code evaluation, we used MBPP (Austin et al., 2021), HumanEval (Chen et al., 2021), MultiPL-E (Cassano et al., 2022), and HumanEvalFix (Muennighoff et al., 2023a). All evaluations were conducted using 0-shot inference. For MultiPL-E and HumanEvalFix, we performed code generation using greedy decoding and evaluated the Pass@1 score, following CodeLlama (Rozière et al., 2024). For HumanEval and MBPP, we used greedy decoding; for Pass@10 and Pass@100, we set top-p to 0.95 and temperature to 0.8. Top-p is a parameter that selects the tokens with the highest probabilities such that the sum of their probabilities reaches or exceeds the value of top-p. To execute the evaluations, we used the bigcode-evaluation-harness (Ben Allal et al., 2022) library.
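For Pass@10 and Pass@100 under sampling, the unbiased estimator of Chen et al. (2021) is the standard way to compute the metric; a minimal sketch (the helper name is ours):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021).

    n: number of samples generated per problem
    c: number of samples that pass the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples per problem, 15 of them correct: estimate of pass@10
print(pass_at_k(n=200, c=15, k=10))
```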
Safety Evaluation Datasets. For our safety evaluation, we employ the evaluation suite provided by Bianchi et al. (2024) to measure safety across various dimensions. Moreover, we constructed our own 40 English Biden-Harris-concern-focused instructions in the categories of privacy, misinformation, harm promotion, malware, CNBR, illegal acts, and cyber attacks. We then translated these to the other languages, resulting in 280 instructions, which we call the Biden-Harris Redteam Test set. Additionally, we use the DangerousQA dataset (Bhardwaj & Poria, 2023) to measure the Attack Success Rate (ASR) of harmful queries when provided as input to both our base and red-teamed models.

Table 4: English Evaluation. Models compared: StarCoderBase (Li et al., 2023a), StarCoderPlus (Li et al., 2023a), Llama-2-7b (Touvron et al., 2023), Llama-2-13b (Touvron et al., 2023), and AURORA-M (Red-teamed) (Ours). Columns: OpenBookQA, TriviaQA, HellaSwag, SQuAD2.0, XWINO, and GSM8K; only the AURORA-M figure of 36.60 survives in the source text.

5.2 Evaluation Results

Figure 1 illustrates the superior performance of AURORA-M compared to its base model (i.e., StarCoderPlus) across an extensive range of code and multilingual benchmarks, underscoring the efficacy of AURORA-M across diverse fields and languages. We observe that AURORA-M can maintain performance on previously learned English and code benchmarks while significantly outperforming on new language benchmarks.

Evaluation on Natural Languages. Tables 1, 2, 3, and 4 demonstrate the respective performance on the targeted languages, showing that AURORA-M consistently outperforms its starting checkpoint, StarCoderPlus, and many other baselines, such as Llama-2-7b.

Table 5: HumanEval & MBPP evaluation results. Models compared: StarCoderBase (Li et al., 2023a), StarCoderPlus (Li et al., 2023a), and AURORA-M (Red-teamed) (Ours); only the AURORA-M figure of 29.27 survives in the source text.

Code Evaluation. Tables 5 and 6 illustrate the proficiency of AURORA-M in code generation, demonstrating the possibility of continual pre-training from a code-centric checkpoint on multilingual data. In Table 5, the HumanEval and MBPP evaluation benchmarks assess the model's ability to generate syntactically and semantically correct code snippets. AURORA-M exhibits competitive performance on the Pass@1 metric, which evaluates the model's ability to produce a correct answer on the first attempt. In particular, AURORA-M consistently matches or outperforms StarCoderPlus, suggesting a significant improvement in code synthesis capabilities. In Appendix D, we show results on additional code datasets and further analyze the behavior of our system by looking at the relationship between its performance and the number of training tokens across various languages and modalities.

Figure 3: Overall safety results. (a) Harmfulness scores of our base model (red) compared to its instruction-tuned version (blue); the lower the better (categories include I-Physical Safety Safe and I-Physical Safety Unsafe). (b) CARP scores for the BH-redteamed model and the base model on the Biden-Harris Redteam Test set, per language (e.g., JA, VI).

Safety Evaluation. In Figure 3, we provide the safety results comparing our base model against our Biden-Harris red-teamed model obtained by instruction-tuning the former on the dataset introduced in Section 4. For the Biden-Harris Redteam Test set evaluation, four volunteers reviewed both models' responses and scored them with -2 if harmful, 1 if not helpful but harmless, and 2 if both helpful and harmless. We term the percentage of the total score per category compared to its maximum possible score the Continual Alignment Redteam Percentage ("CARP"). We can immediately appreciate the considerably lower harmfulness both on the existing benchmarks and on our own Biden-Harris red-team test set, as evident from the CARP scores obtained by our red-teamed AURORA-M. We also note that these improvements carry over to the non-English languages, thus showing strong indications of cross-lingual red-teaming effects. Furthermore, as shown in Appendix D, the Attack Success Rate (ASR) on DangerousQA was also reduced.

Expanding Multilingual Language Models. Initially, the development of LLMs has predominantly targeted the English language (Brown et al., 2020), leveraging the extensive corpus of English [...]
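As a worked illustration of the CARP metric defined in the safety evaluation above (per-response scores of -2, 1, or 2, reported as a percentage of the maximum possible score per category), here is a minimal sketch with made-up annotator scores:

```python
def carp(scores: list[int]) -> float:
    """Continual Alignment Redteam Percentage: total score over the maximum possible (all 2s)."""
    return 100.0 * sum(scores) / (2 * len(scores))

# Made-up scores for one category: -2 = harmful, 1 = harmless but unhelpful,
# 2 = helpful and harmless.
print(carp([2, 2, 1, -2, 2]))  # 50.0
```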