
A Beginner's Guide to Large Language Models

Part 1

Contributors:

Annamalai Chockalingam, Ankur Patel

Shashank Verma, Tiffany Yeung

Table of Contents

Preface
Glossary
Introduction to LLMs
What Are Large Language Models (LLMs)?
Foundation Language Models vs. Fine-Tuned Language Models
Evolution of Large Language Models
Neural Networks
Transformers
How Enterprises Can Benefit From Using Large Language Models
Challenges of Large Language Models
Ways to Build LLMs
How to Evaluate LLMs
Notable Companies in the LLM Field
Popular Startup-developed LLM Apps


Preface

Language has been integral to human society for thousands of years. A long-prevailing theory, the laryngeal descent theory or LDT, suggests that speech and, thus, language may have evolved about 200,000 or 300,000 years ago, while newer research suggests it could have happened even earlier.

Regardless of when it first appeared, language remains the cornerstone of human communication. It has taken on an even greater role in today's digital age, where an unprecedented portion of the population can communicate via both text and speech across the globe.

This is underscored by the fact that 347.3 billion email messages are sent and received worldwide every day, and that five billion people, or over 63% of the entire world population, send and receive text messages.

Language has therefore become a vast trove of information that can help enterprises extract valuable insights, identify trends, and make informed decisions. As an example, enterprises can analyze texts like customer reviews to identify their products' best-selling features and fine-tune their future product development.

Similarly, language production, as opposed to language analysis, is also becoming an increasingly important tool for enterprises. Creating blog posts, for example, can help enterprises raise brand awareness to a previously unheard-of extent, while composing emails can help them attract new stakeholders or partners at an unmatched speed.

However, both language analysis and production are time-consuming processes that can distract employees and decision-makers from more important tasks. For instance, leaders often need to sift through vast amounts of text in order to make informed decisions instead of making them based on extracted key information.

Enterprises can minimize these and other problems, such as the risk of human error, by employing large language models (LLMs) for language-related tasks. LLMs can help enterprises accelerate and largely automate their efforts related to both language production and analysis, saving valuable time and resources while improving accuracy and efficiency.

Unlike previous solutions, such as rule-based systems, LLMs are incredibly versatile and can be easily adapted to a wide range of language-related tasks, like generating content or summarizing legal documentation.


The goal of this book is to help enterprises understand what makes LLMs so groundbreaking compared to previous solutions and how they can benefit from adopting or developing them. It also aims to help enterprises get a head start by outlining the most crucial steps to LLM development, training, and deployment.

To achieve these goals, the book is divided into three parts:

> Part 1 defines LLMs and outlines the technological and methodological advancements over the years that made them possible. It also tackles more practical topics, such as how enterprises can develop their own LLMs and the most notable companies in the LLM field. This should help enterprises understand how adopting LLMs can unlock cutting-edge possibilities and revolutionize their operations.

> Part 2 discusses five major use cases of LLMs within enterprises, including content generation, summarization, and chatbot support. Each use case is exemplified with real-life apps and case studies, so as to show how LLMs can solve real problems and help enterprises achieve specific objectives.

> Part 3 is a practical guide for enterprises that want to build, train, and deploy their own LLMs. It provides an overview of the necessary prerequisites and the possible trade-offs of different development and deployment methods. ML engineers and data scientists can use this as a reference throughout their LLM development processes.

Hopefully, this will inspire enterprises that have not yet adopted or developed their own LLMs to do so soon in order to gain a competitive advantage and offer new state-of-the-art (SOTA) services or products. As usual, most of the benefits will be reserved for early adopters and truly visionary innovators.


Glossary

Deep learning systems: Systems that rely on neural networks with many hidden layers to learn complex patterns.

Generative AI: AI programs that can generate new content, like text, images, and audio, rather than just analyze it.

Large language models (LLMs): Language models that recognize, summarize, translate, predict, and generate text and other content. They're called large because they are trained on large amounts of data and have many parameters, with popular LLMs reaching hundreds of billions of parameters.

Natural language processing (NLP): The ability of a computer program to understand and generate text in natural language.

Long short-term memory neural network (LSTM): A special type of RNN with more complex cell blocks that allow it to retain more past inputs.

Natural language generation (NLG): A part of NLP that refers to the ability of a computer program to generate human-like text.

Natural language understanding (NLU): A part of NLP that refers to the ability of a computer program to understand human-like text.

Neural network (NN): A machine learning algorithm in which the parameters are organized into consecutive layers. The learning process of NNs is inspired by the human brain. Much like humans, NNs "learn" important features via representation learning and require less human involvement than most other approaches to machine learning.

Perception AI: AI programs that can process and analyze but not generate data, mainly developed before 2020.

Recurrent neural network (RNN): A neural network that processes data sequentially and can memorize past inputs.

Rule-based system: A system that relies on human-crafted rules to process data.

Traditional machine learning: Traditional machine learning uses a statistical approach, drawing probability distributions of words or other tokens based on a large annotated corpus. It relies less on rules and more on data.

Transformer: A type of neural network architecture designed to process sequential data non-sequentially.

Structured data: Data that is quantitative in nature, such as phone numbers, and can be easily standardized and adjusted to a pre-defined format that ML algorithms can quickly process.

Unstructured data: Data that is qualitative in nature, such as customer reviews, and difficult to standardize. Such data is stored in its native formats, like PDF files, before use.

Fine-tuning: A transfer learning method used to improve model performance on selected downstream tasks or datasets. It's used when the target task is similar to the pre-training task and involves copying the weights of a PLM and tuning them on desired tasks or data.

Customization: A method of improving model performance by modifying only one or a few selected parameters of a PLM instead of updating the entire model. It involves using parameter-efficient techniques (PEFT).

Parameter-efficient techniques (PEFT): Techniques like prompt learning, LoRA, and adapter tuning, which allow researchers to customize PLMs for downstream tasks or datasets while preserving and leveraging the existing knowledge of PLMs. These techniques are used during model customization and allow for quicker training and often more accurate predictions.

Prompt learning: An umbrella term for two PEFT techniques, prompt tuning and p-tuning, which help customize models by inserting virtual token embeddings among discrete or real token embeddings.

Adapter tuning: A PEFT technique that involves adding lightweight feed-forward layers, called adapters, between existing PLM layers and updating only their weights during customization while keeping the original PLM weights frozen.

Open-domain question answering: Answering questions from a variety of different domains, like legal, medical, and financial, instead of just one domain.

Extractive question answering: Answering questions by extracting the answers from existing texts or databases.

Throughput: A measure of model efficiency and speed. It refers to the amount of data or the number of predictions that a model can process or generate within a pre-defined time frame.

Latency: The amount of time a model needs to process input and generate output.

Data readiness: The suitability of data for use in training, based on factors such as data quantity, structure, and quality.


Introduction to LLMs

A large language model is a type of artificial intelligence (AI) system that is capable of generating human-like text based on the patterns and relationships it learns from vast amounts of data. Large language models use a machine learning technique called deep learning to analyze and process large sets of data, such as books, articles, and web pages.

Large language models unlocked numerous unprecedented possibilities in the field of NLP and AI. This was most notably demonstrated by the release of OpenAI's GPT-3 in 2020, the then-largest language model ever developed.

These models are designed to understand the context and meaning of text and can generate text that is grammatically correct and semantically relevant. They can be trained on a wide range of tasks, including language translation, summarization, question answering, and text completion.

GPT-3 made it evident that large-scale models can accurately perform a wide, and previously unheard-of, range of NLP tasks, from text summarization to text generation. It also showed that LLMs could generate outputs that are nearly indistinguishable from human-created text, all while learning on their own with minimal human intervention.

This represented an enormous improvement over earlier, mainly rule-based models that could neither learn on their own nor successfully solve tasks they weren't trained on. It is no surprise, then, that many other enterprises and startups soon started developing their own LLMs or adopting existing LLMs in order to accelerate their operations, reduce expenses, and streamline workflows.

Part 1 is intended to provide a solid introduction and foundation for any enterprise that is considering building or adopting its own LLM.

What Are Large Language Models (LLMs)?

Large language models (LLMs) are deep learning algorithms that can recognize, extract, summarize, predict, and generate text based on knowledge gained during training on very large datasets.

They're also a subset of a more general technology called language models. All language models have one thing in common: they can process and generate text that sounds like natural language. This is known as performing tasks related to natural language processing (NLP).


Although all language models can perform NLP tasks, they differ in other characteristics, such as their size. Unlike other models, LLMs are considered large for two reasons:

1. They're trained using large amounts of data.

2. They comprise a huge number of learnable parameters (i.e., representations of the underlying structure of the training data that help models perform tasks on new or never-before-seen data).

Table 1 showcases two large language models, MT-NLG and GPT-3 Davinci, to help clarify what's considered large by contemporary standards.

Table 1. Comparison of MT-NLG and GPT-3

Large Language Model | Number of parameters | Number of tokens in the training data
NVIDIA Model: Megatron-Turing Natural Language Generation Model (MT-NLG) | 530 billion | 270 billion
OpenAI Model: GPT-3 Davinci Model | 175 billion | 499 billion

Since the quality of a model heavily depends on the model size and the size of the training data, larger language models typically generate more accurate and sophisticated responses than their smaller counterparts.
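To make the token counts in Table 1 more concrete, the short sketch below (not from the original guide) shows one common way to count tokens using the publicly available GPT-2 tokenizer from the Hugging Face transformers library; the choice of tokenizer is an illustrative assumption, and each LLM vendor uses its own tokenizer, so exact counts differ.

```python
# Illustrative sketch: counting tokens with an open-source tokenizer.
# Assumes the Hugging Face "transformers" package and the public "gpt2"
# tokenizer; real LLMs use their own tokenizers, so counts will vary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large language models are trained on hundreds of billions of tokens."
token_ids = tokenizer.encode(text)

print(len(text.split()), "words")   # rough word count
print(len(token_ids), "tokens")     # tokens as the model actually sees them
```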


Figure 1. Answer Generated by GPT-3.

However, the performance of large language models doesn't just depend on the model size or data quantity. The quality of the data matters, too.

For example, LLMs trained on peer-reviewed research papers or published novels will usually perform better than LLMs trained on social media posts, blog comments, or other unreviewed content. Low-quality data like user-generated content may lead to all sorts of problems, such as models picking up slang, learning incorrect spellings of words, and so on.

In addition, models need very diverse data in order to perform various NLP tasks. However, if the model is intended to be especially good at solving a particular set of tasks, it should be fine-tuned using a more relevant and narrower dataset. By doing so, a foundation language model is transformed from one that's good at performing various NLP tasks across a broad set of domains into a fine-tuned model that specializes in performing tasks in a narrowly scoped domain.


Foundation Language Models vs. Fine-Tuned Language Models

Foundation language models, such as the aforementioned MT-NLG and GPT-3, are what is usually referred to when discussing LLMs. They're trained on vast amounts of data and can perform a wide variety of NLP tasks, from answering questions and generating book summaries to completing and translating sentences.

Thanks to their size, foundation models can perform well even when they have little domain-specific data at their disposal. They have good general performance across tasks but may not excel at performing any one specific task.

Fine-tuned language models, on the other hand, are large language models derived from foundation LLMs. They're customized for specific use cases or domains and, thus, become better at performing more specialized tasks.

Apart from the fact that fine-tuned models can perform specific tasks better than foundation models, their biggest strength is that they are lighter and, generally, easier to train. But how does one actually fine-tune a foundation model for specific objectives?

Currently, the most popular method is customizing a model using parameter-efficient customization techniques, such as p-tuning, prompt tuning, adapters, and so on. Customization is far less time-consuming and expensive than fine-tuning the entire model, although it may lead to somewhat poorer performance than other methods. Customization methods are further discussed in Part 3.
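As a rough illustration of how parameter-efficient customization looks in practice, the sketch below (not from the guide) wraps a small pretrained model with LoRA adapters using the open-source Hugging Face transformers and peft libraries, so that only a tiny fraction of the weights is updated. The base model name, target modules, and hyperparameters are placeholder assumptions chosen purely for illustration, not recommendations from this guide.

```python
# Minimal sketch of parameter-efficient customization (LoRA) using the
# open-source "transformers" and "peft" libraries. Model name and
# hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small pretrained LM

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # attention projection layer in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)

# Only the small adapter matrices are trainable; the original weights stay frozen.
model.print_trainable_parameters()

# The wrapped model can now be trained on a narrow, task-specific dataset
# with any standard training loop or the Hugging Face Trainer.
```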

Evolution of Large Language Models

AI systems were historically about processing and analyzing data, not generating it. They were more oriented toward perceiving and understanding the world around us rather than generating new information. This distinction marks the main difference between Perception AI and Generative AI, with the latter becoming increasingly prevalent since around 2020, after companies started adopting transformer models and developing increasingly robust LLMs at a large scale.

The advent of large language models further fueled a revolutionary paradigm shift in the way NLP models are designed, trained, and used. To truly understand this, it may be helpful to compare large language models to previous NLP models and how they worked. For this purpose, let's briefly explore three regimes in the history of NLP: pre-transformers NLP, transformers NLP, and LLM NLP.

1. Pre-transformers NLP was mainly marked by models that relied on human-crafted rules rather than machine learning algorithms to perform NLP tasks. This made them suitable for simpler tasks that didn't require too many rules, like text classification, but unsuitable for more complex tasks, such as machine translation. Rule-based models also performed poorly in edge-case scenarios because they couldn't make accurate predictions or classifications for never-before-seen data for which no clear rules were set. This problem was somewhat solved with simple neural networks, such as RNNs and LSTMs, developed during the later phases of this period. RNNs and LSTMs could memorize past data to a certain extent and, thus, provide context-dependent predictions and classifications. However, RNNs and LSTMs could not make predictions over long spans of text, limiting their effectiveness.

2. Transformers NLP was set in motion by the rise of the transformer architecture in 2017. Transformers could generalize better than the then-prevailing RNNs and LSTMs, capture more context, and process more data at once. These improvements enabled NLP models to understand longer sequences of data and perform a much wider range of tasks. However, from today's point of view, models developed during this period had limited capabilities, mainly due to the general lack of large-scale datasets and adequate computational resources. They also mainly sparked attention among researchers and experts in the field but not the general public, as they were neither user-friendly nor accurate enough to be commercialized.

3. LLM NLP was mainly initiated by the launch of OpenAI's GPT-3 in 2020. Large language models like GPT-3 were trained on massive amounts of data, which allowed them to produce more accurate and comprehensive NLP responses compared to previous models. This unlocked many new possibilities and brought us closer to achieving what many consider "true" AI. Also, LLMs made NLP models much more accessible to non-technical users, who could now solve a variety of NLP tasks just by using natural-language prompts. NLP technology was finally democratized.

The switch from one methodology to another was largely driven by relevant technological and methodological advancements, such as the advent of neural networks, attention mechanisms, and transformers, and developments in the field of unsupervised and self-supervised learning. The following sections will briefly explain these concepts, as understanding them is crucial for truly understanding how LLMs work and how to build new LLMs from scratch.

Neural Networks

Neural networks (NNs) are machine learning algorithms loosely modeled after the human brain. Like the biological human brain, artificial neural networks consist of neurons, also called nodes, that are responsible for all model functions, from processing input to generating output.

The neurons are further organized into layers, vertically stacked components of NNs that perform specific tasks related to input and output sequences.


Every neural network has at least three layers (a short code sketch of this structure follows the list below):

> The input layer accepts data and passes it to the rest of the network.

> The hidden layer, or multiple hidden layers, performs specific functions that make the final output of an NN possible. These functions can include identifying or classifying data, generating new data, and other functions, depending on the specific NLP task in question.

> The output layer generates a prediction or classification based on the input.
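The sketch below is a minimal, illustrative example (not from the guide) of this three-layer structure in PyTorch; the layer sizes are arbitrary placeholders. It also shows that the "learnable parameters" discussed earlier are simply the weights and biases of these layers.

```python
# Minimal sketch of the input -> hidden -> output structure described above.
# The sizes (16 input features, 32 hidden units, 2 output classes) are
# arbitrary placeholders chosen for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),   # input layer -> hidden layer
    nn.ReLU(),           # non-linearity applied inside the hidden layer
    nn.Linear(32, 2),    # hidden layer -> output layer (e.g., a 2-class prediction)
)

x = torch.randn(4, 16)   # a batch of 4 example inputs
logits = model(x)        # forward pass: input -> hidden -> output
print(logits.shape)      # torch.Size([4, 2])

# "Learnable parameters" are simply the weights and biases of these layers.
print(sum(p.numel() for p in model.parameters()), "parameters")
```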

When LLMs were first developed, they were based on simpler NN architectures with fewer layers, mainly recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). Unlike other neural networks, RNNs and LSTMs could take into account the context, position, and relationships between words even if they were far apart in a data sequence. Simply put, this meant they could memorize and consider past data when generating output, which resulted in more accurate solutions to many NLP tasks, especially sentiment analysis and text classification.
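As a minimal illustration (not from the guide) of what "memorizing past inputs" looks like in code, the sketch below runs a toy PyTorch LSTM over a short sequence; each step's output is computed from the current input plus the hidden state carried over from earlier steps. The sizes are arbitrary assumptions.

```python
# Toy sketch of an LSTM processing a sequence step by step.
# Sizes are arbitrary placeholders for illustration.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

seq = torch.randn(1, 5, 8)           # one sequence of 5 time steps, 8 features each
outputs, (h_n, c_n) = lstm(seq)      # hidden state is carried forward across steps

print(outputs.shape)                 # torch.Size([1, 5, 16]): one output per step,
                                     # each informed by everything seen so far
```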

The biggest advantage that neural networks like RNNs and LSTMs had over traditional, rule-based systems was that they were capable of learning on their own with little to no human involvement. They analyze data to create their own rules, rather than learn the rules first and apply them to data later. This is also known as representation learning and is inspired by human learning processes.

Representations, or features, are hidden patterns that neural networks can extract from data. To exemplify this, let's imagine we're training an NN-based model on a dataset containing the following tokens:

"cat," "cats," "dog," "dogs"

After analyzing these tokens, the model may identify a representation that one could formulate as:

Plural nouns have the suffix "-s."

The model will then extract this representation and apply it to new or edge-case scenarios whose data distribution follows that of the training data. For example, the assumption can be made that the model will correctly classify tokens like "chairs" or "table" as plural or singular even if it had not encountered them before. Once it encounters irregular nouns that don't follow the extracted representation, the model will update its parameters to reflect new representations, such as:

Plural nouns are followed by plural verbs.

This approach enables NN-based models to generalize better than rule-based systems and successfully perform a wider range of tasks.
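As a loose, toy-scale illustration of this idea (not an example from the guide), the sketch below trains a tiny character-level classifier on the four tokens above; it picks up the trailing "-s" pattern on its own, without a hand-written rule, and applies it to a word it has never seen. The library choice (scikit-learn) and settings are assumptions made purely for illustration and stand in for the much richer representations a neural network would learn.

```python
# Toy illustration of representation learning: a small classifier trained on
# raw character n-grams of four labeled tokens discovers the "-s means plural"
# pattern by itself. Library and settings are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tokens = ["cat", "cats", "dog", "dogs"]
labels = ["singular", "plural", "singular", "plural"]

clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 2)),  # raw character features
    LogisticRegression(),
)
clf.fit(tokens, labels)

# An unseen word that follows the learned pattern:
print(clf.predict(["chairs"]))  # expected: ['plural'], since it ends in "-s"
```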

However, their ability to extract representations is very much dependent on the number of neurons and layers comprising a network. The more neurons neural networks have, the more complex representations they can extract. That's why, today, most large language models use deep learning neural networks with multiple hidden layers and, thus, a higher number of neurons.


Figure 2 shows a side-by-side comparison of a single-layer neural network and a deep learning neural network.

Figure 2. Comparison of Single-Layer vs. Deep Learning Neural Network

While this may seem like an obvious choice today, consider that developing deep neural networks did not make sense before the hardware evolved to be able to handle massive workloads. This only became possible after around 1999, when NVIDIA introduced "the world's first GPU," or graphics processing unit, to the wider market, or, more precisely, after a wildly successful CNN called AlexNet popularized their use in deep learning in 2012.

GPUs had a highly parallelizable architecture, which enabled the rapid advances in deep learning systems that are seen today. Among other advancements, the advent of GPUs ushered in the development of a new type of neural network that would revolutionize the field of NLP: transformers.

Transformers

While RNNs and LSTMs have their advantages, especially compared to traditional models, they also have some limitations that make them unsuitable for more complex NLP tasks, such as machine translation. Their main limitation is the inability to process longer data sequences and, thus, consider the overall context of the input sequence. Because LSTMs and RNNs cannot handle too much context well, their outputs are prone to being inaccurate or nonsensical. This and other challenges have been largely overcome with the advent of new, special neural networks called transformers.

Transformers were first introduced in 2017 by Vaswani et al. in a paper titled "Attention Is All You Need." The title alluded to attention mechanisms, which would become the key component of transformers.


"We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely." - Vaswani et al., "Attention Is All You Need"
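To give a flavor of the attention mechanism the paper refers to, below is a minimal numpy sketch (not from the guide) of scaled dot-product attention, the core operation inside a transformer. The toy matrices and dimensions are illustrative assumptions; real transformers add learned projections, multiple heads, masking, and much more on top of this.

```python
# Minimal sketch of scaled dot-product attention, the core transformer operation.
# Dimensions and inputs are toy values chosen for illustration.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                         # weighted mix of values

# A toy "sequence" of 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))

# In a real transformer, Q, K, and V come from learned linear projections of x;
# here they are simply x itself to keep the sketch short.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): every token's output can attend to every other token
```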
