A Beginner's Guide to Large Language Models
Part 1
Contributors:
Annamalai Chockalingam, Ankur Patel, Shashank Verma, Tiffany Yeung
Table of Contents
Preface
Glossary
Introduction to LLMs
What Are Large Language Models (LLMs)?
Foundation Language Models vs. Fine-Tuned Language Models
Evolution of Large Language Models
Neural Networks
Transformers
How Enterprises Can Benefit From Using Large Language Models
Challenges of Large Language Models
Ways to Build LLMs
How to Evaluate LLMs
Notable Companies in the LLM Field
Popular Startup-developed LLM Apps
Preface
Language has been integral to human society for thousands of years. A long-prevailing theory, the laryngeal descent theory (LDT), suggests that speech, and thus language, may have evolved some 200,000 to 300,000 years ago, while newer research suggests it could have happened even earlier.
Regardless of when it first appeared, language remains the cornerstone of human communication. It has taken on an even greater role in today's digital age, where an unprecedented portion of the population can communicate via both text and speech across the globe.
This is underscored by the fact that 347.3 billion email messages are sent and received worldwide every day, and that five billion people, or over 63% of the entire world population, send and receive text messages.
Language has therefore become a vast trove of information that can help enterprises extract valuable insights, identify trends, and make informed decisions. As an example, enterprises can analyze texts like customer reviews to identify their products' best-selling features and fine-tune their future product development.
Similarly, language production, as opposed to language analysis, is also becoming an increasingly important tool for enterprises. Creating blog posts, for example, can help enterprises raise brand awareness to a previously unheard-of extent, while composing emails can help them attract new stakeholders or partners at an unmatched speed.
However, both language analysis and production are time-consuming processes that can distract employees and decision-makers from more important tasks. For instance, leaders often need to sift through vast amounts of text in order to make informed decisions, instead of making them based on extracted key information.
Enterprises can minimize these and other problems, such as the risk of human error, by employing large language models (LLMs) for language-related tasks. LLMs can help enterprises accelerate and largely automate their efforts related to both language production and analysis, saving valuable time and resources while improving accuracy and efficiency.
Unlike previous solutions, such as rule-based systems, LLMs are incredibly versatile and can be easily adapted to a wide range of language-related tasks, like generating content or summarizing legal documentation.
The goal of this book is to help enterprises understand what makes LLMs so groundbreaking compared to previous solutions and how they can benefit from adopting or developing them. It also aims to help enterprises get a head start by outlining the most crucial steps to LLM development, training, and deployment.
To achieve these goals, the book is divided into three parts:
> Part 1 defines LLMs and outlines the technological and methodological advancements over the years that made them possible. It also tackles more practical topics, such as how enterprises can develop their own LLMs and the most notable companies in the LLM field. This should help enterprises understand how adopting LLMs can unlock cutting-edge possibilities and revolutionize their operations.
> Part 2 discusses five major use cases of LLMs within enterprises, including content generation, summarization, and chatbot support. Each use case is exemplified with real-life apps and case studies, so as to show how LLMs can solve real problems and help enterprises achieve specific objectives.
> Part 3 is a practical guide for enterprises that want to build, train, and deploy their own LLMs. It provides an overview of necessary prerequisites and the possible trade-offs of different development and deployment methods. ML engineers and data scientists can use this as a reference throughout their LLM development processes.
Hopefully, this will inspire enterprises that have not yet adopted or developed their own LLMs to do so soon in order to gain a competitive advantage and offer new SOTA services or products. Most of the benefits will, as usual, be reserved for early adopters and truly visionary innovators.
Glossary
Deep learning systems
Systems that rely on neural networks with many hidden layers to learn complex patterns.
Generative AI
AI programs that can generate new content, like text, images, and audio, rather than just analyze it.
Large language models (LLMs)
Language models that recognize, summarize, translate, predict, and generate text and other content. They're called large because they are trained on large amounts of data and have many parameters, with popular LLMs reaching hundreds of billions of parameters.
Natural language processing (NLP)
The ability of a computer program to understand and generate text in natural language.
Long short-term memory neural network (LSTM)
A special type of RNN with more complex cell blocks that allow it to retain more past inputs.
Natural language generation (NLG)
A part of NLP that refers to the ability of a computer program to generate human-like text.
Natural language understanding (NLU)
A part of NLP that refers to the ability of a computer program to understand human-like text.
Neural network (NN)
A machine learning algorithm in which the parameters are organized into consecutive layers. The learning process of NNs is inspired by the human brain. Much like humans, NNs "learn" important features via representation learning and require less human involvement than most other approaches to machine learning.
Perception AI
AI programs that can process and analyze but not generate data, mainly developed before 2020.
Recurrent neural network (RNN)
A neural network that processes data sequentially and can memorize past inputs.
Rule-based system
A system that relies on human-crafted rules to process data.
Traditional machine learning
Traditional machine learning uses a statistical approach, drawing probability distributions of words or other tokens based on a large annotated corpus. It relies less on rules and more on data.
Transformer
A type of neural network architecture designed to process sequential data non-sequentially.
Structured data
Data that is quantitative in nature, such as phone numbers, and can be easily standardized and adjusted to a pre-defined format that ML algorithms can quickly process.
Unstructured data
Data that is qualitative in nature, such as customer reviews, and difficult to standardize. Such data is stored in its native formats, like PDF files, before use.
Fine-tuning
A transfer learning method used to improve model performance on selected downstream tasks or datasets. It's used when the target task is similar to the pre-training task, and involves copying the weights of a pretrained language model (PLM) and tuning them on desired tasks or data.
Customization
A method of improving model performance by modifying only one or a few selected parameters of a PLM instead of updating the entire model. It involves using parameter-efficient techniques (PEFT).
Parameter-efficient techniques (PEFT)
Techniques like prompt learning, LoRA, and adapter tuning, which allow researchers to customize PLMs for downstream tasks or datasets while preserving and leveraging the existing knowledge of PLMs. These techniques are used during model customization and allow for quicker training and often more accurate predictions.
Prompt learning
An umbrella term for two PEFT techniques, prompt tuning and p-tuning, which help customize models by inserting virtual token embeddings among discrete or real token embeddings.
Adapter tuning
A PEFT technique that involves adding lightweight feed-forward layers, called adapters, between existing PLM layers and updating only their weights during customization while keeping the original PLM weights frozen.
Open-domain question answering
Answering questions from a variety of different domains, like legal, medical, and financial, instead of just one domain.
Extractive question answering
Answering questions by extracting the answers from existing texts or databases.
Throughput
A measure of model efficiency and speed. It refers to the amount of data or the number of predictions that a model can process or generate within a pre-defined timeframe.
Latency
The amount of time a model needs to process input and generate output.
Data readiness
The suitability of data for use in training, based on factors such as data quantity, structure, and quality.
Introduction to LLMs
A large language model is a type of artificial intelligence (AI) system that is capable of generating human-like text based on the patterns and relationships it learns from vast amounts of data. Large language models use a machine learning technique called deep learning to analyze and process large sets of data, such as books, articles, and web pages.
Large language models unlocked numerous unprecedented possibilities in the field of NLP and AI. This was most notably demonstrated by the release of OpenAI's GPT-3 in 2020, the then-largest language model ever developed.
These models are designed to understand the context and meaning of text and can generate text that is grammatically correct and semantically relevant. They can be trained on a wide range of tasks, including language translation, summarization, question answering, and text completion.
GPT-3 made it evident that large-scale models can accurately perform a wide, and previously unheard-of, range of NLP tasks, from text summarization to text generation. It also showed that LLMs could generate outputs that are nearly indistinguishable from human-created text, all while learning on their own with minimal human intervention.
This presented an enormous improvement over earlier, mainly rule-based models that could neither learn on their own nor successfully solve tasks they weren't trained on. It is no surprise, then, that many other enterprises and startups soon started developing their own LLMs or adopting existing LLMs in order to accelerate their operations, reduce expenses, and streamline workflows.
Part 1 is intended to provide a solid introduction and foundation for any enterprise that is considering building or adopting its own LLM.
What Are Large Language Models (LLMs)?
Large language models (LLMs) are deep learning algorithms that can recognize, extract, summarize, predict, and generate text based on knowledge gained during training on very large datasets.
They're also a subset of a more general technology called language models. All language models have one thing in common: they can process and generate text that sounds like natural language. This is known as performing tasks related to natural language processing (NLP).
Although all language models can perform NLP tasks, they differ in other characteristics, such as their size. Unlike other models, LLMs are considered large for two reasons:
1. They're trained using large amounts of data.
2. They comprise a huge number of learnable parameters, i.e., representations of the underlying structure of training data that help models perform tasks on new or never-before-seen data (made concrete in the sketch below).
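To give the notion of learnable parameters some shape, here is a minimal sketch in PyTorch that counts the parameters of a tiny, purely illustrative model; real LLMs apply the same idea at a vastly larger scale.

```python
import torch.nn as nn

# A tiny illustrative model, not an LLM: two linear layers with a ReLU.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Every weight and bias the model can adjust during training is a
# learnable parameter.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} learnable parameters")  # ~35,600 here; GPT-3 Davinci has 175 billion
```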
Table 1 showcases two large language models, MT-NLG and GPT-3 Davinci, to help clarify what's considered large by contemporary standards.
Table 1. Comparison of MT-NLG and GPT-3

Large Language Model                                                 Number of parameters   Number of tokens in the training data
NVIDIA Megatron-Turing Natural Language Generation Model (MT-NLG)    530 billion            270 billion
OpenAI GPT-3 Davinci Model                                           175 billion            499 billion
Since the quality of a model heavily depends on the model size and the size of the training data, larger language models typically generate more accurate and sophisticated responses than their smaller counterparts.
Figure 1. Answer Generated by GPT-3.
However, the performance of large language models doesn't just depend on the model size or data quantity. The quality of the data matters, too.
For example, LLMs trained on peer-reviewed research papers or published novels will usually perform better than LLMs trained on social media posts, blog comments, or other unreviewed content. Low-quality data like user-generated content may lead to all sorts of problems, such as models picking up slang, learning incorrect spellings of words, and so on.
In addition, models need very diverse data in order to perform various NLP tasks. However, if the model is intended to be especially good at solving a particular set of tasks, it can be fine-tuned using a more relevant, narrower dataset. By doing so, a foundation language model is transformed from one that's good at performing various NLP tasks across a broad set of domains into a fine-tuned model that specializes in performing tasks in a narrowly scoped domain.
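As a rough illustration of that workflow, the following sketch fine-tunes a small pretrained model on a narrow dataset with the Hugging Face Transformers library; the model ("distilbert-base-uncased") and dataset ("imdb") are illustrative assumptions, not recommendations from this book.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed example: specializing a general pretrained model for review analysis.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# A short run over a small slice of the data, purely to show the mechanics.
args = TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args,
        train_dataset=dataset["train"].shuffle(seed=42).select(range(1000))).train()
```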
Foundation Language Models vs. Fine-Tuned Language Models
Foundation language models, such as the aforementioned MT-NLG and GPT-3, are what is usually referred to when discussing LLMs. They're trained on vast amounts of data and can perform a wide variety of NLP tasks, from answering questions and generating book summaries to completing and translating sentences.
Thanks to their size, foundation models can perform well even when they have little domain-specific data at their disposal. They have good general performance across tasks but may not excel at performing any one specific task.
Fine-tuned language models, on the other hand, are large language models derived from foundation LLMs. They're customized for specific use cases or domains and, thus, become better at performing more specialized tasks.
Apart from the fact that fine-tuned models can perform specific tasks better than foundation models, their biggest strength is that they are lighter and, generally, easier to train. But how does one actually fine-tune a foundation model for specific objectives?
Currently, the most popular method is customizing a model using parameter-efficient customization techniques, such as p-tuning, prompt tuning, adapters, and so on. Customization is far less time-consuming and expensive than fine-tuning the entire model, although it may lead to somewhat poorer performance than other methods. Customization methods are further discussed in Part 3.
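For a sense of what such customization looks like in code, here is a minimal sketch using the PEFT library's LoRA implementation; the base model ("gpt2") and the hyperparameter values are illustrative assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed small base model

# LoRA injects small trainable matrices into the attention layers while
# the original pretrained weights stay frozen.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                    lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, config)

# Only a tiny fraction of the total parameters will be updated in training.
model.print_trainable_parameters()
```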
Evolution of Large Language Models
AI systems were historically about processing and analyzing data, not generating it. They were oriented more toward perceiving and understanding the world around us than toward generating new information. This distinction marks the main difference between Perception AI and Generative AI, with the latter becoming increasingly prevalent since around 2020, after companies started adopting transformer models and developing increasingly robust LLMs at a large scale.
The advent of large language models further fueled a revolutionary paradigm shift in the way NLP models are designed, trained, and used. To truly understand this, it may be helpful to compare large language models to previous NLP models and how they worked. For this purpose, let's briefly explore three regimes in the history of NLP: pre-transformers NLP, transformers NLP, and LLM NLP.
1. Pre-transformers NLP was mainly marked by models that relied on human-crafted rules rather than machine learning algorithms to perform NLP tasks. This made them suitable for simpler tasks that didn't require too many rules, like text classification, but unsuitable for more complex tasks, such as machine translation. Rule-based models also performed poorly in edge-case scenarios because they couldn't make accurate predictions or classifications for never-before-seen data for which no clear rules were set. This problem was somewhat solved with simple neural networks, such as RNNs and LSTMs, developed during the later phases of this period. RNNs and LSTMs could memorize past data to a certain extent and, thus, provide context-dependent predictions and classifications. However, RNNs and LSTMs could not make predictions over long spans of text, limiting their effectiveness.
2. Transformers NLP was set in motion by the rise of the transformer architecture in 2017. Transformers could generalize better than the then-prevailing RNNs and LSTMs, capture more context, and process more data at once. These improvements enabled NLP models to understand longer sequences of data and perform a much wider range of tasks. However, from today's point of view, models developed during this period had limited capabilities, mainly due to the general lack of large-scale datasets and adequate computational resources. They also mainly sparked attention among researchers and experts in the field but not the general public, as they weren't user-friendly or accurate enough to become commercialized.
3. LLM NLP was mainly initiated by the launch of OpenAI's GPT-3 in 2020. Large language models like GPT-3 were trained on massive amounts of data, which allowed them to produce more accurate and comprehensive NLP responses compared to previous models. This unlocked many new possibilities and brought us closer to achieving what many consider "true" AI. Also, LLMs made NLP models much more accessible to non-technical users, who could now solve a variety of NLP tasks just by using natural-language prompts. NLP technology was finally democratized.
The switch from one methodology to another was largely driven by relevant technological and methodological advancements, such as the advent of neural networks, attention mechanisms, and transformers, as well as developments in the field of unsupervised and self-supervised learning. The following sections will briefly explain these concepts, as understanding them is crucial for truly understanding how LLMs work and how to build new LLMs from scratch.
Neural Networks
Neural networks (NNs) are machine learning algorithms loosely modeled after the human brain. Like the biological human brain, artificial neural networks consist of neurons, also called nodes, that are responsible for all model functions, from processing input to generating output.
The neurons are further organized into layers, vertically stacked components of NNs that perform specific tasks related to input and output sequences.
Every neural network has at least three layers, as the sketch following this list illustrates:
> The input layer accepts data and passes it to the rest of the network.
> The hidden layer, or multiple hidden layers, performs specific functions that make the final output of an NN possible. These functions can include identifying or classifying data, generating new data, and other functions depending on the specific NLP task in question.
> The output layer generates a prediction or classification based on the input.
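The following is a minimal sketch of that three-layer structure in PyTorch; the layer sizes and the two-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),  # input layer: accepts a 16-dimensional input
    nn.ReLU(),
    nn.Linear(32, 32),  # hidden layer: computes intermediate features
    nn.ReLU(),
    nn.Linear(32, 2),   # output layer: produces scores for two classes
)

x = torch.randn(1, 16)  # one example with 16 features
print(model(x))         # raw scores (logits) for the two classes
```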
When LLMs were first developed, they were based on simpler NN architectures with fewer layers, mainly recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). Unlike other neural networks, RNNs and LSTMs could take into account the context, position, and relationships between words even if they were far apart in a data sequence. Simply put, this meant they could memorize and consider past data when generating output, which resulted in more accurate solutions to many NLP tasks, especially sentiment analysis and text classification.
The biggest advantage that neural networks like RNNs and LSTMs had over traditional, rule-based systems was that they were capable of learning on their own with little to no human involvement. They analyze data to create their own rules, rather than learn the rules first and apply them to data later. This is also known as representation learning and is inspired by human learning processes.
Representations, or features, are hidden patterns that neural networks can extract from data. To exemplify this, let's imagine we're training an NN-based model on a dataset containing the following tokens:
"cat," "cats," "dog," "dogs"
After analyzing these tokens, the model may identify a representation that one could formulate as:
Plural nouns have the suffix "-s."
The model will then extract this representation and apply it to new or edge-case scenarios whose data distribution follows that of the training data. For example, the assumption can be made that the model will correctly classify tokens like "chairs" or "table" as plural or singular even if it had not encountered them before. Once it encounters irregular nouns that don't follow the extracted representation, the model will update its parameters to reflect new representations, such as:
Plural nouns are followed by plural verbs.
This approach enables NN-based models to generalize better than rule-based systems and successfully perform a wider range of tasks.
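As a toy illustration of this idea, the following sketch trains a simple scikit-learn classifier on the four tokens above and lets it discover the "-s" pattern from character-level features on its own; the setup is purely an illustrative assumption, not how LLMs are actually trained.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

words = ["cat", "cats", "dog", "dogs"]
labels = [0, 1, 0, 1]  # 0 = singular, 1 = plural

# Character-level features let the model extract the "-s" pattern itself
# instead of being given a hand-crafted rule.
clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 2)),
    LogisticRegression(),
)
clf.fit(words, labels)

# The learned representation generalizes to unseen, regular nouns...
print(clf.predict(["chairs", "table"]))  # expected: [1 0]
# ...but irregular nouns like "mice" would be misclassified until the
# model sees more data and updates its parameters.
```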
However, their ability to extract representations is very much dependent on the number of neurons and layers comprising a network. The more neurons neural networks have, the more complex representations they can extract. That's why, today, most large language models use deep learning neural networks with multiple hidden layers and, thus, a higher number of neurons.
Figure 2 shows a side-by-side comparison of a single-layer neural network and a deep learning neural network.
Figure 2. Comparison of Single-Layer vs. Deep Learning Neural Network
While this may seem like an obvious choice today, consider that developing deep neural networks did not make sense before the hardware evolved to be able to handle massive workloads. This only became possible after ~1999, when NVIDIA introduced "the world's first GPU," or graphics processing unit, to the wider market, or, more precisely, after a wildly successful convolutional neural network (CNN) called AlexNet popularized their use in deep learning in 2012.
GPUs had a highly parallelizable architecture, which enabled the rapid advances in deep learning systems that are seen today. Among other advancements, the advent of GPUs ushered in the development of a new type of neural network that would revolutionize the field of NLP: transformers.
Transformers
While RNNs and LSTMs have their advantages, especially compared to traditional models, they also have some limitations that make them unsuitable for more complex NLP tasks, such as machine translation. Their main limitation is the inability to process longer data sequences and, thus, consider the overall context of the input sequence. Because LSTMs and RNNs cannot handle too much context well, their outputs are prone to being inaccurate or nonsensical. This and other challenges have been largely overcome with the advent of new, special neural networks called transformers.
Transformers were first introduced in 2017 by Vaswani et al. in a paper titled "Attention Is All You Need." The title alluded to attention mechanisms, which would become the key component of transformers.
"We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely." - Vaswani et al., "Attention Is All You Need"
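To ground the idea, here is a minimal sketch of the scaled dot-product attention at the core of the transformer, written in PyTorch; the sequence length and embedding size are illustrative assumptions.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v have shape (sequence_length, d_model).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # pairwise similarities
    weights = torch.softmax(scores, dim=-1)            # attention weights per token
    return weights @ v                                 # weighted sum of values

seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)            # token embeddings
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v = x
print(out.shape)                             # torch.Size([5, 16])
```

In a full transformer, q, k, and v are learned linear projections of the input rather than the raw embeddings, and many such attention heads run in parallel.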