arXiv:2404.01335v1 [cs.LG] 30 Mar 2024

Generative AI for Architectural Design: A Literature Review

Chengyuan Li¹, Tianyu Zhang², Xusheng Du², Ye Zhang¹*, Haoran Xie²
¹Tianjin University, ²Japan Advanced Institute of Science and Technology

Abstract

Generative Artificial Intelligence (AI) has pioneered new methodological paradigms in architectural design, significantly expanding the innovative potential and efficiency of the design process. This paper explores the extensive applications of generative AI technologies in architectural design, a trend that has benefited from the rapid development of deep generative models. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have already been applied extensively, significantly advancing design innovation and efficiency. With continual technological advancements, state-of-the-art Diffusion Models and 3D Generative Models are progressively being integrated into architectural design, offering designers a more diversified set of creative tools and methodologies. This article further provides a comprehensive review of the basic principles of generative AI and large-scale models and highlights their applications in the generation of 2D images, videos, and 3D models. In addition, by reviewing the latest literature from 2020 onward, this paper scrutinizes the impact of generative AI technologies at different stages of architectural design, from generating initial architectural 3D forms to producing final architectural imagery. The marked growth in research indicates an increasing inclination within the architectural design community towards embracing generative AI, thereby catalyzing a shared enthusiasm for research. These research cases and methodologies have not only been shown to enhance efficiency and innovation significantly but have also challenged the conventional boundaries of architectural creativity. Finally, we point out new directions for design innovation and articulate fresh trajectories for applying generative AI in the architectural domain. This article provides the first comprehensive literature review of generative AI for architectural design, and we believe this work can facilitate more research on this significant topic in architecture.

Keywords: Generative AI, Architectural Design, Diffusion Models, 3D Generative Models, Large-scale models.

*Corresponding author, zhang.ye@

Figure 1. Examples of architectural design using generative AI techniques: (a) church design [1]; (b) matrix of cuboid shapes [2]; (c) Frank Gehry's Walt Disney Concert Hall [3]; (d) Bangkok urban design [4]; (e) foresting architecture [4]; (f) urban interiors [4]; and (g) text-to-architectural design [5].

1. Introduction

Generative artificial intelligence (AI) techniques are increasingly expanding their influence and reshaping architectural design. Here, generative AI refers to artificial intelligence technologies dedicated to content generation, such as text, images, music, and videos. Generative AI benefits from the rapid development of deep generative models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models (DMs). GANs and VAEs are traditional generative models and have been widely explored in architectural design, as illustrated in Figure 1. In this paper, we focus on the recent progress of generative AI, especially the revolutionary diffusion models. DMs have achieved state-of-the-art performance in various content generation tasks such as text-to-image and text-to-3D-model generation.

Architectural design may encompass multiple themes and scopes, with each project having distinct design requirements and individual styles, leading to diversity and complexity in design approaches. In this work, we adopt six main steps in the architectural design process for the literature review: 1) architectural preliminary 3D form design, 2) architectural layout design, 3) architectural structural system design, 4) detailed and optimization design of architectural 3D forms, 5) architectural facade design, and 6) architectural imagery expression. After exploring the research papers from 2020 to 2023, we observed a significant increase in the number of research papers in architectural design using generative AI. The number of research papers using generative AI technology in different architectural design steps reveals the development trends within each subfield, as illustrated in Figure 2(a). Most research is concentrated in the area of architectural plan design. Research in preliminary 3D form design of architecture and architectural image expression has rapidly increased in the past two years. More research is needed on architectural structural system design, architectural 3D form refinement and optimization design, and architectural facade design.

This sustained growth trend distinctly demonstrates that generative AI in architectural design is expanding at an unprecedented rate, while also reflecting the high level of attention and increasing investment that the architectural design and computer science communities are devoting to generative AI technologies. The most used generative AI techniques are illustrated in Figure 2(b). In computer science, many studies focus on GAN and VAE, while research on DDPM, LDM, and GPT is in the initial stages. The situation is the same in architecture.

1.1. Motivation

Leveraging recent generative AI models in architectural design could significantly improve design efficiency and provide architects with new design processes and ideas, expanding the possibilities of architectural design and revolutionizing the entire design process. However, the use of advanced generative models in architectural design has not been explored extensively. Two factors may primarily hinder their use: professional barriers and the issue of training data.

In terms of professional barriers, deep learning and architectural design are highly specialized fields requiring extensive professional knowledge and experience. The aim of this study is to narrow the professional barriers between architecture and computer science and to assist architectural designers in bridging generative AI technologies with applications, promoting interdisciplinary research, and delineating future research directions. This review systematically analyzes and summarizes case studies and research outcomes of generative AI applications in architectural design, and showcases the possibilities and potential of the intersection between computer science and architecture. This interdisciplinary perspective encourages collaboration among experts from different fields to address complex issues in architectural design, thus advancing scientific research and technological innovation.

In terms of the issue of training data, deep learning models require high-quality training data to analyze and verify their generalization ability. However, data in the field of architecture is usually unstructured. The search for and organization of architectural training data pose a significant challenge, making it difficult right from the initial stages of model training. In addition, high-performance Graphics Processing Units (GPUs) are required to train deep learning models on millions of data samples, especially those dealing with complex images and datasets. The scarcity of high-performance GPUs and the difficulty of mastering GPU programming skills may prevent architects from exploring recent diffusion models and large foundation models.

1.2. Structure and Methodology

This article first introduces the development and application directions of generative AI models, then elaborates on the methods of applying generative AI in the architectural design process, and finally forecasts the potential application development of generative AI in the architectural field.

In Section 2, the article offers an in-depth introduction to the principles and evolution of various generative AI models, with a focus on Diffusion Models (DMs), 3D Generative Models, and Foundation Models. In Section 2.1, the article elaborates on the principles and development of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). In Section 2.2, the discussion of Diffusion Models elaborates on the working mechanisms and developmental trajectories of DDPM and LDM. In Section 2.3, the segment on 3D Generative Models focuses on 3D shape representation, encompassing Voxels, Point Clouds, Meshes, Implicit functions, and Occupancy Fields. Within Occupancy Fields, the paper details Signed Distance Functions (SDF), Unsigned Distance Functions (UDF), and Neural Radiance Fields (NeRF), explaining their respective operational principles. In Section 2.4, the Foundation Models section comprehensively describes the progress and achievements of Large Language Models (LLM) and Large Vision Models. In Section 2.5, the paper discusses the applications and developments of these models in image generation, video generation, and 3D model generation.

In Section 3, this paper delves into the application development of generative AI models in architectural design. Given the complexity of the architectural design process, this article divides it into six steps, as presented in the introduction. For each step, the article summarizes and discusses the current application methods of generative AI models. By analyzing these research papers, the study demonstrates how generative AI can facilitate innovation in architectural design, improve design efficiency, and optimize architectural solutions. Throughout this summarization process, literature retrieval was conducted using databases such as CumInCAD and Web of Science, supplemented by searches on Litmaps. To keep the search targeted and accurate, specific search queries were set for each design process.

Figure 2. Overview of generative AI applications in architectural design: statistics on research paper numbers and generative models.

In Section 4, this article explores the potential applications of generative AI technology in generating architectural design images, architectural design videos, architectural design 3D models, and human-centric architectural design. Section 4.1 anticipates applications of architectural design image generation in generating floor plans, facade images, and architectural images. Section 4.2 foresees applications of architectural design video generation, such as generating videos from a single architectural image, generating videos from architectural images, and style transfer for specific video content. Section 4.3, on architectural design 3D model generation, envisions possibilities in generating 3D models from images and text prompts, transferring styles to 3D models, and generating and editing detailed styles for 3D models. Section 4.4 elaborates on the potential of generative AI in enhancing the human-centric architectural design process.

2. Generative AI Models

Generative AI models are currently experiencing rapid development, with new methods continually emerging. The evolution of deep learning-based approaches, particularly Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and Diffusion Models (DM), has significantly advanced and enhanced image generation techniques. VAEs played a pioneering role in deep learning-based generative models. They employ an encoder-decoder architecture integrated with probabilistic graphical models to learn latent representations for image generation [6]. GANs represent a milestone in the realm of image generation: with a generator and a discriminator, GANs engage in an adversarial training process that prompts the generator to generate images progressively resembling the distribution of real data [7, 8]. Moreover, diffusion models stand out as the most revolutionary technology to have emerged in recent years, with remarkable image generation quality [9, 10].

Figure 3. The framework of GAN, VAE, and diffusion models (DM), where z is a compressed low-dimensional representation of the input.

2.1. Generative Adversarial Networks

A Generative Adversarial Network (GAN) [11] comprises a generator $G$ and a discriminator $D$, as illustrated in Figure 3. $G$ is responsible for generating samples from noise $z$, while $D$ judges the authenticity of the generated samples $G(z)$ against the ground truth image $x$. Ideally:

$$D(x) = 1, \quad D(G(z)) = 0 \qquad (1)$$

This adversarial nature enables the model to maintain a dynamic equilibrium between generation and discrimination, propelling the learning and optimization of the entire system. Despite its advantages, GAN still faces challenges, such as mode collapse during training.
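The adversarial objective behind Eq. (1) can be made concrete with a minimal sketch. It assumes the common binary cross-entropy formulation and the non-saturating generator loss, neither of which is specified in this review; the function names `bce`, `discriminator_loss`, and `generator_loss` are illustrative only, and the discriminator outputs are passed in as plain probabilities rather than produced by a trained network.

```python
import math

def bce(prediction: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single probability in [0, 1]."""
    p = min(max(prediction, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real: float, d_fake: float) -> float:
    # D is rewarded for D(x) -> 1 on real data and D(G(z)) -> 0 on fakes.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake: float) -> float:
    # Assumed non-saturating form: G is rewarded when D(G(z)) -> 1.
    return bce(d_fake, 1.0)

# The ideal discriminator of Eq. (1) versus one that cannot tell
# real from generated samples and outputs 0.5 for both.
ideal_d = discriminator_loss(d_real=1.0, d_fake=0.0)
confused_d = discriminator_loss(d_real=0.5, d_fake=0.5)
```

Under these assumptions the ideal discriminator has a loss near zero, while the confused one sits at 2 ln 2, which is the value usually associated with the equilibrium the paragraph describes.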

Conditional GAN. Conditional image generation is an image generation technique that controls the generation process by introducing conditional information, such as text, labels, and hand-drawn sketches, so that the generated images match the given conditions. These additional input conditions enable the generator to generate images with specific properties. To address the limited controllability of GAN models, Conditional GAN (CGAN) [12] was introduced, which uses additional auxiliary information as a condition to fine-tune both $G$ and $D$. The $G$ of CGAN receives conditional information in addition to random noise. By providing conditional information to $G$, CGAN can control the generated results more precisely. Additionally, variants such as pix2pix [13] and StyleGAN [7] have been developed.
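One simple way a generator can receive conditional information besides random noise is to append a one-hot class label to the noise vector. The sketch below assumes that scheme; the review does not state how the condition is injected, and `one_hot` and `generator_input` are hypothetical helper names.

```python
import random
from typing import List

def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode a class label as a one-hot condition vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(noise_dim: int, label: int, num_classes: int,
                    rng: random.Random) -> List[float]:
    """Concatenate Gaussian noise z with the condition vector (assumed scheme)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, num_classes)

rng = random.Random(0)
g_in = generator_input(noise_dim=8, label=2, num_classes=4, rng=rng)
```

The same condition vector would also be supplied to the discriminator, so that it judges whether a sample is both realistic and consistent with the requested condition.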

2.2. Diffusion Models

In image generation, diffusion models outperform GANs and VAEs [14, 15]. Most diffusion models currently used are based on Denoising Diffusion Probabilistic Models (DDPM) [15], which simplify the diffusion model through variational inference. As shown in Figure 3, diffusion models contain both a forward diffusion process and a reverse denoising (inference) process. The forward process follows the concept of a Markov chain and turns the input image into Gaussian noise. Given a data sample $x_0$, Gaussian noise is progressively added to the data sample during $T$ steps in the forward process, producing the noisy samples $x_t$, where the timestep $t \in \{1, \ldots, T\}$. As $t$ increases, the distinguishable features of $x_0$ gradually diminish. Eventually, when $T \to \infty$, $x_T$ is equivalent to a Gaussian distribution with isotropic covariance. In addition, the inference process can be understood as a sequence of denoising autoencoders with shared weights $\epsilon_\theta(x_t, t)$ ($\epsilon_\theta$ is typically implemented as a U-Net [16]), which are trained to predict denoised variants of their corresponding inputs $x_t$.
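The forward process can be sketched with the closed-form sampling rule commonly used for DDPM, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$ with $\bar{\alpha}_t = \prod_{s \le t}(1 - \beta_s)$. The linear schedule from $10^{-4}$ to $0.02$ over $T = 1000$ steps is an assumption for illustration and is not stated in this review; the function names are hypothetical.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule beta_1..beta_T."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t from q(x_t | x_0) in closed form; t is 1-based."""
    a = abar[t - 1]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for v in x0]

T = 1000
abar = alpha_bars(linear_betas(T))
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]          # a toy "image" with four pixels
x_early = q_sample(x0, 1, abar, rng)  # still close to x0
x_late = q_sample(x0, T, abar, rng)   # almost pure Gaussian noise
```

Because $\bar{\alpha}_t$ shrinks monotonically toward zero, the contribution of $x_0$ fades with $t$, which matches the statement that the distinguishable features of $x_0$ gradually diminish.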

Latent Diffusion Model. Unlike DDPM, the Latent Diffusion Model (LDM) [9] does not operate directly on images but in a latent space, a step called perceptual compression. LDM reduces the dimensionality of the data by projecting it into a low-dimensional, efficient latent space in which high-frequency, imperceptible details are abstracted away. The framework of LDM is illustrated in Figure 4. After the image $x$ is compressed by the encoder $E$ into the latent representation $z$, the diffusion process is performed in the latent representation space. LDM has a diffusion process similar to that of DDPM. Finally, LDM infers the data sample $z$ from the noise $z_T$, and the decoder $D$ restores $z$ to the original pixel space to obtain the result image $\tilde{x}$.

Figure 4. The framework of the latent diffusion model, which is proposed by Rombach et al. [9].

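Specifically, given an image $x \in \mathbb{R}^{H \times W \times 3}$ with height $H$ and width $W$ in RGB space, LDM first utilizes an encoder $E$ to encode the image $x$ into a latent representation space:

$$z = E(x) \qquad (2)$$

where $z \in \mathbb{R}^{h \times w \times c}$ with height $h$ and width $w$, and the constant $c$ represents the number of channels. Then the decoder $D$ recovers the image from the latent representation space:

$$\tilde{x} = D(z) = D(E(x)) \qquad (3)$$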
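The shapes in Eqs. (2) and (3) determine how much smaller the space is in which diffusion runs. The sketch below computes the downsampling factor $f = H/h = W/w$, the notation used in the latent diffusion paper [9], and the ratio of values per sample. The 512×512 RGB image and the 64×64×4 latent are illustrative numbers chosen here, not values taken from this review.

```python
def downsampling_factor(H: int, W: int, h: int, w: int) -> int:
    """Return f = H/h = W/w, requiring an equal integer factor on both axes."""
    if H % h or W % w or H // h != W // w:
        raise ValueError("encoder must downsample both axes by the same integer factor")
    return H // h

def compression_ratio(H: int, W: int, h: int, w: int, c: int) -> float:
    """Number of pixel-space values (H*W*3) per latent-space value (h*w*c)."""
    return (H * W * 3) / (h * w * c)

f = downsampling_factor(512, 512, 64, 64)
ratio = compression_ratio(512, 512, 64, 64, 4)
```

With these illustrative sizes, each denoising step operates on 48 times fewer values than it would in pixel space, which is the efficiency argument for perceptual compression.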

To accelerate generation, the Latent Consistency Model (LCM) [17] was proposed to optimize the denoising inference step.

2.3. 3D Generative Models

In the field of three-dimensional shape modeling, implicit functions are commonly represented as an Occupancy Field, a Signed Distance Function (SDF), or an Unsigned Distance Function (UDF), alongside the recently emerging Neural Radiance Fields (NeRF).

3D Shape Representation. Representations in 3D vision problems can generally be divided into four categories: voxel-based, point cloud-based, mesh-based, and implicit representation-based.

Figure 5. Representation examples of 3D shapes from [24]: (a) Voxel, (b) Point, (c) Mesh, (d) Implicit.

Voxel. As shown in Figure 5(a), the voxel format describes a 3D object as a matrix of volume occupancy, where the size of the matrix is fixed. Researchers [18] adopted the voxel representation in the generation of 3D shapes. The voxel format requires high resolution to describe fine-grained details, so the computational cost explodes as the shape resolution increases. The reconstruction results of voxel-based research are limited in resolution and do not provide topological guarantees or represent sharp features.
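The trade-off between voxel resolution and cost can be seen in a small sketch that bins points from the unit cube into an occupancy grid. The helper names `voxelize` and `dense_grid_cells` and the sample points are made up for illustration.

```python
def voxelize(points, resolution):
    """Map points in the unit cube [0, 1)^3 to the set of occupied voxel indices."""
    occupied = set()
    for point in points:
        # Scale each coordinate to grid units and clamp to the last cell.
        idx = tuple(min(int(c * resolution), resolution - 1) for c in point)
        occupied.add(idx)
    return occupied

def dense_grid_cells(resolution):
    """A dense occupancy matrix stores resolution^3 cells, occupied or not."""
    return resolution ** 3

points = [(0.1, 0.1, 0.1), (0.15, 0.1, 0.1), (0.9, 0.5, 0.3)]
coarse = voxelize(points, 4)   # nearby points merge into one voxel
fine = voxelize(points, 32)    # detail is kept, at 512x the dense storage
```

At resolution 4 the two nearby points fall into the same cell, so the fine detail is lost; at resolution 32 they are separated, but a dense grid needs 32³ cells instead of 4³, which is the cubic growth the paragraph refers to.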

Point Cloud. As shown in Figure 5(b), point clouds are a lightweight 3D representation composed of (x, y, z) coordinate values. Point clouds are a natural way to represent shapes. PointNet [19] extracts global shape features using a max-pooling operation, and it is widely used as an encoder for point-based generative networks [20]. However, point clouds do not represent topology and are unsuitable for generating watertight surfaces.
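The reason max pooling suits point clouds is that it is symmetric: the global feature does not depend on the order in which points are listed. The sketch below illustrates this with a hand-written per-point feature map, `point_features`, which is a hypothetical stand-in for the learned shared MLP in PointNet [19] and not its actual architecture.

```python
import random

def point_features(point):
    """Hypothetical per-point features standing in for a learned shared MLP."""
    x, y, z = point
    return [x, y, z, x * y, y * z, x * x + y * y + z * z]

def global_feature(points):
    """Channel-wise max over all points: a symmetric, order-independent pooling."""
    feats = [point_features(p) for p in points]
    return [max(channel) for channel in zip(*feats)]

cloud = [(0.0, 1.0, 2.0), (1.0, -1.0, 0.5), (-2.0, 0.0, 1.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)  # same points, different order
```

Calling `global_feature` on `cloud` and on `shuffled` yields the same vector, which is the property that lets an unordered set of points be encoded into a single shape descriptor.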

Mesh. As shown in Figure 5(c), meshes are widely used and are constructed from vertices and faces. The authors of [21] deformed a predefined template, restricted to a fixed topology, using graph convolution. Recently, meshes have been used to represent shapes in deep learning techniques [22]. Although meshes are more suitable for describing the topological structure of objects, they usually require advanced preprocessing steps.

Implicit. As shown in Figure 5(d), implicit representation refers to describing a surface as the zero-crossing of a volume function $\psi: \mathbb{R}^3 \to \mathbb{R}$, whose value can be adjusted. A 3D shape is represented as a level set of a deep network that maps 3D coordinates to a signed distance function [23] or an occupancy field [24]. Implicit representation can create a lightweight, continuous shape representation with no resolution limits.

Occupancy Field. Occupancy Field is one of the implicit function methods based on deep learning [24]. It assigns a binary value to each point in three-dimensional space, determining whether the point is occupied by an object. This approach utilizes neural networks to learn the representation of occupancy fields, facilitating highly detailed three-dimensional reconstruction. The advantage of Occupancy Field lies in its dynamic modeling of object occupancy in scenes, making it suitable for handling complex three-dimensional environments.

SDF. Building upon Occupancy Field, the Signed Distance Function (SDF) has become a crucial direction in implicit function representation within deep learning. SDF assigns a signed distance value to each point, indicating the shortest distance from the point to the object's surface. Positive values signify points outside the object, while negative values indicate points inside the object. As shown in Figure 6, DeepSDF [23] provides an end-to-end approach for continuous SDF learning, enabling precise modeling of irregular shapes and local geometry.

Figure 6. DeepSDF [23] representation applied to the Stanford Bunny: (a) depiction of the underlying implicit surface SDF = 0 trained on sampled points inside (SDF < 0) and outside (SDF > 0) the surface, (b) 2D cross-section of the signed distance field, (c) rendered 3D surface recovered from SDF = 0. Note that (b) and (c) are recovered via DeepSDF.

UDF. UDF and SDF are two distinct yet interrelated implicit function representation approaches. UDF assigns an unsigned distance value to each point, representing the distance to the nearest surface without considering surface direction. UDF is particularly useful for capturing more intuitive surface distance information without involving directional aspects. Zhao et al. [26] contribute significantly by jointly exploring the learning of both signed and unsigned distance functions. This approach aims to enrich the expressiveness of implicit functions, simultaneously capturing intricate details through both signed and unsigned distance information.
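The three implicit representations discussed above (occupancy field, SDF, and UDF) can be compared on a shape whose distance function is known analytically. The sketch below uses a unit sphere in place of a learned network; the function names are illustrative, and treating points exactly on the surface as occupied is a convention chosen here.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance: negative inside, zero on the surface, positive outside."""
    return math.dist(p, center) - radius

def sphere_udf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Unsigned distance: distance to the surface with the sign discarded."""
    return abs(sphere_sdf(p, center, radius))

def sphere_occupancy(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Binary occupancy: 1 if the point lies inside or on the sphere, else 0."""
    return 1 if sphere_sdf(p, center, radius) <= 0.0 else 0

inside, on_surface, outside = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)
```

For the center of the sphere the SDF is -1 while the UDF is 1, and for the point at distance 2 both are 1: the UDF cannot tell inside from outside, whereas the occupancy value keeps only that inside/outside bit and drops the distance.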

Figure 7. An overview of NeRF scene representation and differentiable rendering procedure [25]. Images are synthesized by sampling 5D coordinates (location and viewing direction) along camera rays (a), feeding those locations into an MLP to produce a color and volume density (b), and using volume rendering techniques to composite these values into an image (c). The residual between synthesized and ground truth observed images is then minimized (d).

NeRF. Neural Radiance Fields (NeRF) [25] have revolutionized the field of computer vision and graphics by introducing a novel approach to scene representation, as shown in Figure 7. At the heart of NeRF lies the concept of representing a scene as a continuous function capturing radiance information at every point. The fundamental equation driving NeRF is the rendering equation, mathematically formulating the observed radiance along a viewing ray. The NeRF formulation is expressed as:

$$C(p) = \int T(p_t) \cdot \sigma(p_t) \cdot L(p_t, -d) \, dp_t$$

where $C(p)$ represents the observed color at point $p$, $p_t$ represents points along the viewing ray, $T(p_t)$ is the transmittance function, $\sigma(p_t)$ represents volume density, and $L(p_t, -d)$ represents emitted radiance. NeRF introduces an implicit representation, enabling the encoding of detailed and continuous volumetric information. This allows for high-fidelity reconstruction and rendering of scenes with fine-scale structures, surpassing the limitations of explicit representations. Recently, 3D Gaussian Splatting [27] was introduced, which projects 3D information onto a 2D domain using Gaussian kernels and achieves better performance than NeRF.
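In practice the rendering integral is approximated by a sum over samples along the ray. The sketch below uses the standard quadrature from the NeRF paper [25], $C \approx \sum_i T_i \,(1 - e^{-\sigma_i \delta_i})\, c_i$ with $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$, where $\delta_i$ is the spacing between samples. The densities and colors are hand-picked toy values rather than the output of a trained MLP, and `render_ray` is an illustrative name.

```python
import math

def render_ray(sigmas, colors, deltas):
    """Composite per-sample densities and RGB colors along one ray, front to back."""
    transmittance = 1.0          # T_i: fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha             # equals exp(-sigma * delta)
    return pixel, transmittance

# Empty space, then a dense red surface, then a blue surface hidden behind it.
sigmas = [0.0, 1000.0, 1000.0]
colors = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
deltas = [0.1, 0.1, 0.1]
pixel, remaining = render_ray(sigmas, colors, deltas)
```

The empty first sample contributes nothing, the dense red sample absorbs essentially all of the transmittance, and the blue sample behind it is occluded, so the rendered pixel is red. Every operation is differentiable in the densities and colors, which is what allows the residual in Figure 7(d) to be minimized by gradient descent.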

2.4. Foundation Models

In computer science, foundation models, also called large-scale models, are deep learning models with numerous parameters and intricate structures, used particularly in natural language processing and computer vision tasks. These models demand substantial computational resources for training but exhibit exceptional performance across diverse tasks. The evolution from basic neural networks to sophisticated diffusion models, as depicted in Figure 8, illustrates the continuous quest for more robust and adaptable AI systems.

2.4.1 Large Language Models (LLM)

Transformer. The Transformer model, which has achieved remarkable success in natural language processing (NLP), consists of several components: an encoder, a decoder, positional encoding, and the final linear and softmax layers. Both the encoder and decoder are composed of multiple identical layers. Each layer contains attention layers and feedforward network layers. Additionally, positional encoding is used to inject positional information into the text embeddings, indicating the position of words within the sequence. Notably, the Transformer has paved the way for two prominent Transformer models: Bidirectional Encoder Representations from Transformers (BERT) [28] and Generative Pre-trained Transformer (GPT) [29]. The main difference is that BERT is based on a bidirectional pre-trained language model and fine-tuning, while GPT is based on an autoregressive pre-trained language model and prompting.
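Two ideas from this paragraph lend themselves to a short sketch: positional encoding and the bidirectional versus autoregressive distinction. The code assumes the sinusoidal encoding of the original Transformer, since the review does not say which variant is meant, and expresses the BERT and GPT difference as attention masks, where entry (i, j) equal to 1 means that position i may attend to position j. All function names are illustrative.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Assumed sinusoidal variant: sin on even channels, cos on odd channels."""
    pe = []
    for pos in range(seq_len):
        row = []
        for k in range(d_model):
            # Channels 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
            angle = pos / (10000 ** ((2 * (k // 2)) / d_model))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def bidirectional_mask(n):
    """BERT-style: every token attends to every token, left and right."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """GPT-style: token i attends only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

pe = sinusoidal_positional_encoding(seq_len=4, d_model=4)
```

The lower-triangular causal mask is what makes GPT autoregressive, since each token is predicted from its predecessors only, while the full mask lets BERT use context on both sides of a token.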

GPT. GPT aims to pre-train models using large-scale unsupervised learning to facilitate understanding and generation of natural language. The training process involves two primary stages. Initially, a language model is trained in an unsupervised manner on extensive corpora without task-specific labels or annotations. Subsequently, supervised fine-tuning occurs during the second stage, catering to specific application domains and tasks.

BERT. BERT has emerged as a breakthrough approach, achieving state-of-the-art performance across diverse language tasks. BERT's training methodology comprises two key stages: pre-training and fine-tuning. Pre-training involves the utilization of extensive text corpora to train the language model. The primary objective of pre-training is to endow the BERT model with robust language understanding capabilities, enabling it to effectively tackle various natural language processing tasks. Subsequently, fine-tuning utilizes the pre-trained BERT model in conjunction with smaller labeled datasets to refine the model parameters. This process facilitates the customization of the model to specific tasks, thereby enhancing its suitability and performance for targeted applications.

In recent years, LLMs have witnessed explosive growth. Basic language models refer to models that are only pre-trained on large-scale text corpora, without any fine-tuning. Examples of such models include LaMDA [30] and OpenAI's GPT-3 [31].

[Figure: timeline of models from before 2014 to 2023, grouped into Other Methods, Diffusion Methods, Large-Visual Models, NLP Methods, and Large-Language Models. Entries visible in the extracted text: Gemini, CLIP, ERNIE Bot, BART, BERT, T5, GPT-1, GPT-2, GPT-3, GPT-4, CogView, Classifier-Free, Prompt-to-]