Deep Learning with H2O

by Arno Candel & Erin LeDell
with assistance from Viraj Parmar & Anisha Arora
Edited by: Angela Bartz

http://h2o.ai/resources/

October 2021: Sixth Edition

Published by H2O.ai, Inc.
2307 Leghorn St.
Mountain View, CA 94043

© 2016-2021 H2O.ai, Inc. All Rights Reserved.

Photos by © H2O.ai, Inc.

All copyrights belong to their respective owners. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Printed in the United States of America.
Contents

- Introduction
  - What is H2O?
- Installation
  - Installation in R
  - Installation in Python
  - Pointing to a Different H2O Cluster
  - Example Code
  - Citation
- Deep Learning Overview
- H2O's Deep Learning Architecture
  - Summary of Features
  - Training Protocol
    - Initialization
    - Activation and Loss Functions
    - Parallel Distributed Network Training
    - Specifying the Number of Training Samples
    - Regularization
    - Advanced Optimization
    - Momentum Training
    - Rate Annealing
    - Adaptive Learning
  - Loading Data
    - Data Standardization/Normalization
    - Convergence-based Early Stopping
    - Time-based Early Stopping
    - Additional Parameters
- Use Case: MNIST Digit Classification
  - MNIST Overview
  - Performing a Trial Run
  - N-fold Cross-Validation
  - Extracting and Handling the Results
  - Web Interface
  - Variable Importances
  - Java Model
  - Grid Search for Model Comparison
  - Cartesian Grid Search
  - Random Grid Search
  - Checkpoint Models
  - Achieving World-Record Performance
  - Computational Performance
- Deep Autoencoders
  - Nonlinear Dimensionality Reduction
  - Use Case: Anomaly Detection
  - Stacked Autoencoder
  - Unsupervised Pretraining with Supervised Fine-Tuning
- Parameters
- Common R Commands
- Common Python Commands
- Acknowledgments
- References
- Authors
Introduction

This document introduces the reader to Deep Learning with H2O. Examples are written in R and Python. Topics include:

- installation of H2O
- basic Deep Learning concepts
- building deep neural nets in H2O
- how to interpret model output
- how to make predictions

as well as various implementation details.
What is H2O?

H2O.ai is focused on bringing AI to businesses through software. Its flagship product is H2O, the leading open source platform that makes it easy for financial services, insurance companies, and healthcare companies to deploy AI and deep learning to solve complex problems. More than 9,000 organizations and 80,000+ data scientists depend on H2O for critical applications like predictive maintenance and operational intelligence. The company, which was recently named to the CB Insights AI 100, is used by 169 Fortune 500 enterprises, including 8 of the world's 10 largest banks, 7 of the 10 largest insurance companies, and 4 of the top 10 healthcare companies. Notable customers include Capital One, Progressive Insurance, Transamerica, Comcast, Nielsen Catalina Solutions, Macy's, Walgreens, and Kaiser Permanente.

Using in-memory compression, H2O handles billions of data rows in-memory, even with a small cluster. To make it easier for non-engineers to create complete analytic workflows, H2O's platform includes interfaces for R, Python, Scala, Java, JSON, and CoffeeScript/JavaScript, as well as a built-in web interface, Flow. H2O is designed to run in standalone mode, on Hadoop, or within a Spark Cluster, and typically deploys within minutes.
H2O includes many common machine learning algorithms, such as generalized linear modeling (linear regression, logistic regression, etc.), Naïve Bayes, principal components analysis, k-means clustering, and word2vec. H2O implements best-in-class algorithms at scale, such as distributed random forest, gradient boosting, and deep learning. H2O also includes a Stacked Ensembles method, which finds the optimal combination of a collection of prediction algorithms using a process known as "stacking." With H2O, customers can build thousands of models and compare the results to get the best predictions.
H2O is nurturing a grassroots movement of physicists, mathematicians, and computer scientists to herald the new wave of discovery with data science by collaborating closely with academic researchers and industrial data scientists. Stanford University giants Stephen Boyd, Trevor Hastie, and Rob Tibshirani advise the H2O team on building scalable machine learning algorithms. And with hundreds of meetups over the past several years, H2O continues to remain a word-of-mouth phenomenon.
Try it out

- Download H2O directly at http://h2o.ai/download.
- Install H2O's R package from CRAN at https://cran.r-project.org/web/packages/h2o/.
- Install the Python package from PyPI at https://pypi.python.org/pypi/h2o/.

Join the community

- To learn about our training sessions, hackathons, and product updates, visit http://h2o.ai.
- To learn about our meetups, visit http://www.meetup.com/topics/h2o/all/.
- Have questions? Post them on Stack Overflow using the h2o tag at http://stackoverflow.com/questions/tagged/h2o.
- Have a Google account (such as Gmail or Google+)? Join the open source community forum at https://groups.google.com/d/forum/h2ostream.
- Join the chat at https://gitter.im/h2oai/h2o-3.
Installation

H2O requires Java; if you do not already have Java installed, install it from https://www.java.com/en/download/ before installing H2O.

The easiest way to directly install H2O is via an R or Python package.
Installation in R

To load a recent H2O package from CRAN, run:

install.packages("h2o")

Note: The version of H2O in CRAN may be one release behind the current version.

For the latest recommended version, download the latest stable H2O-3 build from the H2O download page:

1. Go to http://h2o.ai/download.
2. Choose the latest stable H2O-3 build.
3. Click the "Install in R" tab.
4. Copy and paste the commands into your R session.

After H2O is installed on your system, verify the installation:

library(h2o)

# Start H2O on your local machine using all available cores.
# By default, CRAN policies limit use to only 2 cores.
h2o.init(nthreads = -1)

# Get help
?h2o.glm
?h2o.gbm
?h2o.deeplearning

# Show a demo
demo(h2o.glm)
demo(h2o.gbm)
demo(h2o.deeplearning)
Installation in Python

To load a recent H2O package from PyPI, run:

pip install h2o

To download the latest stable H2O-3 build from the H2O download page:

1. Go to http://h2o.ai/download.
2. Choose the latest stable H2O-3 build.
3. Click the "Install in Python" tab.
4. Copy and paste the commands into your Python session.

After H2O is installed, verify the installation:

import h2o

# Start H2O on your local machine
h2o.init()

# Get help
help(h2o.estimators.glm.H2OGeneralizedLinearEstimator)
help(h2o.estimators.gbm.H2OGradientBoostingEstimator)
help(h2o.estimators.deeplearning.H2ODeepLearningEstimator)

# Show a demo
h2o.demo("glm")
h2o.demo("gbm")
h2o.demo("deeplearning")
Pointing to a Different H2O Cluster

The instructions in the previous sections create a one-node H2O cluster on your local machine.

To connect to an established H2O cluster (in a multi-node Hadoop environment, for example), specify the IP address and port number for the established cluster using the ip and port parameters in the h2o.init() command. The syntax for this function is identical for R and Python:

h2o.init(ip = "123.45.67.89", port = 54321)
Example Code

R and Python code for the examples in this document can be found here:

https://github.com/h2oai/h2o-3/tree/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette_code_examples

The document source itself can be found here:

https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette.tex
Citation

To cite this booklet, use the following:

Candel, A., Parmar, V., LeDell, E., and Arora, A. (Oct 2021). Deep Learning with H2O. http://h2o.ai/resources.
Deep Learning Overview

Unlike the neural networks of the past, modern Deep Learning provides training stability, generalization, and scalability with big data. Since it performs quite well in a number of diverse problems, Deep Learning is quickly becoming the algorithm of choice for the highest predictive accuracy.

The first section is a brief overview of deep neural networks for supervised learning tasks. There are several theoretical frameworks for Deep Learning, but this document focuses primarily on the feedforward architecture used by H2O.
The basic unit in the model (shown in the image below) is the neuron, a biologically inspired model of the human neuron. In humans, the varying strengths of the neurons' output signals travel along the synaptic junctions and are then aggregated as input for a connected neuron's activation.

In the model, the weighted combination α = Σ_{i=1}^{n} w_i x_i + b of input signals is aggregated, and then an output signal f(α) is transmitted by the connected neuron. The function f represents the nonlinear activation function used throughout the network, and the bias b represents the neuron's activation threshold.
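As a concrete illustration of this computation, here is a minimal NumPy sketch of a single neuron's forward pass. The input, weight, and bias values are made up for the example; this is not H2O code:

import numpy as np

# Made-up inputs, weights, and bias for one neuron (illustrative only).
x = np.array([0.5, -1.2, 0.3])   # input signals x_i
w = np.array([0.4, 0.1, -0.7])   # weights w_i
b = 0.2                          # bias b (activation threshold)

alpha = np.dot(w, x) + b         # weighted combination: sum_i w_i * x_i + b
output = np.tanh(alpha)          # output signal f(alpha), here with f = tanh

print(alpha, output)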
Multi-layer, feedforward neural networks consist of many layers of interconnected neuron units (as shown in the following image), starting with an input layer to match the feature space, followed by multiple layers of nonlinearity, and ending with a linear regression or classification layer to match the output space. The inputs and outputs of the model's units follow the basic logic of the single neuron described above.

Bias units are included in each non-output layer of the network. The weights linking neurons and biases with other neurons fully determine the output of the entire network. Learning occurs when these weights are adapted to minimize the error on the labeled training data. More specifically, for each training example j, the objective is to minimize a loss function, L(W,B|j).

Here, W is the collection {W_i}_{1:N-1}, where W_i denotes the weight matrix connecting layers i and i+1 for a network of N layers. Similarly, B is the collection {b_i}_{1:N-1}, where b_i denotes the column vector of biases for layer i+1.

This basic framework of multi-layer neural networks can be used to accomplish Deep Learning tasks. Deep Learning architectures are models of hierarchical feature extraction, typically involving multiple levels of nonlinearity. Deep Learning models are able to learn useful representations of raw data and have exhibited high performance on complex data such as images, speech, and text (Bengio, 2009).
H2O's Deep Learning Architecture

H2O follows the model of multi-layer, feedforward neural networks for predictive modeling. This section provides a more detailed description of H2O's Deep Learning features, parameter configurations, and computational implementation.
Summary of Features

H2O's Deep Learning functionalities include:

- supervised training protocol for regression and classification tasks
- fast and memory-efficient Java implementations based on columnar compression and fine-grain MapReduce
- multi-threaded and distributed parallel computation that can be run on a single or a multi-node cluster
- automatic, per-neuron, adaptive learning rate for fast convergence
- optional specification of learning rate, annealing, and momentum options
- regularization options such as L1, L2, dropout, Hogwild!, and model averaging to prevent model overfitting
- elegant and intuitive web interface (Flow)
- fully scriptable R API from H2O's CRAN package
- fully scriptable Python API
- grid search for hyperparameter optimization and model selection
- automatic early stopping based on convergence of user-specified metrics to user-specified tolerance
- model checkpointing for reduced run times and model tuning
- automatic pre- and post-processing for categorical and numerical data
- automatic imputation of missing values (optional)
- automatic tuning of communication vs computation for best performance
- model export in plain Java code for deployment in production environments
- additional expert parameters for model tuning
- deep autoencoders for unsupervised feature learning and anomaly detection
Training Protocol

The training protocol described below follows many of the ideas and advances discussed in recent Deep Learning literature.

Initialization

Various Deep Learning architectures employ a combination of unsupervised pre-training followed by supervised training, but H2O uses a purely supervised training protocol. The default initialization scheme is the uniform adaptive option, which is an optimized initialization based on the size of the network. Deep Learning can also be started using a random initialization drawn from either a uniform or normal distribution, optionally specifying a scaling parameter.
Activation and Loss Functions

The choices for the nonlinear activation function f described in the introduction are summarized in Table 1 below. x_i and w_i represent the firing neuron's input values and their weights, respectively; α denotes the weighted combination α = Σ_i w_i x_i + b.

Table 1: Activation Functions

Function         | Formula                                 | Range
Tanh             | f(α) = (e^α - e^{-α}) / (e^α + e^{-α})  | f(·) ∈ [-1, 1]
Rectified Linear | f(α) = max(0, α)                        | f(·) ∈ R_+
Maxout           | f(α_1, α_2) = max(α_1, α_2)             | f(·) ∈ R

The tanh function is a rescaled and shifted logistic function; its symmetry around 0 allows the training algorithm to converge faster. The rectified linear activation function has demonstrated high performance on image recognition tasks and is a more biologically accurate model of neuron activations (LeCun et al, 1998).

Maxout is a generalization of the Rectified Linear activation, where each neuron picks the largest output of k separate channels, where each channel has its own weights and bias values. The current implementation supports only k = 2. Maxout activation works particularly well with dropout (Goodfellow et al, 2013). For more information, refer to Regularization.
The Rectifier is the special case of Maxout where the output of one channel is always 0. It is difficult to determine a "best" activation function to use; each may outperform the others in separate scenarios, but grid search models can help to compare activation functions and other parameters. For more information, refer to Grid Search for Model Comparison. The default activation function is the Rectifier. Each of these activation functions can be operated with dropout regularization. For more information, refer to Regularization.
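To make the formulas in Table 1 concrete, the following is a minimal NumPy sketch of the three activation functions (for illustration only; H2O's implementations are in Java):

import numpy as np

def tanh_activation(alpha):
    # f(alpha) = (e^alpha - e^-alpha) / (e^alpha + e^-alpha), range [-1, 1]
    return np.tanh(alpha)

def rectifier(alpha):
    # f(alpha) = max(0, alpha), range [0, inf)
    return np.maximum(0.0, alpha)

def maxout(alpha1, alpha2):
    # Each neuron picks the largest of k = 2 channel outputs.
    return np.maximum(alpha1, alpha2)

alpha = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh_activation(alpha))
print(rectifier(alpha))
print(maxout(alpha, 0.5 * alpha))   # note: rectifier(a) == maxout(a, 0)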
Specify one of the following distribution functions for the response variable using the distribution argument:

- AUTO
- Bernoulli
- Multinomial
- Poisson
- Gamma
- Tweedie
- Laplace
- Quantile
- Huber
- Gaussian

Each distribution has a primary association with a particular loss function, but some distributions allow users to specify a non-default loss function from the group of loss functions specified in Table 2. Bernoulli and Multinomial are primarily associated with Cross Entropy (also known as log-loss), Gaussian with Mean Squared Error, Laplace with Absolute loss (a special case of Quantile with quantile_alpha = 0.5), and Huber with Huber loss. For Poisson, Gamma, and Tweedie distributions, the loss function cannot be changed, so loss must be set to AUTO.

The system default enforces the table's typical use rule based on whether regression or classification is being performed. Note here that t(j) and o(j) are the predicted (also known as target) output and actual output, respectively, for training example j; further, let y represent the output units and O the output layer.
Table 2: Loss Functions

Function           | Formula                                                                     | Typical use
Mean Squared Error | L(W,B|j) = (1/2) ||t(j) - o(j)||_2^2                                        | Regression
Absolute           | L(W,B|j) = ||t(j) - o(j)||_1                                                | Regression
Huber              | L(W,B|j) = (1/2) ||t(j) - o(j)||_2^2 for ||t(j) - o(j)||_1 <= 1,            | Regression
                   |            ||t(j) - o(j)||_1 - 1/2 otherwise                                |
Cross Entropy      | L(W,B|j) = -Σ_{y∈O} [ ln(o_y(j)) · t_y(j) + ln(1 - o_y(j)) · (1 - t_y(j)) ] | Classification
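As a numeric illustration of Table 2, this NumPy sketch evaluates each loss for one training example, with made-up target and output vectors (illustrative only):

import numpy as np

t = np.array([1.0, 0.0])   # target output t(j)
o = np.array([0.8, 0.3])   # actual output o(j), components in (0, 1)

mse = 0.5 * np.sum((t - o) ** 2)   # Mean Squared Error
absolute = np.sum(np.abs(t - o))   # Absolute loss (L1 norm of the residual)

# Huber: quadratic for small residuals, linear otherwise (as in Table 2).
r1 = np.sum(np.abs(t - o))
huber = 0.5 * np.sum((t - o) ** 2) if r1 <= 1 else r1 - 0.5

# Cross Entropy summed over the output units y in O.
cross_entropy = -np.sum(np.log(o) * t + np.log(1 - o) * (1 - t))

print(mse, absolute, huber, cross_entropy)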
To predict the 80th percentile of the petal length of the Iris dataset in R, use the following:

Example in R

library(h2o)
h2o.init(nthreads = -1)

train.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
splits <- h2o.splitFrame(train.hex, 0.75, seed = 1234)
dl <- h2o.deeplearning(x = 1:3, y = "petal_len",
                       training_frame = splits[[1]],
                       distribution = "quantile",
                       quantile_alpha = 0.8)
h2o.predict(dl, splits[[2]])
To predict the 80th percentile of the petal length of the Iris dataset in Python, use the following:

Example in Python

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

train = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
splits = train.split_frame(ratios=[0.75], seed=1234)
dl = H2ODeepLearningEstimator(distribution="quantile", quantile_alpha=0.8)
dl.train(x=list(range(0, 2)), y="petal_len", training_frame=splits[0])
print(dl.predict(splits[1]))
Parallel Distributed Network Training

The process of minimizing the loss function L(W,B|j) is a parallelized version of stochastic gradient descent (SGD). A summary of standard SGD is provided below, with the gradient ∇L(W,B|j) computed via backpropagation (LeCun et al, 1998). The constant α is the learning rate, which controls the step sizes during gradient descent.
Standard stochastic gradient descent

1. Initialize W, B
2. Iterate until convergence criterion reached:
   a. Get training example i
   b. Update all weights w_jk ∈ W, biases b_jk ∈ B:
      w_jk := w_jk - α ∂L(W,B|j)/∂w_jk
      b_jk := b_jk - α ∂L(W,B|j)/∂b_jk
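For intuition, here is a minimal single-machine NumPy sketch of this update loop for a one-layer linear model with squared-error loss. The data and learning rate are made up, and H2O's actual implementation is the distributed Java version described next:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # training examples
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3   # targets from a known linear model

W = np.zeros(3)                            # initialize W, B
b = 0.0
alpha = 0.01                               # learning rate

for epoch in range(50):                    # iterate (here: fixed number of epochs)
    for i in rng.permutation(len(X)):      # get training example i
        err = (X[i] @ W + b) - y[i]        # residual for L = 0.5 * err^2
        W -= alpha * err * X[i]            # w_jk := w_jk - alpha * dL/dw_jk
        b -= alpha * err                   # b_jk := b_jk - alpha * dL/db_jk

print(W, b)   # approaches [1.0, -2.0, 0.5] and 0.3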
Stochastic gradient descent is fast and memory-efficient but not easily parallelizable without becoming slow. We utilize Hogwild!, the recently developed lock-free parallelization scheme from Niu et al, 2011, to address this issue.

Hogwild! follows a shared memory model where multiple cores (where each core handles separate subsets or all of the training data) are able to make independent contributions to the gradient updates ∇L(W,B|j) asynchronously.

In a multi-node system, this parallelization scheme works on top of H2O's distributed setup that distributes the training data across the cluster. Each node operates in parallel on its local data until the final parameters W, B are obtained by averaging.
Parallel distributed and multi-threaded training with SGD in H2O Deep Learning

1. Initialize global model parameters W, B
2. Distribute training data T across nodes (can be disjoint or replicated)
3. Iterate until convergence criterion reached:
   a. For nodes n with training subset T_n, do in parallel:
      i.   Obtain copy of the global model parameters W_n, B_n
      ii.  Select active subset T_na ⊂ T_n (user-given number of samples per iteration)
      iii. Partition T_na into T_nac by cores n_c
      iv.  For cores n_c on node n, do in parallel:
           1. Get training example i ∈ T_nac
           2. Update all weights w_jk ∈ W_n, biases b_jk ∈ B_n:
              w_jk := w_jk - α ∂L(W,B|j)/∂w_jk
              b_jk := b_jk - α ∂L(W,B|j)/∂b_jk
   b. Set W, B := Avg_n W_n, Avg_n B_n
   c. Optionally score the model on (potentially sampled) train/validation scoring sets
Here, the weights and bias updates follow the asynchronous Hogwild! procedure to incrementally adjust each node's parameters W_n, B_n after seeing example i. The Avg_n notation represents the final averaging of these local parameters across all nodes to obtain the global model parameters and complete training.
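The final Avg_n step can be illustrated with a toy NumPy sketch. The per-node parameter values are made up, and this is not H2O's distributed implementation:

import numpy as np

# Hypothetical local parameters W_n, B_n from three nodes after
# asynchronous Hogwild! updates on their local data.
W_local = [np.array([0.9, -1.8]), np.array([1.1, -2.1]), np.array([1.0, -1.9])]
B_local = [np.array([0.25]), np.array([0.35]), np.array([0.30])]

# Global model parameters are the elementwise averages across nodes.
W = np.mean(W_local, axis=0)
B = np.mean(B_local, axis=0)

print(W, B)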
Specifying the Number of Training Samples

H2O Deep Learning is scalable and can take advantage of large clusters of compute nodes. There are three operating modes. The default behavior allows every node to train on the entire (replicated) dataset while automatically shuffling (and/or using a subset of) the training examples for each iteration locally.

For datasets that don't fit into each node's memory (depending on the amount of heap memory specified by the -Xmx Java option), it might not be possible to replicate the data, so each compute node can be specified to train only with local data. An experimental single node mode is available for cases where final convergence is slow due to the presence of too many nodes, but this has not been necessary in our testing.

To specify the global number of training examples shared with the distributed SGD worker nodes between model averaging, use the train_samples_per_iteration parameter. If the specified value is -1, all nodes process all their local training data on each iteration.

If replicate_training_data is enabled, which is the default setting, this will result in training N epochs (passes over the data) per iteration on N nodes; otherwise, one epoch will be trained per iteration. Specifying 0 always results in one epoch per iteration regardless of the number of compute nodes. In general, this parameter supports any positive number. For large datasets, we recommend specifying a fraction of the dataset.

A value of -2, which is the default value, enables auto-tuning for this parameter based on the computational performance of the processors and the network of the system and attempts to find a good balance between computation and communication. This parameter can affect the convergence rate during training.

For example, if the training data contains 10 million rows, and the number of training samples per iteration is specified as 100,000 when running on four nodes, then each node will process 25,000 examples per iteration, and it will take 100 distributed iterations to process one epoch.

If the value is too high, it might take too long between synchronization and model convergence may be slow. If the value is too low, network communication overhead will dominate the runtime and computational performance will suffer.
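In the Python API, this is controlled by the train_samples_per_iteration parameter of H2ODeepLearningEstimator. A minimal sketch follows, where predictors, response, and train are placeholders for a prepared predictor list, response column, and training H2OFrame:

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# -2 (default) auto-tunes; -1 uses all local data per iteration;
# 0 trains exactly one epoch per iteration.
dl = H2ODeepLearningEstimator(train_samples_per_iteration=100000)
dl.train(x=predictors, y=response, training_frame=train)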
Regularization

H2O's Deep Learning framework supports regularization techniques to prevent overfitting. ℓ1 (L1: Lasso) and ℓ2 (L2: Ridge) regularization enforce the same penalties as they do with other models: modifying the loss function so as to minimize loss:

L'(W,B|j) = L(W,B|j) + λ_1 R_1(W,B|j) + λ_2 R_2(W,B|j).

For ℓ1 regularization, R_1(W,B|j) is the sum of all ℓ1 norms for the weights and biases in the network; ℓ2 regularization via R_2(W,B|j) represents the sum of squares of all the weights and biases in the network. The constants λ_1 and λ_2 are generally specified as very small (for example, 10^{-5}).

The second type of regularization available for Deep Learning is a modern innovation called dropout (Hinton et al., 2012). Dropout constrains the online optimization so that during forward propagation for a given training example, each neuron in the network suppresses its activation with probability P, which is usually less than 0.2 for input neurons and up to 0.5 for hidden neurons.

There are two effects: as with ℓ2 regularization, the network weight values are scaled toward 0. Although they share the same global parameters, each training example trains a different model. As a result, dropout allows an exponentially large number of models to be averaged as an ensemble to help prevent overfitting and improve generalization.

If the feature space is large and noisy, specifying an input dropout using the input_dropout_ratio parameter can be especially useful. Note that input dropout can be specified independently of the dropout specification in the hidden layers (which requires activation to be TanhWithDropout, MaxoutWithDropout, or RectifierWithDropout). Specify the amount of hidden dropout per hidden layer using the hidden_dropout_ratios parameter, which is set to 0.5 by default. An example configuration follows.
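A minimal sketch of these options in the Python API (parameter values are illustrative; predictors, response, and train are placeholders as before):

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    activation="RectifierWithDropout",   # required for hidden-layer dropout
    hidden=[200, 200],                   # two hidden layers
    l1=1e-5,                             # lambda_1 for the l1 penalty
    l2=1e-5,                             # lambda_2 for the l2 penalty
    input_dropout_ratio=0.1,             # dropout on the input features
    hidden_dropout_ratios=[0.5, 0.5])    # one ratio per hidden layer
dl.train(x=predictors, y=response, training_frame=train)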
Advanced Optimization

H2O features manual and automatic advanced optimization modes. The manual mode features include momentum training and learning rate annealing, while the automatic mode features an adaptive learning rate.
Momentum Training

Momentum modifies back-propagation by allowing prior iterations to influence the current version. In particular, a velocity vector, v, is defined to modify the updates as follows:

- θ represents the parameters W, B
- μ represents the momentum coefficient
- α represents the learning rate

v_{t+1} = μ v_t - α ∇L(θ_t)
θ_{t+1} = θ_t + v_{t+1}

Using the momentum parameter can aid in avoiding local minima and any associated instability (Sutskever et al, 2014). Too much momentum can lead to instability, so we recommend incrementing the momentum slowly. The parameters that control momentum are momentum_start, momentum_ramp, and momentum_stable.

When using momentum updates, we recommend using the Nesterov accelerated gradient method, which uses the nesterov_accelerated_gradient parameter. This method modifies the updates as follows:

v_{t+1} = μ v_t - α ∇L(θ_t + μ v_t)
W_{t+1} = W_t + v_{t+1}
Rate Annealing

During training, the chance of oscillation or "optimum skipping" creates the need for a slower learning rate as the model approaches a minimum. As opposed to specifying a constant learning rate α, learning rate annealing gradually reduces the learning rate α_t to "freeze" into local minima in the optimization landscape (Zeiler, 2012).

For H2O, the annealing rate (rate_annealing) is the inverse of the number of training samples required to divide the learning rate in half (e.g., 10^{-6} means that it takes 10^6 training samples to halve the learning rate).
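A sketch of a manual-mode configuration combining momentum training and rate annealing in the Python API (values are illustrative; adaptive_rate must be disabled for these parameters to take effect, and predictors, response, and train are placeholders):

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    adaptive_rate=False,                  # switch to manual mode
    rate=0.01,                            # initial learning rate alpha
    rate_annealing=1e-6,                  # halves the rate every 10^6 samples
    momentum_start=0.5,                   # initial momentum coefficient mu
    momentum_ramp=100000,                 # samples over which mu ramps up
    momentum_stable=0.99,                 # final momentum coefficient
    nesterov_accelerated_gradient=True)   # Nesterov update rule
dl.train(x=predictors, y=response, training_frame=train)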
Adaptive Learning

The implemented adaptive learning rate algorithm ADADELTA (Zeiler, 2012) automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence. To simplify hyperparameter search, specify only ρ and ε.

In some cases, a manually controlled (non-adaptive) learning rate and momentum specifications can lead to better results but require a hyperparameter search of up to seven parameters. If the model is built on a topology with many local minima or long plateaus, a constant learning rate may produce sub-optimal results. However, the adaptive learning rate generally produced the best results during our testing, so this option is the default.

The first of two hyperparameters for adaptive learning is ρ (rho). It is similar to momentum and is related to the memory of prior weight updates. Typical values are between 0.9 and 0.999. The second hyperparameter, ε (epsilon), is similar to learning rate annealing during initial training and allows further progress during momentum at later stages. Typical values are between 10^{-10} and 10^{-4}.
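In the default adaptive mode, only the two ADADELTA hyperparameters need to be set, as in this sketch (typical values shown; placeholders as before):

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    adaptive_rate=True,   # ADADELTA (the default)
    rho=0.99,             # rho: memory of prior weight updates
    epsilon=1e-8)         # epsilon: allows progress at later stages
dl.train(x=predictors, y=response, training_frame=train)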
Loading Data

Loading a dataset in R or Python for use with H2O is slightly different from the usual methodology. Instead of using data.frame or data.table in R, or pandas.DataFrame or numpy.array in Python, the datasets must be converted into H2OFrame objects (distributed data frames).