Deep Learning with H2O

by Arno Candel & Erin LeDell
with assistance from Viraj Parmar & Anisha Arora
Edited by: Angela Bartz

http://h2o.ai/resources/

October 2021: Sixth Edition

Published by H2O.ai, Inc.
2307 Leghorn St.
Mountain View, CA 94043

© 2016-2021 H2O.ai, Inc. All Rights Reserved.

Photos by © H2O.ai, Inc.

All copyrights belong to their respective owners. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Printed in the United States of America.
Contents

- Introduction
  - What is H2O?
- Installation
  - Installation in R
  - Installation in Python
  - Pointing to a Different H2O Cluster
  - Example Code
  - Citation
- Deep Learning Overview
- H2O's Deep Learning Architecture
  - Summary of Features
  - Training Protocol
    - Initialization
    - Activation and Loss Functions
    - Parallel Distributed Network Training
    - Specifying the Number of Training Samples
    - Regularization
    - Advanced Optimization
    - Momentum Training
    - Rate Annealing
    - Adaptive Learning
  - Loading Data
    - Data Standardization/Normalization
    - Convergence-based Early Stopping
    - Time-based Early Stopping
    - Additional Parameters
- Use Case: MNIST Digit Classification
  - MNIST Overview
  - Performing a Trial Run
  - N-fold Cross-Validation
  - Extracting and Handling the Results
  - Web Interface
  - Variable Importances
  - Java Model
  - Grid Search for Model Comparison
  - Cartesian Grid Search
  - Random Grid Search
  - Checkpoint Models
  - Achieving World-Record Performance
  - Computational Performance
- Deep Autoencoders
  - Nonlinear Dimensionality Reduction
  - Use Case: Anomaly Detection
  - Stacked Autoencoder
  - Unsupervised Pretraining with Supervised Fine-Tuning
- Parameters
- Common R Commands
- Common Python Commands
- Acknowledgments
- References
- Authors
Introduction

This document introduces the reader to Deep Learning with H2O. Examples are written in R and Python. Topics include:

- installation of H2O
- basic Deep Learning concepts
- building deep neural nets in H2O
- how to interpret model output
- how to make predictions

as well as various implementation details.
What is H2O?

H2O.ai is focused on bringing AI to businesses through software. Its flagship product is H2O, the leading open source platform that makes it easy for financial services, insurance companies, and healthcare companies to deploy AI and deep learning to solve complex problems. More than 9,000 organizations and 80,000+ data scientists depend on H2O for critical applications like predictive maintenance and operational intelligence. The company, which was recently named to the CB Insights AI 100, is used by 169 Fortune 500 enterprises, including 8 of the world's 10 largest banks, 7 of the 10 largest insurance companies, and 4 of the top 10 healthcare companies. Notable customers include Capital One, Progressive Insurance, Transamerica, Comcast, Nielsen Catalina Solutions, Macy's, Walgreens, and Kaiser Permanente.

Using in-memory compression, H2O handles billions of data rows in-memory, even with a small cluster. To make it easier for non-engineers to create complete analytic workflows, H2O's platform includes interfaces for R, Python, Scala, Java, JSON, and CoffeeScript/JavaScript, as well as a built-in web interface, Flow. H2O is designed to run in standalone mode, on Hadoop, or within a Spark Cluster, and typically deploys within minutes.
H2O includes many common machine learning algorithms, such as generalized linear modeling (linear regression, logistic regression, etc.), Naïve Bayes, principal components analysis, k-means clustering, and word2vec. H2O implements best-in-class algorithms at scale, such as distributed random forest, gradient boosting, and deep learning. H2O also includes a Stacked Ensembles method, which finds the optimal combination of a collection of prediction algorithms using a process known as "stacking." With H2O, customers can build thousands of models and compare the results to get the best predictions.
H2O is nurturing a grassroots movement of physicists, mathematicians, and computer scientists to herald the new wave of discovery with data science by collaborating closely with academic researchers and industrial data scientists. Stanford University giants Stephen Boyd, Trevor Hastie, and Rob Tibshirani advise the H2O team on building scalable machine learning algorithms. And with hundreds of meetups over the past several years, H2O continues to remain a word-of-mouth phenomenon.
Try it out

- Download H2O directly at http://h2o.ai/download.
- Install H2O's R package from CRAN at https://cran.r-project.org/web/packages/h2o/.
- Install the Python package from PyPI at https://pypi.python.org/pypi/h2o/.

Join the community

- To learn about our training sessions, hackathons, and product updates, visit http://h2o.ai.
- To learn about our meetups, visit http://www.meetup.com/topics/h2o/all/.
- Have questions? Post them on Stack Overflow using the h2o tag at http://stackoverflow.com/questions/tagged/h2o.
- Have a Google account (such as Gmail or Google+)? Join the open source community forum at https://groups.google.com/d/forum/h2ostream.
- Join the chat at https://gitter.im/h2oai/h2o-3.
Installation

H2O requires Java; if you do not already have Java installed, install it from https://www.java.com/en/download/ before installing H2O.

The easiest way to directly install H2O is via an R or Python package.
Installation in R

To load a recent H2O package from CRAN, run:

install.packages("h2o")

Note: The version of H2O in CRAN may be one release behind the current version.

For the latest recommended version, download the latest stable H2O-3 build from the H2O download page:

1. Go to http://h2o.ai/download.
2. Choose the latest stable H2O-3 build.
3. Click the "Install in R" tab.
4. Copy and paste the commands into your R session.

After H2O is installed on your system, verify the installation:

library(h2o)

# Start H2O on your local machine using all available cores.
# By default, CRAN policies limit use to only 2 cores.
h2o.init(nthreads = -1)

# Get help
?h2o.glm
?h2o.gbm
?h2o.deeplearning

# Show a demo
demo(h2o.glm)
demo(h2o.gbm)
demo(h2o.deeplearning)
Installation in Python

To load a recent H2O package from PyPI, run:

pip install h2o

To download the latest stable H2O-3 build from the H2O download page:

1. Go to http://h2o.ai/download.
2. Choose the latest stable H2O-3 build.
3. Click the "Install in Python" tab.
4. Copy and paste the commands into your Python session.

After H2O is installed, verify the installation:

import h2o

# Start H2O on your local machine
h2o.init()

# Get help
help(h2o.estimators.glm.H2OGeneralizedLinearEstimator)
help(h2o.estimators.gbm.H2OGradientBoostingEstimator)
help(h2o.estimators.deeplearning.H2ODeepLearningEstimator)

# Show a demo
h2o.demo("glm")
h2o.demo("gbm")
h2o.demo("deeplearning")
Pointing to a Different H2O Cluster

The instructions in the previous sections create a one-node H2O cluster on your local machine.

To connect to an established H2O cluster (in a multi-node Hadoop environment, for example), specify the IP address and port number for the established cluster using the ip and port parameters in the h2o.init() command. The syntax for this function is identical for R and Python:

h2o.init(ip = "123.45.67.89", port = 54321)
Example Code

R and Python code for the examples in this document can be found here:

https://github.com/h2oai/h2o-3/tree/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette_code_examples

The document source itself can be found here:

https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/booklets/v2_2015/source/DeepLearning_Vignette.tex
Citation

To cite this booklet, use the following:

Candel, A., Parmar, V., LeDell, E., and Arora, A. (Oct 2021). Deep Learning with H2O. http://h2o.ai/resources.
Deep Learning Overview

Unlike the neural networks of the past, modern Deep Learning provides training stability, generalization, and scalability with big data. Since it performs quite well in a number of diverse problems, Deep Learning is quickly becoming the algorithm of choice for the highest predictive accuracy.

The first section is a brief overview of deep neural networks for supervised learning tasks. There are several theoretical frameworks for Deep Learning, but this document focuses primarily on the feedforward architecture used by H2O.
The basic unit in the model (shown in the image below) is the neuron, a biologically inspired model of the human neuron. In humans, the varying strengths of the neurons' output signals travel along the synaptic junctions and are then aggregated as input for a connected neuron's activation.

In the model, the weighted combination α = Σ_{i=1}^{n} w_i x_i + b of input signals is aggregated, and then an output signal f(α) is transmitted by the connected neuron. The function f represents the nonlinear activation function used throughout the network, and the bias b represents the neuron's activation threshold.
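As a concrete illustration of this computation, here is a minimal NumPy sketch of a single neuron's forward pass. The input, weight, and bias values are made up for the example; this is not H2O code:

import numpy as np

# Made-up inputs, weights, and bias for one neuron (illustrative only).
x = np.array([0.5, -1.2, 0.3])   # input signals x_i
w = np.array([0.4, 0.1, -0.7])   # weights w_i
b = 0.2                          # bias b (activation threshold)

alpha = np.dot(w, x) + b         # weighted combination: sum_i w_i * x_i + b
output = np.tanh(alpha)          # output signal f(alpha), here with f = tanh

print(alpha, output)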
Multi-layer, feedforward neural networks consist of many layers of interconnected neuron units (as shown in the following image), starting with an input layer to match the feature space, followed by multiple layers of nonlinearity, and ending with a linear regression or classification layer to match the output space. The inputs and outputs of the model's units follow the basic logic of the single neuron described above.

Bias units are included in each non-output layer of the network. The weights linking neurons and biases with other neurons fully determine the output of the entire network. Learning occurs when these weights are adapted to minimize the error on the labeled training data. More specifically, for each training example j, the objective is to minimize a loss function, L(W,B|j).

Here, W is the collection {W_i}_{1:N-1}, where W_i denotes the weight matrix connecting layers i and i+1 for a network of N layers. Similarly, B is the collection {b_i}_{1:N-1}, where b_i denotes the column vector of biases for layer i+1.

This basic framework of multi-layer neural networks can be used to accomplish Deep Learning tasks. Deep Learning architectures are models of hierarchical feature extraction, typically involving multiple levels of nonlinearity. Deep Learning models are able to learn useful representations of raw data and have exhibited high performance on complex data such as images, speech, and text (Bengio, 2009).
H2O's Deep Learning Architecture

H2O follows the model of multi-layer, feedforward neural networks for predictive modeling. This section provides a more detailed description of H2O's Deep Learning features, parameter configurations, and computational implementation.
Summary of Features

H2O's Deep Learning functionalities include:

- supervised training protocol for regression and classification tasks
- fast and memory-efficient Java implementations based on columnar compression and fine-grain MapReduce
- multi-threaded and distributed parallel computation that can be run on a single or a multi-node cluster
- automatic, per-neuron, adaptive learning rate for fast convergence
- optional specification of learning rate, annealing, and momentum options
- regularization options such as L1, L2, dropout, Hogwild!, and model averaging to prevent model overfitting
- elegant and intuitive web interface (Flow)
- fully scriptable R API from H2O's CRAN package
- fully scriptable Python API
- grid search for hyperparameter optimization and model selection
- automatic early stopping based on convergence of user-specified metrics to user-specified tolerance
- model checkpointing for reduced run times and model tuning
- automatic pre- and post-processing for categorical and numerical data
- automatic imputation of missing values (optional)
- automatic tuning of communication vs computation for best performance
- model export in plain Java code for deployment in production environments
- additional expert parameters for model tuning
- deep autoencoders for unsupervised feature learning and anomaly detection
Training Protocol

The training protocol described below follows many of the ideas and advances discussed in recent Deep Learning literature.

Initialization

Various Deep Learning architectures employ a combination of unsupervised pre-training followed by supervised training, but H2O uses a purely supervised training protocol. The default initialization scheme is the uniform adaptive option, which is an optimized initialization based on the size of the network. Deep Learning can also be started using a random initialization drawn from either a uniform or normal distribution, optionally specifying a scaling parameter.
Activation and Loss Functions

The choices for the nonlinear activation function f described in the introduction are summarized in Table 1 below. x_i and w_i represent the firing neuron's input values and their weights, respectively; α denotes the weighted combination α = Σ_i w_i x_i + b.

Table 1: Activation Functions

Function         | Formula                                 | Range
Tanh             | f(α) = (e^α - e^{-α}) / (e^α + e^{-α})  | f(·) ∈ [-1, 1]
Rectified Linear | f(α) = max(0, α)                        | f(·) ∈ R_+
Maxout           | f(α_1, α_2) = max(α_1, α_2)             | f(·) ∈ R

The tanh function is a rescaled and shifted logistic function; its symmetry around 0 allows the training algorithm to converge faster. The rectified linear activation function has demonstrated high performance on image recognition tasks and is a more biologically accurate model of neuron activations (LeCun et al, 1998).

Maxout is a generalization of the Rectified Linear activation, where each neuron picks the largest output of k separate channels, where each channel has its own weights and bias values. The current implementation supports only k = 2. Maxout activation works particularly well with dropout (Goodfellow et al, 2013). For more information, refer to Regularization.
The Rectifier is the special case of Maxout where the output of one channel is always 0. It is difficult to determine a "best" activation function to use; each may outperform the others in separate scenarios, but grid search models can help to compare activation functions and other parameters. For more information, refer to Grid Search for Model Comparison. The default activation function is the Rectifier. Each of these activation functions can be operated with dropout regularization. For more information, refer to Regularization.
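To make the formulas in Table 1 concrete, the following is a minimal NumPy sketch of the three activation functions (for illustration only; H2O's implementations are in Java):

import numpy as np

def tanh_activation(alpha):
    # f(alpha) = (e^alpha - e^-alpha) / (e^alpha + e^-alpha), range [-1, 1]
    return np.tanh(alpha)

def rectifier(alpha):
    # f(alpha) = max(0, alpha), range [0, inf)
    return np.maximum(0.0, alpha)

def maxout(alpha1, alpha2):
    # Each neuron picks the largest of k = 2 channel outputs.
    return np.maximum(alpha1, alpha2)

alpha = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh_activation(alpha))
print(rectifier(alpha))
print(maxout(alpha, 0.5 * alpha))   # note: rectifier(a) == maxout(a, 0)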
Specify one of the following distribution functions for the response variable using the distribution argument:

- AUTO
- Bernoulli
- Multinomial
- Poisson
- Gamma
- Tweedie
- Laplace
- Quantile
- Huber
- Gaussian

Each distribution has a primary association with a particular loss function, but some distributions allow users to specify a non-default loss function from the group of loss functions specified in Table 2. Bernoulli and Multinomial are primarily associated with Cross Entropy (also known as log-loss), Gaussian with Mean Squared Error, Laplace with Absolute loss (a special case of Quantile with quantile_alpha = 0.5), and Huber with Huber loss. For Poisson, Gamma, and Tweedie distributions, the loss function cannot be changed, so loss must be set to AUTO.

The system default enforces the table's typical use rule based on whether regression or classification is being performed. Note here that t(j) and o(j) are the predicted (also known as target) output and actual output, respectively, for training example j; further, let y represent the output units and O the output layer.
Table 2: Loss Functions

Function           | Formula                                                                     | Typical use
Mean Squared Error | L(W,B|j) = (1/2) ||t(j) - o(j)||_2^2                                        | Regression
Absolute           | L(W,B|j) = ||t(j) - o(j)||_1                                                | Regression
Huber              | L(W,B|j) = (1/2) ||t(j) - o(j)||_2^2 for ||t(j) - o(j)||_1 <= 1,            | Regression
                   |            ||t(j) - o(j)||_1 - 1/2 otherwise                                |
Cross Entropy      | L(W,B|j) = -Σ_{y∈O} [ ln(o_y(j)) · t_y(j) + ln(1 - o_y(j)) · (1 - t_y(j)) ] | Classification
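As a numeric illustration of Table 2, this NumPy sketch evaluates each loss for one training example, with made-up target and output vectors (illustrative only):

import numpy as np

t = np.array([1.0, 0.0])   # target output t(j)
o = np.array([0.8, 0.3])   # actual output o(j), components in (0, 1)

mse = 0.5 * np.sum((t - o) ** 2)   # Mean Squared Error
absolute = np.sum(np.abs(t - o))   # Absolute loss (L1 norm of the residual)

# Huber: quadratic for small residuals, linear otherwise (as in Table 2).
r1 = np.sum(np.abs(t - o))
huber = 0.5 * np.sum((t - o) ** 2) if r1 <= 1 else r1 - 0.5

# Cross Entropy summed over the output units y in O.
cross_entropy = -np.sum(np.log(o) * t + np.log(1 - o) * (1 - t))

print(mse, absolute, huber, cross_entropy)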
To predict the 80th percentile of the petal length of the Iris dataset in R, use the following:

Example in R

library(h2o)
h2o.init(nthreads = -1)

train.hex <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
splits <- h2o.splitFrame(train.hex, 0.75, seed = 1234)
dl <- h2o.deeplearning(x = 1:3, y = "petal_len",
                       training_frame = splits[[1]],
                       distribution = "quantile",
                       quantile_alpha = 0.8)
h2o.predict(dl, splits[[2]])
To predict the 80th percentile of the petal length of the Iris dataset in Python, use the following:

Example in Python

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

train = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
splits = train.split_frame(ratios=[0.75], seed=1234)
dl = H2ODeepLearningEstimator(distribution="quantile", quantile_alpha=0.8)
dl.train(x=list(range(0, 2)), y="petal_len", training_frame=splits[0])
print(dl.predict(splits[1]))
Parallel Distributed Network Training

The process of minimizing the loss function L(W,B|j) is a parallelized version of stochastic gradient descent (SGD). A summary of standard SGD is provided below, with the gradient ∇L(W,B|j) computed via backpropagation (LeCun et al, 1998). The constant α is the learning rate, which controls the step sizes during gradient descent.
Standard stochastic gradient descent

1. Initialize W, B
2. Iterate until convergence criterion reached:
   a. Get training example i
   b. Update all weights w_jk ∈ W, biases b_jk ∈ B:
      w_jk := w_jk - α ∂L(W,B|j)/∂w_jk
      b_jk := b_jk - α ∂L(W,B|j)/∂b_jk
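For intuition, here is a minimal single-machine NumPy sketch of this update loop for a one-layer linear model with squared-error loss. The data and learning rate are made up, and H2O's actual implementation is the distributed Java version described next:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # training examples
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3   # targets from a known linear model

W = np.zeros(3)                            # initialize W, B
b = 0.0
alpha = 0.01                               # learning rate

for epoch in range(50):                    # iterate (here: fixed number of epochs)
    for i in rng.permutation(len(X)):      # get training example i
        err = (X[i] @ W + b) - y[i]        # residual for L = 0.5 * err^2
        W -= alpha * err * X[i]            # w_jk := w_jk - alpha * dL/dw_jk
        b -= alpha * err                   # b_jk := b_jk - alpha * dL/db_jk

print(W, b)   # approaches [1.0, -2.0, 0.5] and 0.3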
Stochastic gradient descent is fast and memory-efficient but not easily parallelizable without becoming slow. We utilize Hogwild!, the recently developed lock-free parallelization scheme from Niu et al, 2011, to address this issue.

Hogwild! follows a shared memory model where multiple cores (where each core handles separate subsets or all of the training data) are able to make independent contributions to the gradient updates ∇L(W,B|j) asynchronously.

In a multi-node system, this parallelization scheme works on top of H2O's distributed setup that distributes the training data across the cluster. Each node operates in parallel on its local data until the final parameters W, B are obtained by averaging.
Parallel distributed and multi-threaded training with SGD in H2O Deep Learning

1. Initialize global model parameters W, B
2. Distribute training data T across nodes (can be disjoint or replicated)
3. Iterate until convergence criterion reached:
   a. For nodes n with training subset T_n, do in parallel:
      i.   Obtain copy of the global model parameters W_n, B_n
      ii.  Select active subset T_na ⊂ T_n (user-given number of samples per iteration)
      iii. Partition T_na into T_nac by cores n_c
      iv.  For cores n_c on node n, do in parallel:
           1. Get training example i ∈ T_nac
           2. Update all weights w_jk ∈ W_n, biases b_jk ∈ B_n:
              w_jk := w_jk - α ∂L(W,B|j)/∂w_jk
              b_jk := b_jk - α ∂L(W,B|j)/∂b_jk
   b. Set W, B := Avg_n W_n, Avg_n B_n
   c. Optionally score the model on (potentially sampled) train/validation scoring sets
Here, the weights and bias updates follow the asynchronous Hogwild! procedure to incrementally adjust each node's parameters W_n, B_n after seeing example i. The Avg_n notation represents the final averaging of these local parameters across all nodes to obtain the global model parameters and complete training.
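The final Avg_n step can be illustrated with a toy NumPy sketch. The per-node parameter values are made up, and this is not H2O's distributed implementation:

import numpy as np

# Hypothetical local parameters W_n, B_n from three nodes after
# asynchronous Hogwild! updates on their local data.
W_local = [np.array([0.9, -1.8]), np.array([1.1, -2.1]), np.array([1.0, -1.9])]
B_local = [np.array([0.25]), np.array([0.35]), np.array([0.30])]

# Global model parameters are the elementwise averages across nodes.
W = np.mean(W_local, axis=0)
B = np.mean(B_local, axis=0)

print(W, B)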
Specifying the Number of Training Samples

H2O Deep Learning is scalable and can take advantage of large clusters of compute nodes. There are three operating modes. The default behavior allows every node to train on the entire (replicated) dataset while automatically shuffling (and/or using a subset of) the training examples for each iteration locally.

For datasets that don't fit into each node's memory (depending on the amount of heap memory specified by the -Xmx Java option), it might not be possible to replicate the data, so each compute node can be specified to train only with local data. An experimental single node mode is available for cases where final convergence is slow due to the presence of too many nodes, but this has not been necessary in our testing.

To specify the global number of training examples shared with the distributed SGD worker nodes between model averaging, use the train_samples_per_iteration parameter. If the specified value is -1, all nodes process all their local training data on each iteration.

If replicate_training_data is enabled, which is the default setting, this will result in training N epochs (passes over the data) per iteration on N nodes; otherwise, one epoch will be trained per iteration. Specifying 0 always results in one epoch per iteration regardless of the number of compute nodes. In general, this parameter supports any positive number. For large datasets, we recommend specifying a fraction of the dataset.

A value of -2, which is the default value, enables auto-tuning for this parameter based on the computational performance of the processors and the network of the system and attempts to find a good balance between computation and communication. This parameter can affect the convergence rate during training.

For example, if the training data contains 10 million rows, and the number of training samples per iteration is specified as 100,000 when running on four nodes, then each node will process 25,000 examples per iteration, and it will take 100 distributed iterations to process one epoch.

If the value is too high, it might take too long between synchronization and model convergence may be slow. If the value is too low, network communication overhead will dominate the runtime and computational performance will suffer.
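In the Python API, this is controlled by the train_samples_per_iteration parameter of H2ODeepLearningEstimator. A minimal sketch follows, where predictors, response, and train are placeholders for a prepared predictor list, response column, and training H2OFrame:

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# -2 (default) auto-tunes; -1 uses all local data per iteration;
# 0 trains exactly one epoch per iteration.
dl = H2ODeepLearningEstimator(train_samples_per_iteration=100000)
dl.train(x=predictors, y=response, training_frame=train)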
Regularization

H2O's Deep Learning framework supports regularization techniques to prevent overfitting. ℓ1 (L1: Lasso) and ℓ2 (L2: Ridge) regularization enforce the same penalties as they do with other models: modifying the loss function so as to minimize loss:

L'(W,B|j) = L(W,B|j) + λ_1 R_1(W,B|j) + λ_2 R_2(W,B|j).

For ℓ1 regularization, R_1(W,B|j) is the sum of all ℓ1 norms for the weights and biases in the network; ℓ2 regularization via R_2(W,B|j) represents the sum of squares of all the weights and biases in the network. The constants λ_1 and λ_2 are generally specified as very small (for example, 10^{-5}).

The second type of regularization available for Deep Learning is a modern innovation called dropout (Hinton et al., 2012). Dropout constrains the online optimization so that during forward propagation for a given training example, each neuron in the network suppresses its activation with probability P, which is usually less than 0.2 for input neurons and up to 0.5 for hidden neurons.

There are two effects: as with ℓ2 regularization, the network weight values are scaled toward 0. Although they share the same global parameters, each training example trains a different model. As a result, dropout allows an exponentially large number of models to be averaged as an ensemble to help prevent overfitting and improve generalization.

If the feature space is large and noisy, specifying an input dropout using the input_dropout_ratio parameter can be especially useful. Note that input dropout can be specified independently of the dropout specification in the hidden layers (which requires activation to be TanhWithDropout, MaxoutWithDropout, or RectifierWithDropout). Specify the amount of hidden dropout per hidden layer using the hidden_dropout_ratios parameter, which is set to 0.5 by default. An example configuration follows.
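A minimal sketch of these options in the Python API (parameter values are illustrative; predictors, response, and train are placeholders as before):

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    activation="RectifierWithDropout",   # required for hidden-layer dropout
    hidden=[200, 200],                   # two hidden layers
    l1=1e-5,                             # lambda_1 for the l1 penalty
    l2=1e-5,                             # lambda_2 for the l2 penalty
    input_dropout_ratio=0.1,             # dropout on the input features
    hidden_dropout_ratios=[0.5, 0.5])    # one ratio per hidden layer
dl.train(x=predictors, y=response, training_frame=train)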
Advanced Optimization

H2O features manual and automatic advanced optimization modes. The manual mode features include momentum training and learning rate annealing, while the automatic mode features an adaptive learning rate.
Momentum Training

Momentum modifies back-propagation by allowing prior iterations to influence the current version. In particular, a velocity vector, v, is defined to modify the updates as follows:

- θ represents the parameters W, B
- μ represents the momentum coefficient
- α represents the learning rate

v_{t+1} = μ v_t - α ∇L(θ_t)
θ_{t+1} = θ_t + v_{t+1}

Using the momentum parameter can aid in avoiding local minima and any associated instability (Sutskever et al, 2014). Too much momentum can lead to instability, so we recommend incrementing the momentum slowly. The parameters that control momentum are momentum_start, momentum_ramp, and momentum_stable.

When using momentum updates, we recommend using the Nesterov accelerated gradient method, which uses the nesterov_accelerated_gradient parameter. This method modifies the updates as follows:

v_{t+1} = μ v_t - α ∇L(θ_t + μ v_t)
W_{t+1} = W_t + v_{t+1}
Rate Annealing

During training, the chance of oscillation or "optimum skipping" creates the need for a slower learning rate as the model approaches a minimum. As opposed to specifying a constant learning rate α, learning rate annealing gradually reduces the learning rate α_t to "freeze" into local minima in the optimization landscape (Zeiler, 2012).

For H2O, the annealing rate (rate_annealing) is the inverse of the number of training samples required to divide the learning rate in half (e.g., 10^{-6} means that it takes 10^6 training samples to halve the learning rate).
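A sketch of a manual-mode configuration combining momentum training and rate annealing in the Python API (values are illustrative; adaptive_rate must be disabled for these parameters to take effect, and predictors, response, and train are placeholders):

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    adaptive_rate=False,                  # switch to manual mode
    rate=0.01,                            # initial learning rate alpha
    rate_annealing=1e-6,                  # halves the rate every 10^6 samples
    momentum_start=0.5,                   # initial momentum coefficient mu
    momentum_ramp=100000,                 # samples over which mu ramps up
    momentum_stable=0.99,                 # final momentum coefficient
    nesterov_accelerated_gradient=True)   # Nesterov update rule
dl.train(x=predictors, y=response, training_frame=train)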
Adaptive Learning

The implemented adaptive learning rate algorithm ADADELTA (Zeiler, 2012) automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence. To simplify hyperparameter search, specify only ρ and ε.

In some cases, a manually controlled (non-adaptive) learning rate and momentum specifications can lead to better results but require a hyperparameter search of up to seven parameters. If the model is built on a topology with many local minima or long plateaus, a constant learning rate may produce sub-optimal results. However, the adaptive learning rate generally produced the best results during our testing, so this option is the default.

The first of two hyperparameters for adaptive learning is ρ (rho). It is similar to momentum and is related to the memory of prior weight updates. Typical values are between 0.9 and 0.999. The second hyperparameter, ε (epsilon), is similar to learning rate annealing during initial training and allows further progress during momentum at later stages. Typical values are between 10^{-10} and 10^{-4}.
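In the default adaptive mode, only the two ADADELTA hyperparameters need to be set, as in this sketch (typical values shown; placeholders as before):

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    adaptive_rate=True,   # ADADELTA (the default)
    rho=0.99,             # rho: memory of prior weight updates
    epsilon=1e-8)         # epsilon: allows progress at later stages
dl.train(x=predictors, y=response, training_frame=train)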
Loading Data

Loading a dataset in R or Python for use with H2O is slightly different from the usual methodology. Instead of using data.frame or data.table in R, or pandas.DataFrame or numpy.array in Python, the datasets must be converted into H2OFrame objects (distributed data frames).