Deep Learning: Overview, Discussion, and Introduction (deepLearning)
Introduction to Deep Learning
Huihui Liu
Mar. 1, 2023

Outline
- Conception of deep learning
- Development history
- Deep learning frameworks
- Deep neural network architectures
- Convolutional neural networks
  - Introduction
  - Network structure
  - Training tricks
- Application in Aesthetic Image Evaluation
- Idea

Deep Learning (Hinton, 2006)
Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. Its advantage is that it extracts features automatically instead of relying on manually extracted features.
- Computer vision
- Speech recognition
- Natural language processing

Development History
[Figure: timeline running from the 1940s onward]
- 1943: MP model
- 1958: Single-layer perceptron
- 1969: XOR problem
- 1986: BP algorithm
- 1989: CNN-LeNet
- 1991: Gradient disappearance problem
- 1995: SVM
- 1997: LSTM
- 2006: DBN
- 2010s: ReLU, Dropout, AlexNet, BN, Faster R-CNN, Residual Net
People shown on the timeline: W. S. McCulloch, W. Pitts, Rosenblatt, Marvin Minsky, Geoffrey Hinton, Yann LeCun, Bengio.

Deep Learning Frameworks

Deep neural network architectures
- Deep Belief Networks (DBN)
- Recurrent Neural Networks (RNN)
- Generative Adversarial Networks (GANs)
- Convolutional Neural Networks (CNN)
- Long Short-Term Memory (LSTM)

DBN (Deep Belief Network, 2006)
- Hidden units and visible units
- Each unit is binary (0 or 1).
- Every visible unit connects to all the hidden units.
- Every hidden unit connects to all the visible units.
- There are no connections between visible units (v-v) or between hidden units (h-h).

Hinton G E. Deep belief networks[J]. Scholarpedia, 2009, 4(6): 5947.

Fig 1. RBM (restricted Boltzmann machine) structure.
Fig 2. DBN (deep belief network) structure.

Idea? A DBN is composed of multiple layers of RBMs.
How do we train these additional layers?
Unsupervised greedy approach

RNN (Recurrent Neural Network, 2023)
What? An RNN aims to process sequence data. It remembers previous information and applies it to the calculation of the current output. That is, the nodes of the hidden layer are connected to each other, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step.

Marhon S A, Cameron C J F, Kremer S C. Recurrent Neural Networks[M]//Handbook on Neural Information Processing. Springer Berlin Heidelberg, 2013: 29-65.

Applications?
- Machine Translation
- Generating Image Descriptions
- Speech Recognition

How to train? BPTT (Backpropagation through time)

GANs (Generative Adversarial Networks, 2014)
GANs are inspired by the zero-sum game in game theory and consist of a pair of networks: a generator network and a discriminator network. The generator network generates a sample from a random vector; the discriminator network judges whether a given sample is natural or counterfeit. Both networks train together to improve their performance until they reach a point where counterfeit and real samples cannot be distinguished.

Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Advances in neural information processing systems. 2014: 2672-2680.

Applications:
- Image editing
- Image to image translation
- Generate text
- Generate images based on text
- Combined with reinforcement learning
- And more…

Long Short-Term Memory (LSTM, 1997)

Neural Networks
[Figures: Neuron; Neural network]

Convolutional Neural Networks (CNN)
A convolutional neural network is a kind of feedforward neural network characterized by a simple structure, fewer training parameters, and strong adaptability. A CNN avoids complex image pre-processing (e.g., extracting hand-crafted features); the original image can be input directly.
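To make "input the original image directly" concrete, here is a minimal pure-Python sketch of what the first stages of a CNN do to raw pixel values: a small kernel slides over the image and sums element-wise products, and a max-pooling window then keeps the largest value in each block. The function names `conv2d_valid` and `max_pool2d`, the 5x5 image, and the 2x2 kernel are illustrative assumptions, not taken from the slides; no padding and a stride of 1 are assumed for the convolution.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a `size` x `size` window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 5x5 single-channel "image" of raw pixel values.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [1, 2, 3, 0, 1],
]
# Hypothetical 2x2 kernel; the same four weights are reused at every position.
kernel = [
    [1, 0],
    [0, -1],
]
features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features, size=2)   # 2x2 map after pooling
```

The kernel has only four weights regardless of the image size, which is why sharing weights across positions keeps the parameter count small.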

Basic components: Convolution Layers, Pooling Layers, Fully connected Layers

Convolution layer
The convolution kernel slides over a 2-dimensional plane. At each position, every element of the kernel is multiplied by the element at the corresponding position of the image, and all the products are summed. Moving the kernel across the image yields a new image made up of these sums of products, one per kernel position.
- Local receptive field
- Weight sharing
Both reduce the number of parameters.

Pooling layer
The pooling layer aims to compress the input feature map, which can reduce the number of parameters in the training process and the degree of over-fitting of the model.
- Max-pooling: Selecting the maximum value in the pooling window.
- Mean-pooling: Calculating the average of all values in the pooling window.

Fully connected layer and Softmax layer
Each node of the fully connected layer is connected to all the nodes of the previous layer; it is used to combine the features extracted by the preceding layers.
Fig 1. Fully connected layer.
Fig 2. Complete CNN structure.
Fig 3. Softmax layer.

Training and Testing
Training stage:
Forward propagation
- Take a sample (X, Yp) from the sample set and feed X into the network;
- Calculate the corresponding actual output Op.
Back propagation
- Calculate the difference between the actual output Op and the corresponding ideal output Yp;
- Adjust the weight matrix by minimizing the error.
Testing stage: Feed different images and labels into the trained convolutional neural network and compare the output with the actual value of each sample.
Before the training stage, the weights should be initialized with different small random numbers.

CNN Structure Evolution
[Figure: CNN structure evolution, with reference to ImageNet ILSVRC (ImageNet Large Scale Visual Recognition Challenge)]
Networks shown: Hinton (BP); Neocognitron (1980); LeCun (1989); LeNet (1998); AlexNet; VGG16, VGG19, MSRA-Net; NIN, GoogLeNet, Inception V2 (BN), Inception V3, Inception V4; R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN (RPN); FCN, FCN + CRF, STNet, CNN + RNN/LSTM; ResNet.
Branch labels: historical breakthrough (ReLU, Dropout, GPU + big data); deeper network; enhanced functionality of the convolution module; classification task to detection task; added new functional units; integration; BN (Batch Normalization).

LeNet (LeCun, 1998)
LeNet
is a convolutional neural network designed by Yann LeCun for handwritten numeral recognition in 1998. It is one of the most representative experimental systems among early convolutional neural networks. LeNet includes the convolution layer, pooling layer, and fully connected layer, which are the basic components of modern CNNs. LeNet is considered to be the beginning of the CNN.
Network structure: 3 convolution layers + 2 pooling layers + 1 fully connected layer + 1 output layer
Haykin S, Kosko B. GradientBased Learning Applied to Document Recognition[D]. Wiley-IEEE Press, 2023.

AlexNet (Alex, 2012)
- Network structure: 5 convolution layers + 3 fully connected layers
- Nonlinear activation function: ReLU (Rectified linear unit)
- Methods to prevent overfitting: Dropout, Data Augmentation
- Big-data training: ImageNet, an image database on the order of millions of images
- Others: GPU, LRN (local response normalization) layer
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//International Conference on Neural Information Processing Systems. Curran Associates Inc. 2012: 1097-1105.

OverFeat (2013)
Sermanet P, Eigen D, Zhang X, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks[J]. Eprint Arxiv, 2013.

VGG-Net (Oxford University, 2014)
- Input: a fixed-size 224*224 RGB image
- Filters: a very small receptive field, 3*3, with stride 1
- Max-pooling: 2*2 pixel window, with stride 2
Fig 1. Architecture of VGG16
Table 1: ConvNet configurations (shown in columns). The convolutional layer parameters are denoted as "conv<receptive field size>-<number of channels>".
Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[J]. Computer Science, 2014.
Why 3*3 filters?
- Stacked conv. layers have a large receptive field
- More non-linearity
- Fewer parameters to learn

Network-in-Network (NIN, Shuicheng Yan, 2013)
Network structure: 4 Mlpconv layers + Global average pooling layer
Fig 1. Linear convolution and MLP convolution
Fig 2. Fully connected layer and global average pooling layer
Fig 3. NIN structure
Min Lin et al, Network in Network, Arxiv 2013.
- Linear combination of multiple feature maps
- Cross-channel information integration
- Reduces the number of parameters
- Reduces the network size
- Avoids over-fitting

GoogLeNet (Inception V1, 2014)
- Proposed the Inception architecture and optimized it
- Removed the fully connected layer
- Used auxiliary classifiers to accelerate network convergence
Fig 1. Inception module, naïve version
Fig 2. Inception module with dimension reductions
Fig 3. GoogLeNet network (22 layers)
Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1-9.

Inception V2 (2015)
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

Inception V3 (2015)
Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.

ResNet (Kaiming He, 2015)
A simple and clean framework for training "very" deep networks.
State-of-the-art performance for:
- Image classification
- Object detection
- Semantic segmentation
- and more
Fig 1. Shortcut connections
Fig 2. ResNet structure (152 layers)
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[J]. 2016: 770-778.

FractalNet

Inception V4 (2016)
Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[J]. arXiv preprint arXiv:1602.07261, 2016.

Inception-ResNet
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[J]. 2016: 770-778.

Comparison

SqueezeNet
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Xception

R-CNN (2014)
- Region proposals: Selective Search
- Resize the region proposal: Warp all region proposals to the required size (227*227, the AlexNet input).
- Compute CNN feature: Extract a 4096-dimensional feature vector from each region proposal using AlexNet.
- Classify: Train a linear SVM classifier for each class.

[1] Uijlings J R R, Sande K E A V D, Gevers T, et al. Selective Search for Object Recognition[J]. International Journal of Computer Vision, 2013, 104(2): 154-171.
[2] Girshick R, Donahue J, Darrell T, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation[J]. 2014: 580-587.

R-CNN: Region proposals + CNN

SPP-Net (Spatial pyramid pooling network, 2014)
Fig 1. Top: A conventional CNN. Bottom: Spatial pyramid pooling network structure.
Fig 2. A network structure with a spatial pyramid pooling layer.
He K, Zhang X, Ren S, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 37(9): 1904-1916.
Advantages:
- Computes the feature map of the entire image once, which saves much time.
- Outputs a fixed-length feature vector for inputs of arbitrary size.
- Extracts features at different scales and can express more spatial information.
The SPP-Net method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map.

Fast R-CNN (2015)
A Fast R-CNN network takes an entire image and a set of object proposals as input. The network processes the entire image with several convolutional (conv) and max pooling layers to produce a conv feature map. For each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers.
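The key property of RoI pooling described above is that regions of different sizes all yield a feature vector of the same length. The following is a minimal single-channel sketch of that idea, not the implementation from the Fast R-CNN paper: the function name `roi_max_pool`, the 2x2 output grid, the floor/ceil rule for bin boundaries, and the example feature map are all assumptions made for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2-D feature map into out_h * out_w values.

    `roi` is (top, left, bottom, right) in feature-map cells, with bottom
    and right exclusive. Bin edges use floor for the start and ceil for the
    end, so every cell is covered and no bin is empty.
    """
    top, left, bottom, right = roi
    height, width = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        r0 = top + (i * height) // out_h
        r1 = top + ceil_div((i + 1) * height, out_h)
        for j in range(out_w):
            c0 = left + (j * width) // out_w
            c1 = left + ceil_div((j + 1) * width, out_w)
            pooled.append(max(feature_map[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled


# Hypothetical 4x6 conv feature map shared by all proposals.
feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 1, 2, 3],
    [4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 2],
]
whole_map = roi_max_pool(feature_map, (0, 0, 4, 6))  # 4x6 region
small_box = roi_max_pool(feature_map, (1, 1, 4, 4))  # 3x3 region
```

Both calls return four values even though the regions differ in size, which is what lets the later fully connected layers accept every proposal.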

Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.

Faster R-CNN (2015)
Faster R-CNN = RPN + Fast R-CNN

A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score.
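As a small illustration of the RPN output format just described, the sketch below represents each proposal as a rectangle plus an objectness score and keeps only the highest-scoring ones for the detection stage. It does not model the RPN itself; the `Proposal` type, the `select_proposals` helper, the score threshold, and the example boxes are hypothetical and chosen only to show the data shape.

```python
from collections import namedtuple

# One RPN output: a rectangle (corner coordinates) plus its objectness score.
Proposal = namedtuple("Proposal", ["x1", "y1", "x2", "y2", "objectness"])


def select_proposals(proposals, min_score=0.5, top_k=2):
    """Keep proposals scoring at least `min_score`, best first, at most `top_k`."""
    kept = [p for p in proposals if p.objectness >= min_score]
    kept.sort(key=lambda p: p.objectness, reverse=True)
    return kept[:top_k]


# Hypothetical proposals for one image.
rpn_output = [
    Proposal(10, 20, 110, 220, 0.92),
    Proposal(15, 25, 100, 210, 0.40),
    Proposal(200, 50, 320, 180, 0.75),
    Proposal(0, 0, 50, 50, 0.61),
]
best = select_proposals(rpn_output)
```

The selected rectangles are the kind of object proposals that the Fast R-CNN half of the network would then pool and classify.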

Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in neural information processing systems. 2015: 91-99.
Figure 1. Faster R-CNN is a single, unified network for object detection.
Figure 2. Region Proposal Network (RPN).

Training tricks
- Data Augmentation
- Dropout
- ReLU
- Batch Normalization

Data Augmentation
- rotation
- flip
- zoom
- shift
- scale
- contrast
- noise disturbance
- color
- ...

Dropout (2012)
Dropout consists of setting to zero the output of each hidden neuron with probability p. The neurons which are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation.

ReLU (Rectified Linear Unit)
Advantages:
- Rectified
- Simplified calculation
- Avoids the vanishing gradient problem

Batch Normalization (2015)
At the input of each layer of the network, insert a normalization layer. For a layer with d-dimensional input x = (x^{(1)} ... x^{(d)}), we normalize each dimension:
    \hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}
Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
Internal Covariate Shift
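The three numerical tricks in this section (ReLU, dropout, and batch normalization) can each be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions rather than a framework implementation: the function names are illustrative, dropout is shown only in its training-time form without any rescaling, and batch normalization omits the learnable scale and shift and adds a small `eps` to the variance for numerical safety.

```python
import math
import random


def relu(x):
    """Rectified linear unit: pass positive values, clamp the rest to zero."""
    return x if x > 0 else 0.0


def dropout(activations, p, rng):
    """Training-time dropout: zero each activation with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]


def batch_norm(batch, eps=1e-5):
    """Normalize each dimension of a mini-batch to zero mean, unit variance.

    `batch` is a list of equal-length vectors. `eps` guards against a zero
    variance (an assumption, not stated on the slide).
    """
    n, d = len(batch), len(batch[0])
    means = [sum(x[k] for x in batch) / n for k in range(d)]
    variances = [sum((x[k] - means[k]) ** 2 for x in batch) / n
                 for k in range(d)]
    return [[(x[k] - means[k]) / math.sqrt(variances[k] + eps)
             for k in range(d)]
            for x in batch]


hidden = [relu(v) for v in [-1.5, 0.0, 2.0, 3.5]]
dropped = dropout(hidden, p=0.5, rng=random.Random(0))  # seeded for repeatability
normalized = batch_norm([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
```

After `batch_norm`, each column of the mini-batch has mean close to 0 and variance close to 1, regardless of the original scale of that dimension.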

Application in Aesthetic Image Evaluation
- Dong Z, Shen X, Li H, et al. Photo Quality Assessment with DCNN that Understands Image Well[M]//MultiMedia Modeling. Springer International Publishing, 2015: 524-535.
- Lu X, Lin Z, Jin H, et al. Rating image aesthetics using deep learning[J]. IEEE Transactions on Multimedia, 2015, 17(11): 2021-2034.
- Wang W, Zhao M, Wang L, et al. A multi-scene deep learning model for image aesthetic evaluation[J]. Signal Processing: Image Communication, 2016, 47: 511-518.

Photo Quality Assessment with DCNN that Understands Image Well
[Figure labels: DCNN_Aesth; well-trained network; a two-class SVM classifier; DCNN_Aesth_SP; original images; segmented images; spatial pyramid; ImageNet; CUHK; AVA]

Rating image aesthetics using deep learning
- Supports heterogeneous inputs, i.e., global and local views.
- All parameters in the DCNN are jointly trained.
- Enables the network to judge image aesthetics while simultaneously considering both the global and local views of an image.
Fig 1. Global views and local views of an image
Fig 2. SCNN architecture
Fig 3. DCNN architecture

A multi-scene deep learning model for image aesthetic evaluation
- Design a scene convolutional layer consisting of multi-group descriptors in the network.
- Design a pre-training procedure to initialize the model.
Fig 1. The architecture of the multi-scene deep learning model (MSDLM).
Fig 2. The overview of the proposed MSDLM.
Architecture of MSDLM: 4 convolutional layers + 1 scene convolutional layer + 3 fully connected layers

Example - Load the dataset

    # Imports assumed from the names used below.
    import gzip
    import os
    import pickle

    import numpy as np
    from urllib.request import urlretrieve  # Python 2: from urllib import urlretrieve


    def load_dataset():
        """Download MNIST if it is missing and return the train/val/test splits."""
        url = ':///data/mnist/mnist.pkl.gz'
        filename = 'E:/DeepLearning_Library/mnist.pkl.gz'
        if not os.path.exists(filename):
            print("Downloading MNIST dataset...")
            urlretrieve(url, filename)
        with gzip.open(filename, 'rb') as f:
            # On Python 3, a pickle written by Python 2 may need
            # pickle.load(f, encoding='latin1').
            data = pickle.load(f)
        X_train, y_train = data[0]
        X_val, y_val = data[1]
        X_test, y_test = data[2]
        # Reshape flat vectors to (batch, channels, height, width).
        X_train = X_train.reshape((-1, 1, 28, 28))
        X_val = X_val.reshape((-1, 1, 28, 28))
        X_test = X_test.reshape((-1, 1, 28, 28))
        # Store labels as unsigned 8-bit integers.
        y_train = y_train.astype(np.uint8)
        y_val = y_val.astype(np.uint8)
        y_test = y_test.astype(np.uint8)
        return X_train, y_train, X_val, y_val, X_test, y_test

    # Imports assumed from the names used below (Lasagne, nolearn, matplotlib).
    import lasagne
    import matplotlib.cm as cm
    import matplotlib.pyplot as plt
    from lasagne import layers
    from lasagne.updates import nesterov_momentum
    from nolearn.lasagne import NeuralNet

    X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
    # Show the first training digit.
    plt.imshow(X_train[0][0], cmap=cm.binary)

Example - Model

    net1 = NeuralNet(
        layers=[('input', layers.InputLayer),
                ('conv2d1', layers.Conv2DLayer),
                ('maxpool1', layers.MaxPool2DLayer),
                ('conv2d2', layers.Conv2DLayer),
                ('maxpool2', layers.MaxPool2DLayer),
                ('dropout1', layers.DropoutLayer),
                ('dense', layers.DenseLayer),
                ('dropout2', layers.DropoutLayer),
                ('output', layers.DenseLayer),
                ],
        # input layer
        input_shape=(None, 1, 28, 28),
        # layer conv2d1
        conv2d1_num_filters=32,
        conv2d1_filter_size=(5, 5),
        conv2d1_nonlinearity=lasagne.nonlinearities.rectify,
        conv2d1_W=lasagne.init.GlorotUniform(),
        # layer maxpool1
        maxpool1_pool_size=(2, 2),
        # layer conv2d2
        conv2d2_num_filters=32,
        conv2d2_filter_size=(5, 5),
        conv2d2_nonlinearity=lasagne.nonlinearities.rectify,
        # layer maxpool2
        maxpool2_pool_size=(2, 2),
        # dropout1
        dropout1_p=0.5,
        # dense, i.e. fully connected layer
        dense_num_units=256,
        dense_nonlinearity=lasagne.nonlinearities.rectify,
        # dropout2
        dropout2_p=0.5,
        # output
        output_nonlinearity=lasagne.nonlinearities.softmax,
        output_num_units=10,
        # optimization method params
        update=nesterov_momentum,
        update_learning_rate=0.01,
        update_momentum=
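The layer parameters listed for the example model determine the size of every feature map, and it can help to work those sizes out by hand. The sketch below does the arithmetic, assuming stride-1 convolutions without padding and non-overlapping pooling (these match the Lasagne defaults as far as we know, but they are assumptions here because the model code does not set them explicitly). The helper names `conv_out` and `pool_out` are illustrative.

```python
def conv_out(size, filter_size):
    """Output width/height of a stride-1 convolution without padding."""
    return size - filter_size + 1


def pool_out(size, pool_size):
    """Output width/height of non-overlapping pooling."""
    return size // pool_size


sizes = []
size = 28                      # input_shape = (None, 1, 28, 28)
size = conv_out(size, 5)       # conv2d1, 5x5 filters
sizes.append(size)
size = pool_out(size, 2)       # maxpool1, 2x2 window
sizes.append(size)
size = conv_out(size, 5)       # conv2d2, 5x5 filters
sizes.append(size)
size = pool_out(size, 2)       # maxpool2, 2x2 window
sizes.append(size)

num_filters = 32
flattened = num_filters * size * size      # inputs to the dense layer
dense_weights = flattened * 256 + 256      # weights + biases, dense_num_units=256
output_weights = 256 * 10 + 10             # weights + biases, output_num_units=10
```

Under these assumptions the spatial size shrinks 28 -> 24 -> 12 -> 8 -> 4, so the dense layer sees 32 * 4 * 4 = 512 inputs and holds most of the model's parameters.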
