Optimizing AI/ML Workflows in Python for GPUs
By: Daniel Howard
dhoward@, Consulting Services Group, CISL & NCAR
Date: August 25th, 2022
In this notebook we analyse the overall workflow of typical machine learning/deep learning projects, emphasizing how to work towards optimal performance on GPUs. We will NOT cover the theory of AI or how to implement AI-based projects. We will cover:
Background on machine learning research in Earth sciences
Setting up Python virtual conda environments
The RAPIDS AI software suite
GPU-enabled TensorFlow and PyTorch
Enabling tuning and profiling with TensorFlow and PyTorch
Profiling with DLProf/TensorBoard and performance optimizations for NVIDIA Tensor Cores
Workshop Etiquette
Please mute yourself and turn off video during the session.
Questions may be submitted in the chat and will be answered when appropriate. You may also raise your hand, unmute, and ask questions during Q&A at the end of the presentation.
By participating, you are agreeing to UCAR's Code of Conduct.
Recordings & other material will be archived & shared publicly.
Feel free to follow up with the GPU workshop team via Slack or submit support requests to
Office Hours: Asynchronous support via Slack or schedule a time with an organizer
Start a JupyterHub Session
Head to the NCAR JupyterHub portal, start a JupyterHub session on the Casper PBS Login Node, and open the notebook at 15_OptimizeAIML/15_OptimizeAIML.ipynb. Be sure to clone (if needed) and update/pull the NCAR GPU_workshop directory. You are welcome to use an interactive GPU node for the final few cells of this notebook.
# Use the JupyterHub GitHub GUI on the left panel or the shell commands below
git clone git@:NCAR/GPU_workshop.git
# or, if already cloned, from inside the GPU_workshop directory:
git pull
Notebook Setup
The GPU_TYPE=gp100 nodes do not have tensor cores! Thus, the gpuworkshop queue is not as useful for this session. That said, please set GPU_TYPE=v100 and use the gpudev or casper queue both during the workshop and for independent work. See the Casper queue documentation for more info.
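Tensor cores first appeared with the Volta architecture (compute capability 7.0, e.g. V100), while Pascal GP100 GPUs are compute capability 6.0 and have none. A minimal sketch, assuming PyTorch is available in your kernel, to check which kind of GPU your session landed on; the capability threshold is a heuristic, not an official NVIDIA API:

```python
"""Heuristic check for NVIDIA tensor cores on the current GPU (a sketch)."""
import importlib.util


def has_tensor_cores(capability):
    """Return True if a (major, minor) compute capability is Volta (7.0) or newer.

    Heuristic only: tensor cores arrived with Volta (V100 is 7.0), while
    Pascal parts such as the GP100 (6.0) have none. Some consumer Turing
    cards (GTX 16-series, 7.5) lack tensor cores and are not caught here.
    """
    major, _minor = capability
    return major >= 7


def describe_current_gpu():
    """Return (name, capability, tensor_cores) for GPU 0, or None if unavailable."""
    # Only query the GPU if PyTorch is importable in this kernel.
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    name = torch.cuda.get_device_name(0)
    capability = torch.cuda.get_device_capability(0)
    return name, capability, has_tensor_cores(capability)


if __name__ == "__main__":
    info = describe_current_gpu()
    if info is None:
        print("No CUDA GPU visible to PyTorch in this session.")
    else:
        name, capability, tensor_cores = info
        print(f"{name}: compute capability {capability}, tensor cores: {tensor_cores}")
```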
Machine Learning and Deep Learning?
ML and DL are statistical models that are designed to learn and predict behavior from a large amount of input training data.
The BAMS article "Outlook for Exploiting Artificial Intelligence in the Earth and Environmental Sciences" by Boukabara, et al. highlights additional applications of AI in the Earth Sciences.
Overview of an Earth Science AI Workflow - Remote Sensing
Multiple steps are needed to enable AI for Earth Science. GPUs are critical in the most expensive step, model building and training, since they perform well at the matrix algebra that is foundational to ML methods.
Image: Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review - Part II: Applications by Hoeser, et al.
Why Use AI for Earth Science?
Earth Science is largely built on physics-based theories and dynamical interactions with the biosphere. Today, these models have scaled to enormous sizes, consuming significant computational resources and data storage.
4 km global runs of E3SM (left) over 100 forecast years use 120M core-hours and 250 GB/forecast day, or 12 PB. 1 km ECMWF runs (right) are described in this article and in Nils Wedi's keynote at ESMD 2020.
AI offers an opportunity to reduce the computational resources required. Feel free to consult A Review of Earth Artificial Intelligence for current "Grand Challenges".
Surrogate Models
Novel ways can be explored to use Earth Science data to reduce required computational resources. A surrogate model in machine learning is a statistical model designed to approximate the output of a physics-based model more efficiently.
Image: Introduction to Surrogate Modeling, Shuai Guo. See "Learning Nonlinear Dynamical Systems from Data Using Scientific Machine Learning" by Maulik, ANL.
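To make the idea concrete, here is a deliberately crude stand-in for a trained ML surrogate: a piecewise-linear interpolant built from a handful of samples of a hypothetical "expensive" model. The function expensive_model is a made-up placeholder, not anything from the cited works; a real surrogate would be a neural network or other statistical model trained on simulation output, but the workflow (pay for a few expensive evaluations once, then answer queries cheaply) is the same.

```python
"""A toy surrogate: sample an 'expensive' model once, then interpolate cheaply."""
import bisect
import math


def expensive_model(x):
    """Hypothetical stand-in for a costly physics-based computation."""
    return math.sin(x) + 0.1 * x * x


def build_surrogate(func, lo, hi, n_samples):
    """Evaluate func at n_samples (>= 2) evenly spaced points and return a cheap interpolant."""
    step = (hi - lo) / (n_samples - 1)
    xs = [lo + i * step for i in range(n_samples)]
    ys = [func(x) for x in xs]  # the only calls to the expensive model

    def surrogate(x):
        if not lo <= x <= hi:
            raise ValueError("surrogate is only valid inside its training range")
        # Locate the bracketing samples and interpolate linearly between them.
        i = min(bisect.bisect_right(xs, x), n_samples - 1)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        t = (x - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    return surrogate
```

For example, build_surrogate(expensive_model, 0.0, 3.0, 61) calls the expensive model 61 times and can then be queried anywhere in [0, 3]; note that, like any surrogate, it refuses to extrapolate outside the range it was built on.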
Neural Ordinary Differential Equations
For example, a stabilized neural ODE can be designed to accurately simulate shocks and chaotic dynamics.
See the paper by Linot, et al. "Stabilized Neural Ordinary Differential Equations for Long-Time Forecasting of Dynamical Systems".
Physics Informed Neural Networks (PINNs)
Other applications to consider are Physics Informed Neural Networks. PINNs attempt to embed known physical relationships into the design of a machine learning model. This may include defining the Navier-Stokes conservation laws as conditions to minimize in an ML model's loss function.
Image: Wikipedia - Physics Informed Neural Networks
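In the common formulation (a generic sketch, not the specific setup of any work cited here), a PINN minimizes a composite loss: a data-misfit term over N_d observations plus a weighted physics-residual term evaluated at N_r collocation points, where the residual R is the governing PDE rewritten so that it equals zero. For incompressible Navier-Stokes with constant density rho and kinematic viscosity nu, the network q_theta predicts velocity u and pressure p:

```latex
\mathcal{L}(\theta) =
  \frac{1}{N_d}\sum_{i=1}^{N_d} \bigl\lVert q_\theta(x_i, t_i) - q_i \bigr\rVert^2
  + \lambda \, \frac{1}{N_r}\sum_{j=1}^{N_r} \bigl\lVert \mathcal{R}[q_\theta](x_j, t_j) \bigr\rVert^2,
\qquad q_\theta = (\mathbf{u}_\theta, p_\theta)

\mathcal{R}[\mathbf{u}, p] =
  \begin{pmatrix}
    \partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u} + \frac{1}{\rho}\nabla p - \nu \nabla^2 \mathbf{u} \\
    \nabla \cdot \mathbf{u}
  \end{pmatrix}
```

The derivatives inside R are obtained by automatic differentiation of the network, and the weight lambda (a tunable hyperparameter) trades off fitting the data against satisfying the physics.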
Resources for Engaging and Learning AI in Earth Sciences
Feel free to reach out to rchelp@ if you want assistance recreating environments for any of the code examples below.
OLCF AI4Science Fluid Flow Tutorial (GitHub) - Uses MiniWeatherML
Open Hackathons GPU Bootcamp (GitHub) - HPC AI Examples for PINNs, CFD, and Climate
NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES.org) - Education Materials and 2022 Trust-a-thon GitHub
Argonne ALCF
  2021 Simulation, Data, and Learning Workshop for AI (GitHub) - Detailed DL profiling tutorial notebooks & video
  2022 Introduction to AI-driven Science on Supercomputers (GitHub)
Data Driven Atmospheric and Water Dynamics, Beucler Lab (U. of Lausanne, Switzerland)
  Getting Started with Machine Learning curated resource list
NOAA Workshop on Leveraging Artificial Intelligence in Environmental Sciences - 4th Workshop, free to register, virtual Sept 6-9, 2022
National Academies - 2022 workshop Machine Learning and Artificial Intelligence to Advance Earth System Science: Opportunities and Challenges
Climate Informatics community - Conferences and Hackathons
Book - Deep learning for the Earth Sciences: A comprehensive approach to remote sensing, climate science and geosciences
climatechange.ai - Global initiative to catalyze impactful work at the intersection of climate change and machine learning.
How to Manage Python Software for ML and DL Models
The Python ecosystem already provides many robust pre-built software packages and libraries which are continually maintained. Learning about and employing the Python ecosystem well can simplify the process of using machine learning tools.
The kernel GPU_Workshop already has many useful packages plus others (notably Horovod for distributed deep learning) which you are welcome to explore on your own beyond this workshop.
Run the cell below to get a listing of all packages installed in the GPU_Workshop conda environment.
In []:
!mamba list -p /glade/work/dhoward/conda/envs/GPU_Workshop/
Setting Up Conda Environments
Since ensuring compatibility and reproducibility across Python package environments is difficult, you are encouraged to maintain your own personalized conda virtual environments. Nonetheless, NCAR provides a base set of commonly used Python packages via the NCAR Package Library (NPL). NPL does include the faster package management tool mamba, which uses the same command syntax as conda.
If you prefer to install your own and not use module load conda, we encourage Mambaforge. In general, mamba is safe to use as a drop-in replacement for conda. To update all non-pinned packages in an environment, you can use mamba update --all.
Choosing Conda Channels
To source packages, the channel conda-forge is recommended and set as the priority on Casper, but other channels you may consider include ncar, nvidia, rapidsai, intel, pytorch, and anaconda, among others.
Learn to manage channels here using your $HOME/.condarc file.
Define pinned packages, i.e. packages that should stay at a specific version or use a specific build type, via the /path/to/env/conda-meta/pinned file.
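The pinned file is plain text with one match spec per line. A minimal sketch of managing it from Python, assuming you point it at an existing environment prefix; the helper name add_pins is hypothetical and the specs in the usage note are just examples taken from this page:

```python
"""Append package pins to a conda environment's conda-meta/pinned file (a sketch)."""
from pathlib import Path


def add_pins(env_prefix, specs):
    """Add match specs (e.g. "libblas=*=*mkl") to <env_prefix>/conda-meta/pinned.

    Existing lines are preserved and duplicates are skipped. Returns the full
    list of pins now in the file.
    """
    meta = Path(env_prefix) / "conda-meta"
    if not meta.is_dir():
        raise FileNotFoundError(f"{meta} not found; is {env_prefix} a conda environment?")
    pinned = meta / "pinned"
    current = pinned.read_text().splitlines() if pinned.exists() else []
    current = [line.strip() for line in current if line.strip()]
    for spec in specs:
        if spec not in current:
            current.append(spec)
    pinned.write_text("\n".join(current) + "\n")
    return current
```

For example, add_pins("/path/to/env", ["libblas=*=*mkl", "h5py=*=mpi*"]) keeps later mamba update --all runs from swapping in non-MKL or non-MPI builds of those packages.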
RAPIDS AI Environment
The rapidsai channel provides RAPIDS, an open source, NVIDIA-maintained suite for end-to-end data science and analytics pipelines on GPUs. Feel free to explore the RAPIDS Getting Started Notebooks.
Scale up with RAPIDS tools and scale out with Dask/UCX or Horovod tools.
Python Packages and RAPIDS Equivalents
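As a small illustration of the pandas-to-cuDF equivalence, a common pattern is to import cuDF when RAPIDS and a GPU are available and fall back to pandas otherwise. A sketch assuming only that one of the two is installed; cuDF covers much but not all of the pandas API, so check the cuDF documentation for any call you rely on:

```python
"""Pick cuDF (GPU) when available, else pandas (CPU): a sketch."""
import importlib


def get_dataframe_module():
    """Return the cudf module if it imports cleanly, else pandas, else None."""
    for name in ("cudf", "pandas"):
        try:
            return importlib.import_module(name)
        except Exception:  # ImportError, or CUDA initialization errors without a GPU
            continue
    return None


if __name__ == "__main__":
    xdf = get_dataframe_module()
    if xdf is not None:
        df = xdf.DataFrame({"station": ["a", "b", "a"], "value": [1.0, 2.0, 3.0]})
        # groupby(...).mean() is available in both pandas and cuDF
        print(xdf.__name__, df.groupby("station")["value"].mean())
```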
Install RAPIDS Environment
Set flexible channel priority via conda config --set channel_priority flexible or in ~/.condarc, then follow the install directions here or run:
conda create -n rapids-22.08 -c rapidsai -c nvidia -c conda-forge \
    rapids=22.08 python=3.9 cudatoolkit=11.5
Installing Customized Python Packages
For more personalized environments, an example process to set up a conda environment on Casper is below:
module load conda
# Creates environment in /glade/work/$USER/conda-envs/my-env-name (use -p for a fully specified path)
mamba create -n my-env-name
mamba activate my-env-name
# The Python version installed here will automatically be pinned
# We recommend not using the latest Python version (3.10+) given compatibility issues
mamba install "python=3.9*"
# Ensures we get MKL-optimized packages to run on Casper's Intel CPUs
mamba install numpy scipy pandas scikit-learn xarray "libblas=*=*mkl"
# Ensures common packages provide MPI support (typically defaults to OpenMPI).
# Useful to pin packages in the `/path/to/env/conda-meta/pinned` file.
mamba install mpi4py "fftw=*=mpi*" "h5py=*=mpi*" "netcdf4=*=mpi*"
To highlight, adding <package-name>=<version>=<build-type> is important to ensure you install the most relevant and performant version for your needs.
For example, libblas=*=*mkl guarantees you get the Intel MKL-optimized versions of packages that utilize the BLAS library. The * is a wildcard matching any version (the solver generally picks the newest compatible one) or any other build specification/hash.
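To see what the wildcards match, here is a simplified model of the name=version=build pattern using shell-style globbing. This is an illustration only, not conda's actual MatchSpec implementation (which also handles version ranges, channels, and fuzzy versions: conda treats a bare python=3.9 as 3.9.*, which this sketch does not). The build strings in the usage note are made-up examples of the typical shape.

```python
"""Simplified illustration of conda's name=version=build wildcard matching."""
from fnmatch import fnmatchcase


def matches(spec, name, version, build):
    """Return True if a package (name, version, build) satisfies a 'name=version=build' spec.

    Missing fields default to '*'. Real conda MatchSpec semantics are richer;
    this only mimics the glob-style wildcards discussed above.
    """
    parts = spec.split("=")
    parts += ["*"] * (3 - len(parts))
    want_name, want_version, want_build = parts
    return (
        name == want_name
        and fnmatchcase(version, want_version)
        and fnmatchcase(build, want_build)
    )
```

For instance, matches("libblas=*=*mkl", "libblas", "3.9.0", "16_linux64_mkl") is True, while the same package with an "..._openblas" build string is rejected, which is exactly why the build field is what selects MKL, MPI, or CUDA variants.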
GPU Enabled Python Packages and Tools
mamba install cudatoolkit cudnn cupy nvtx
# Make sure the package build string includes *cuda* to verify GPU support
mamba install "pytorch=1.12.1=cuda112*"
# Don't use the tensorflow-gpu package, as the package solver is inconsistent in the conda-forge channel
# TF recommends pip install for the latest official version, but conda-forge versions also work
mamba install "tensorflow=2.9.1=cuda112*"
# Enables added profiling capabilities, only available via pip and PyPI or NVIDIA's package index
pip install nvidia-pyindex
pip install nvidia-dlprof nvidia-dlprof-pytorch-nvtx
pip install tensorboard_plugin_profile
The ML libraries pytorch and tensorflow require additional steps to ensure they are installed with GPU support.
Each library's documentation linked above has more info about installation options. As of this workshop, TensorFlow guarantees support up to CUDA v11.2 and PyTorch up to CUDA v11.6, so we specified builds with =cuda112*. Run mamba search <package> to view all available packages given the configured channels.
TensorFlow recommends installation via pip for their official versions, but the community does tend to maintain similar quality releases via conda-forge. Combining pip with conda/mamba installs should be avoided if possible, since such environments are harder to maintain.
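After installing, it is worth confirming at runtime that the GPU builds were actually picked up. A sketch that checks whichever of PyTorch and TensorFlow are importable in the current kernel, using torch.version.cuda, torch.cuda.is_available(), tf.test.is_built_with_cuda(), and tf.config.list_physical_devices("GPU"); the helper name gpu_support_report is hypothetical:

```python
"""Report whether the installed PyTorch/TensorFlow builds can see a GPU (a sketch)."""
import importlib.util


def gpu_support_report():
    """Return a dict describing GPU support for each framework that is installed."""
    report = {}
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = {
            "version": torch.__version__,
            "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
            "gpu_available": torch.cuda.is_available(),
        }
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf

        report["tensorflow"] = {
            "version": tf.__version__,
            "built_with_cuda": tf.test.is_built_with_cuda(),
            "gpu_available": len(tf.config.list_physical_devices("GPU")) > 0,
        }
    return report


if __name__ == "__main__":
    for framework, info in gpu_support_report().items():
        print(framework, info)
```

On a login node you would expect built_with_cuda to be set but gpu_available to be False; on a V100 node both should be truthy.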
Horovod for Distributed Deep Learning
module load cuda/11.7 gnu/10.1.0
mamba install pip gxx_linux-64 cmake nccl
export HOROVOD_NCCL_HOME=$CONDA_PREFIX
export HOROVOD_CUDA_HOME=$CUDA_HOME
HOROVOD_GPU_OPERATIONS=NCCL pip install "horovod[tensorflow,keras,pytorch]"
horovodrun --check-build
For distributed deep learning with Horovod instead of Dask, the commands above (or the Horovod installation documentation) show how to use pip to install Horovod from PyPI on Casper.
Note the specification of HOROVOD_GPU_OPERATIONS=NCCL to use NVIDIA's Collective Communication Library. An MPI option is also selectable for CUDA-aware MPI libraries. Find more details about Horovod's GPU tensor operations and GPU install options here.
A useful tutorial for Horovod was given as part of the Argonne Training Program on Extreme-Scale Computing (ATPESC): Data Parallel Deep Learning.
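For orientation, the data-parallel pattern that tutorial covers looks roughly like this with Horovod's PyTorch API (hvd.init, hvd.DistributedOptimizer, hvd.broadcast_parameters). A minimal sketch, assuming Horovod was built with PyTorch support as above; the tiny linear model and random data are placeholders for your own, and scaling the learning rate by hvd.size() is a common convention rather than a requirement:

```python
"""Data-parallel training skeleton with Horovod + PyTorch (a sketch)."""


def train_data_parallel(epochs=2, batch_size=32, base_lr=0.01):
    # Imports live inside the function so this file can be imported without Horovod.
    import torch
    import horovod.torch as hvd
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    hvd.init()
    if torch.cuda.is_available():
        torch.cuda.set_device(hvd.local_rank())  # one GPU per process
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Placeholder data and model; replace with your own.
    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    model = torch.nn.Linear(16, 1).to(device)

    # Scale the learning rate with the number of workers (a common heuristic).
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr * hvd.size())
    optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

    # Start every worker from identical weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    loss_fn = torch.nn.MSELoss()
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle each worker's shard every epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # triggers allreduce of gradients across workers
            optimizer.step()
        if hvd.rank() == 0:
            print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")
```

Call train_data_parallel() at the bottom of your own script (for example a hypothetical train.py) and launch one process per GPU with something like horovodrun -np 4 python train.py.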
Sharing Package Environments
Once your environment is set up, you can share or give access to your Python virtual environments, which is vitally important for enabling reproducible science.
On a shared cluster, share the path to your environment (see mamba env list). Make sure you provide read access (plus execute access on directories), and write access if you want others to be able to modify the environment. Others then run mamba activate /path/to/env
Others may instead clone a readable environment with mamba create --name cloned_env --clone /path/to/original_env
To distribute your environment, run mamba env export > my-env.yml. Others can then install this environment with mamba env create -f /path/to/yaml-file
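Before pointing colleagues at a shared path, you can check that the environment is actually readable by them. A sketch using only the standard library: it walks the environment and reports entries missing group (or other) read permission, plus directories missing the execute bit needed to traverse them. The helper name and the who parameter are assumptions of this sketch; it inspects mode bits only, not ACLs.

```python
"""Find files/dirs in a shared environment that others cannot read (a sketch)."""
import os
import stat

_READ = {"group": stat.S_IRGRP, "other": stat.S_IROTH}
_EXEC = {"group": stat.S_IXGRP, "other": stat.S_IXOTH}


def _needs_fix(path, who):
    """True if `who` lacks read on path, or execute on a directory."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return False  # symlink mode bits are not used on Linux
    if not mode & _READ[who]:
        return True
    return stat.S_ISDIR(mode) and not mode & _EXEC[who]


def unreadable_paths(env_path, who="group"):
    """Return paths under env_path that `who` ("group" or "other") cannot read."""
    problems = [env_path] if _needs_fix(env_path, who) else []
    for root, dirs, files in os.walk(env_path):
        for name in dirs + files:
            path = os.path.join(root, name)
            if _needs_fix(path, who):
                problems.append(path)
    return problems
```

If it reports problems, chmod -R g+rX /path/to/env (or o+rX for everyone) grants read access plus execute on directories only; parent directories above the environment also need execute permission for the same users.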
Running a Profiler on TensorFlow and PyTorch Models
Both tensorflow and pytorch have built-in tools and a TensorBoard GUI for DL profiling, which typically profile the training portion of a deep learning model. Base guides for using these built-in tools follow:
PyTorch
  Profiler Tutorial
  Building a Benchmark Tutorial
  PyTorch Profiler with TensorBoard Tutorial
TensorFlow
  TensorFlow Profiler Guide
  TensorBoard Profiler Analysis Guide
  TensorBoard - Callbacks API Class
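Easy Ways to Implement TensorFlow and PyTorch Profilers
PyTorch
import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

model = models.resnet18().cuda()
inputs = torch.randn(5, 3, 224, 224).cuda()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
The record_shapes parameter makes the profiler record the input shapes of each operator, so results can also be grouped by tensor shape, e.g. with prof.key_averages(group_by_input_shape=True).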
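Profiling a handful of training steps is usually enough. A sketch of the schedule-based form of the same profiler, which skips warm-up steps and writes traces that the PyTorch Profiler TensorBoard plugin (torch-tb-profiler) can display; the log directory, the step counts, and the model/loader/optimizer/loss_fn arguments are placeholders, and the loader is assumed to yield tensors already on the right device:

```python
"""Profile a few training steps with torch.profiler and write TensorBoard traces (a sketch)."""


def profile_training_steps(model, loader, optimizer, loss_fn, log_dir="./log/profile"):
    # Imported here so the file can be imported on machines without PyTorch.
    import torch
    from torch.profiler import (
        ProfilerActivity,
        profile,
        schedule,
        tensorboard_trace_handler,
    )

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)

    # Skip 1 step, warm up for 1, record 3, then stop (repeat=1).
    prof_schedule = schedule(wait=1, warmup=1, active=3, repeat=1)
    with profile(
        activities=activities,
        schedule=prof_schedule,
        on_trace_ready=tensorboard_trace_handler(log_dir),
        record_shapes=True,
    ) as prof:
        for step, (x, y) in enumerate(loader):
            if step >= 1 + 1 + 3:
                break  # nothing more is recorded after the active window
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            prof.step()  # tell the profiler a step has finished
```

Afterwards, tensorboard --logdir ./log/profile shows the recorded steps in the PyTorch Profiler tab.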
TensorFlow - See the API for additional options
import tensorflow as tf

tf.profiler.experimental.start('/path/to/log/output/')
# ... training loop ...
tf.profiler.experimental.stop()
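With Keras, the TensorBoard callback can do the same without explicit start/stop calls: its profile_batch argument selects a range of training batches to trace. A sketch; the log directory and batch range are arbitrary choices, model and dataset are your own compiled Keras model and tf.data pipeline, and viewing the Profile tab requires tensorboard_plugin_profile (installed above):

```python
"""Profile a range of Keras training batches via the TensorBoard callback (a sketch)."""


def fit_with_profiling(model, dataset, epochs=1, log_dir="./log/tf_profile"):
    # Imported here so the file can be imported without TensorFlow.
    import tensorflow as tf

    tb_callback = tf.keras.callbacks.TensorBoard(
        log_dir=log_dir,
        profile_batch=(10, 15),  # trace this range of batches, after early warm-up steps
    )
    return model.fit(dataset, epochs=epochs, callbacks=[tb_callback])
```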
Using NVIDIA Tools for Profiling DL Models
The tools nsys and ncu are similarly adaptable to run against DL Python codes. The dlprof tool was previously developed to run nsys on DL models and then output a TensorBoard interface. However, dlprof is no longer being developed, in favor of the built-in profiling methods described above.
PyTorch
  DNN layer annotations are disabled by default
  Use with torch.autograd.profiler.emit_nvtx():
  Manually with torch.cuda.nvtx.range_push/range_pop
  TensorRT backend is already annotated
TensorFlow
  Annotated by default with NVTX, only in containers: NVIDIA NGC containers or nvidia-pyindex TF 1.X
  export TF_DISABLE_NVTX_RANGES=1 to disable for production
  For TensorFlow 2.X, you must manually inline NVTX ranges or use dlprof --mode=tensorflow2 ...
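Manual annotation is easiest to keep tidy with a context manager. A sketch around torch.cuda.nvtx.range_push/range_pop that degrades to a no-op when PyTorch or CUDA is unavailable, so the same script still runs on a login node; the helper name nvtx_range is hypothetical:

```python
"""A no-op-safe NVTX range helper for annotating code regions for nsys (a sketch)."""
import contextlib
import importlib.util


@contextlib.contextmanager
def nvtx_range(name):
    """Mark a named NVTX range around a block when PyTorch and CUDA are available."""
    torch = None
    if importlib.util.find_spec("torch") is not None:
        import torch as _torch

        if _torch.cuda.is_available():
            torch = _torch
    if torch is not None:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        # Always close the range, even if the annotated block raised.
        if torch is not None:
            torch.cuda.nvtx.range_pop()
```

Wrapping stages such as with nvtx_range("forward"): ... makes them appear as labelled bars in the Nsight Systems timeline when nsys traces NVTX.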
NVIDIA provides their own guides, such as NVIDIA Deep Learning Performance. A small example using the nsys/ncu tools and dlprof with DL models can be found here. dlprof can still work well in NVIDIA NGC containers, but compatibility elsewhere is not well supported.
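When running Nsight Systems against a Python training script, tracing NVTX alongside CUDA lets annotated ranges line up with kernels. A sketch that builds and optionally runs such a command from Python; --trace=cuda,nvtx,osrt and --output are standard nsys profile options, while the script and report names are placeholders:

```python
"""Build (and optionally run) an Nsight Systems command for a Python script (a sketch)."""
import shutil
import subprocess
import sys


def nsys_command(script, report="dl_profile", traces=("cuda", "nvtx", "osrt"), script_args=()):
    """Return the argv list for profiling `script` under nsys."""
    return [
        "nsys", "profile",
        f"--trace={','.join(traces)}",
        f"--output={report}",
        sys.executable, script, *script_args,
    ]


def run_nsys(script, **kwargs):
    """Run nsys if it is on PATH; return the exit code, or None if nsys is missing."""
    if shutil.which("nsys") is None:
        return None  # e.g. a login node without the CUDA/Nsight module loaded
    return subprocess.run(nsys_command(script, **kwargs), check=False).returncode
```

The resulting report file can then be opened in the Nsight Systems GUI.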
Common Performance Considerations
I/O: Use the designated TF/PT data loaders (TensorFlow - Better Performance with the tf.data API; PyTorch - Datasets & DataLoaders) and multithreading, e.g. Multi-Worker Training with Keras.
CPU to/from GPU data copies: Rewrite code with TF/PT tensors or use CuPy, etc., and overlap copy and computation.
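The idea behind overlapping data movement with computation can be shown without any GPU: a background thread keeps a small queue of upcoming batches filled while the main loop works on the current one. tf.data's prefetch and the num_workers/pin_memory options of PyTorch's DataLoader are the production versions of this; the prefetch helper below is a hypothetical, framework-free sketch:

```python
"""Overlap batch loading with compute using a background thread (a sketch)."""
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


class _Failure:
    """Wraps an exception raised while loading so the consumer can re-raise it."""

    def __init__(self, exc):
        self.exc = exc


def prefetch(iterable, depth=2):
    """Yield items from iterable while a worker thread loads up to `depth` ahead."""
    buf = queue.Queue(maxsize=depth)

    def worker():
        try:
            for item in iterable:
                buf.put(item)  # blocks when `depth` items are already waiting
        except Exception as exc:
            buf.put(_Failure(exc))
        else:
            buf.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item
```

Usage would look like for batch in prefetch(load_batches(), depth=4): train_step(batch), where load_batches and train_step are your own (hypothetical) functions. The daemon thread means an early break does not hang interpreter exit.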
Batch size: Increase the batch size until the GPU is saturated.
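One simple way to find where the GPU saturates is to keep doubling the batch size until a trial step runs out of memory, then keep the last size that worked. A framework-agnostic sketch: try_step is a hypothetical callable you supply (e.g. one forward/backward pass), and the out-of-memory exception type depends on the framework (for example torch.cuda.OutOfMemoryError in recent PyTorch releases, or tf.errors.ResourceExhaustedError in TensorFlow). Larger batches can also change convergence, so treat the result as an upper bound to tune from.

```python
"""Find the largest power-of-two batch size that fits in memory (a sketch)."""


def find_max_batch_size(try_step, start=8, limit=4096, oom_errors=(MemoryError,)):
    """Double the batch size from `start` until try_step raises an OOM error.

    try_step(batch_size) should run one representative training step.
    Returns the largest size that succeeded, or None if even `start` failed.
    """
    best = None
    size = start
    while size <= limit:
        try:
            try_step(size)
        except oom_errors:
            break
        best = size
        size *= 2
    return best
```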
Precision: Consider mixed precision (background: see Theo Mary's Mixed Precision Arithmetic talk at the London Mathematical Society), following the NVIDIA Mixed Precision Training Guide.
Automatic Mixed Precision (AMP) settings:
PT Guide: scaler = torch.cuda.amp.GradScaler()
TF Guide: policy = mixed_precision.Policy('mixed_float16'); mixed_precision.set_global_policy(policy)
Ensure usage of Tensor Cores with mixed precision. TensorFlow provides a comprehensive guide, Optimize TensorFlow GPU performance with the TensorFlow Profiler.
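Putting the PyTorch pieces together, a mixed precision training step typically pairs autocast with the GradScaler shown above. A sketch assuming a CUDA device and your own model, optimizer, and loss function; the factory name make_amp_step is hypothetical. In TensorFlow, setting the global mixed_float16 policy is usually enough, since Keras applies loss scaling inside model.fit.

```python
"""One automatic mixed precision (AMP) training step in PyTorch (a sketch)."""


def make_amp_step(model, optimizer, loss_fn):
    """Return a function that runs one AMP training step and returns the loss value."""
    import torch

    scaler = torch.cuda.amp.GradScaler()

    def step(x, y):
        optimizer.zero_grad(set_to_none=True)
        # Run the forward pass in float16 where autocast deems it safe.
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(x), y)
        # Scale the loss to avoid float16 gradient underflow; step() unscales first.
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skips the update if gradients contain inf/NaN
        scaler.update()  # adjust the scale factor for the next iteration
        return loss.item()

    return step
```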
Performance Improvements with Tensor Cores
Per NVIDIA's recommendation on Optimizing for Tensor Cores, setting parameters such as matrix dimension sizes, batch sizes, convolution layer channel counts, etc. as multiples of 8 is optimal due to tensor core shape constraints.
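Padding a dimension up to the next multiple of 8 is a one-line calculation. A small sketch; the helper name round_up and the example sizes are hypothetical, and NVIDIA's guide lists different multiples for some data types, so check it for your precision:

```python
"""Round layer and batch dimensions up to a tensor-core-friendly multiple (a sketch)."""


def round_up(n, multiple=8):
    """Return the smallest multiple of `multiple` that is >= n (for positive n)."""
    return ((n + multiple - 1) // multiple) * multiple


# Example: a 1000-word vocabulary and a 30-unit hidden layer.
vocab_size = round_up(1000)  # already a multiple of 8, so unchanged
hidden_units = round_up(30)  # padded up to 32
```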
Utilizing mixed precision and tensor cores effectively can lead to theoretical throughput performance ranging from 9.70 TeraFLOPS for FP64 arithmetic up to 78.0 TeraFLOPS for FP16 arithmetic on A100 GPUs.
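Those peak figures translate into a lower bound on run time: divide the floating-point work by the peak rate. A back-of-the-envelope sketch using the two A100 numbers quoted above; the 10^15-FLOP workload is an arbitrary example, and real codes reach only a fraction of peak:

```python
"""Ideal compute time at peak A100 throughput for a given amount of work (a sketch)."""

PEAK_TFLOPS = {"fp64": 9.7, "fp16": 78.0}  # figures quoted in the text above


def ideal_seconds(total_flops, precision):
    """Lower-bound run time in seconds, assuming the GPU runs at its peak rate."""
    return total_flops / (PEAK_TFLOPS[precision] * 1e12)


work = 1e15  # hypothetical workload: one quadrillion floating-point operations
for precision in ("fp64", "fp16"):
    print(f"{precision}: {ideal_seconds(work, precision):.1f} s")
print(f"ratio: {PEAK_TFLOPS['fp16'] / PEAK_TFLOPS['fp64']:.1f}x")
```

For this workload that is roughly 103 s in FP64 versus about 13 s in FP16, an 8x difference at peak.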
Profiler Runs of a Geomagnetic Field LSTM Model
This Long Short-Term Memory (LSTM) example comes courtesy of the Trustworthy AI for Environmental Science Trust-a-thon. You can follow the original example, with data preparation and an explanation of how the LSTM model is implemented, in the source notebook.
To begin, let's first download data to use for training and validation of our LSTM model.
In []:
%%capture captured_io
%%bash
# Download the data we need. If a directory "data/" already exists, we'll assume the data are already downloaded.
# The above "magic" statements are used to capture shell in/out and to run the following Bash commands.
if [ ! -d "data" ]; then
    wget --verbose /geomag/data/geomag/magnet/public.zip
    wget --verbose /geomag/data/geomag/magnet/private.zip
    unzip public.zip
    unzip private.zip
    mkdir -v data
    mv -v public private data/
    mv -v public.zip private.zip data/
fi
In []:
# Uncomment for debugging if you have trouble downloading:
# print(captured_io)
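If you prefer to stay in Python (for example, outside Jupyter where the %%bash magic is unavailable), the unzip-and-move part of that cell can be written with zipfile and shutil. A sketch that assumes public.zip and private.zip have already been downloaded into the working directory (the download URLs are site-specific) and that each archive's contents should end up under data/, as in the cell above; like that cell, it does nothing if data/ already exists:

```python
"""Unpack downloaded archives into data/ unless it already exists (a sketch)."""
import shutil
import zipfile
from pathlib import Path


def unpack_into_data(archives=("public.zip", "private.zip"), data_dir="data", workdir="."):
    """Extract each archive into workdir/data_dir and move the zips there too.

    Returns True if it unpacked, False if data_dir already existed.
    """
    workdir = Path(workdir)
    target = workdir / data_dir
    if target.is_dir():
        return False  # assume the data are already in place
    target.mkdir()
    for name in archives:
        archive = workdir / name
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        shutil.move(str(archive), str(target / name))
    return True
```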
Profile the magnet_lstm_tutorial.py Python Script
The full Geomagnetic Field LSTM model is condensed into the Python file magnet_lstm_tutorial.py. Recall that profiling does not require analyzing the full runtime of most models. In DL, m