June 2023

NVIDIA DGX SuperPOD: Next Generation Scalable Infrastructure for AI Leadership
Reference Architecture
Featuring NVIDIA DGX H100 Systems
RA-11333-001 v6
BCM 3.23.05
Abstract
The NVIDIA DGX SuperPOD™ with NVIDIA DGX™ H100 systems is the next generation of data center architecture for artificial intelligence (AI). It is designed to provide the levels of computing performance required to solve advanced computational challenges in AI, high performance computing (HPC), and hybrid applications where the two are combined to improve prediction performance and time-to-solution. The DGX SuperPOD is based upon the infrastructure built at NVIDIA for internal research purposes and is designed to solve the most challenging computational problems of today. Systems based on the DGX SuperPOD architecture have been deployed at customer data centers and cloud-service providers around the world.
To achieve the most scalability, the DGX SuperPOD is powered by several key NVIDIA technologies, including:
> NVIDIA DGX H100 system—to provide the most powerful computational building block for AI and HPC.
> NVIDIA NDR (400 Gbps) InfiniBand—bringing the highest performance, lowest latency, and most scalable network interconnect.
> NVIDIA NVLink—networking technology that connects GPUs at the NVLink layer to provide unprecedented performance for the most demanding communication patterns.
The DGX SuperPOD architecture is managed by NVIDIA solutions including NVIDIA Base Command™, NVIDIA AI Enterprise, CUDA, and Magnum IO™. These technologies help keep the system running at the highest levels of availability and performance, and NVIDIA Enterprise Support (NVES) keeps all components and applications running smoothly.
This reference architecture (RA) discusses the components that define the scalable and modular architecture of the DGX SuperPOD. The system is built upon building blocks of scalable units (SU), each containing 32 DGX H100 systems, which provides for rapid deployment of systems of multiple sizes. This RA includes details regarding the SU design and specifics of the InfiniBand, NVLink network, and Ethernet fabric topologies, storage system specifications, recommended rack layouts, and wiring guides.
Contents
Key Components of the DGX SuperPOD
NVIDIA DGX H100 System
NVIDIA InfiniBand Technology
Runtime and System Management
Components
Design Requirements
System Design
InfiniBand Fabrics
Compute Fabric
Storage Fabric
Ethernet Fabrics
In-Band Management Network
Out-of-Band Management Network
Storage Requirements
High-Performance Storage
User Storage
DGX SuperPOD Architecture
Network Fabrics
Compute—InfiniBand Fabric
Storage—InfiniBand Fabric
In-Band Management Network
Out-of-Band Management Network
Storage Architecture
DGX SuperPOD Software
NVIDIA Base Command
NVIDIA NGC
NVIDIA AI Enterprise
Summary
Appendix A. Major Components
Key Components of the DGX SuperPOD
The DGX SuperPOD architecture has been designed to maximize performance for state-of-the-art model training, scale to exaflops of performance, provide the highest performance to storage, and support all customers in the enterprise, higher education, research, and the public sector. It is a digital twin of the main NVIDIA research and development system, meaning the company's software, applications, and support structure are first tested and vetted on the same architecture. Using SUs, system deployment times are reduced from months to weeks. Leveraging the DGX SuperPOD designs reduces time-to-solution and time-to-market of next-generation models and applications.
The DGX SuperPOD is the integration of key NVIDIA components, as well as storage solutions from partners certified to work in a DGX SuperPOD environment.
NVIDIA DGX H100 System
The NVIDIA DGX H100 system (Figure 1) is an AI powerhouse that enables enterprises to expand the frontiers of business innovation and optimization. The DGX H100 system, which is the fourth-generation NVIDIA DGX system, delivers AI excellence in an eight-GPU configuration. The NVIDIA Hopper GPU architecture provides the latest technologies, such as the Transformer Engine and fourth-generation NVLink, that bring months of computational effort down to days and hours on some of the largest AI/ML workloads.
Figure 1. DGX H100 system
Some of the key highlights of the DGX H100 system over the DGX A100 system include:
> Up to 9X more performance with 32 petaFLOPS at FP8 precision.
> Dual 56-core 4th Gen Intel® Xeon® Scalable processors with PCIe 5.0 support and DDR5 memory.
> 2X faster networking and storage at 400 Gbps InfiniBand/Ethernet with NVIDIA ConnectX®-7 smart network interface cards (SmartNICs).
> 1.5X higher bandwidth per GPU at 900 GBps with the fourth generation of NVIDIA NVLink.
> 640 GB of aggregate HBM3 memory with 24 TB/s of aggregate memory bandwidth, 1.5X higher than the DGX A100 system (the per-GPU figures implied by these aggregates are worked out after this list).
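The per-GPU figures implied by the aggregate HBM3 numbers above follow directly from the eight-GPU configuration; the short sketch below works them out as a sanity check rather than quoting an official per-GPU specification.

```python
# Per-GPU HBM3 figures implied by the aggregate numbers above
# (eight GPUs per DGX H100 system).
AGGREGATE_HBM3_GB = 640      # aggregate HBM3 capacity per system
AGGREGATE_HBM3_TBPS = 24     # aggregate HBM3 bandwidth per system
GPUS_PER_SYSTEM = 8

print(AGGREGATE_HBM3_GB / GPUS_PER_SYSTEM, "GB of HBM3 per GPU")                  # 80.0
print(AGGREGATE_HBM3_TBPS / GPUS_PER_SYSTEM, "TB/s of memory bandwidth per GPU")  # 3.0
```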
NVIDIA InfiniBand Technology
InfiniBand is a high-performance, low-latency, RDMA-capable networking technology, proven over 20 years in the harshest compute environments to provide the best inter-node network performance. Driven by the InfiniBand Trade Association (IBTA), it continues to evolve and lead data center network performance.
The latest generation of InfiniBand, NDR, has a peak speed of 400 Gbps per direction. It is backwards compatible with previous generations of InfiniBand specifications. InfiniBand is more than just peak performance: it provides additional features to optimize performance, including adaptive routing (AR), collective communication with SHARP™, dynamic network healing with SHIELD™, and support for several network topologies including fat-tree, Dragonfly, and multi-dimensional Torus, to build the largest fabrics and compute systems possible.
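As a rough illustration of what the NDR line rate means for a single DGX H100 system, the sketch below converts the per-port figure to bytes and multiplies by the eight NDR compute-fabric connections per system described later in this document. It ignores encoding and protocol overhead, so it is an upper bound rather than an achievable application figure.

```python
# Line-rate arithmetic for the NDR compute fabric. This ignores encoding and
# protocol overhead, so it is an upper bound, not delivered application bandwidth.
NDR_GBITS_PER_PORT = 400       # Gb/s per direction per NDR port
COMPUTE_PORTS_PER_NODE = 8     # NDR400 compute-fabric connections per DGX H100

per_port_gbytes = NDR_GBITS_PER_PORT / 8                    # 50 GB/s per direction
per_node_gbytes = per_port_gbytes * COMPUTE_PORTS_PER_NODE  # 400 GB/s per direction

print(f"Per port: {per_port_gbytes:.0f} GB/s per direction")
print(f"Per node: {per_node_gbytes:.0f} GB/s per direction across eight rails")
```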
Runtime and System Management
The DGX SuperPOD RA represents the best practices for building high-performance data centers. There is flexibility in how these systems can be presented to customers and users. NVIDIA Base Command software is used to manage all DGX SuperPOD deployments.
DGX SuperPOD can be deployed on-premises, meaning the customer owns and manages the hardware as a traditional system. This can be within a customer's data center or co-located at a commercial data center, but the customer owns the hardware. For on-premises solutions, the customer has the option to operate the system with a secure, cloud-native interface through NVIDIA NGC™.
Components
The components of the DGX SuperPOD are described in Table 1.

Table 1. Four SU, 127-node DGX SuperPOD components

Component | Technology | Description
Compute nodes | 127× NVIDIA DGX H100 system with eight 80 GB H100 GPUs | Fourth generation of the world's premier purpose-built AI systems, featuring NVIDIA H100 Tensor Core GPUs, 4th generation NVIDIA NVLink®, and 3rd generation NVIDIA NVSwitch™ technologies
Compute fabric | NVIDIA Quantum QM9700 NDR 400 Gbps InfiniBand | Rail-optimized, full fat-tree network with eight NDR400 connections per system
Storage fabric | NVIDIA Quantum QM9700 NDR 400 Gbps InfiniBand | The fabric is optimized to match the peak performance of the configured storage array
Compute/storage fabric management | NVIDIA Unified Fabric Manager, Enterprise Edition | NVIDIA UFM combines enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to manage scale-out InfiniBand data centers
In-band management network | NVIDIA SN4600C switch | 64-port 100 Gbps Ethernet switch providing high port density with high performance
Out-of-band (OOB) management network | NVIDIA SN2201 switch | 48-port 1 Gbps Ethernet switch leveraging copper ports to minimize complexity
DGX SuperPOD software stack | NVIDIA Base Command Manager | Cluster management for the DGX SuperPOD
DGX SuperPOD software stack | NVIDIA AI Enterprise | Best-in-class development tools and frameworks for the AI practitioner, and reliable management and orchestration for IT professionals
DGX SuperPOD software stack | Magnum IO | NVIDIA Magnum IO enables increased performance for AI and HPC
DGX SuperPOD software stack | NVIDIA NGC | The NGC catalog provides a collection of GPU-optimized containers for AI and HPC
User environment | Slurm | Slurm is a classic workload manager used to manage complex workloads in a multi-node, batch-style compute environment
Design Requirements
The DGX SuperPOD is designed to minimize system bottlenecks throughout the tightly coupled configuration to provide the best performance and application scalability. Each subsystem has been thoughtfully designed to meet this goal. In addition, the overall design remains flexible so that it can be tailored to data center requirements and better integrate into existing data centers.
System Design
The DGX SuperPOD is optimized for a customer's particular workload of multi-node AI, HPC, and hybrid applications:
> A modular architecture based on SUs of 32 DGX H100 systems each.
> A fully tested system that scales to four SUs, but larger deployments can be built based on customer requirements.
> A rack design that can support one, two, or four DGX H100 systems per rack, so that the rack layout can be modified to accommodate different data center requirements.
> Storage partner equipment that has been certified to work in DGX SuperPOD environments.
> Full system support (including compute, storage, network, and software) provided by NVIDIA Enterprise Support (NVES).
InfiniBand Fabrics
Compute Fabric
> The InfiniBand compute fabric is rail-optimized to the top layer of the fabric.
> The InfiniBand fabric is a balanced, full fat-tree.
> Managed NDR switches are used throughout the design to provide better management of the fabric.
> The fabric is designed to support the latest SHARPv3 features.
Storage Fabric
The storage fabric provides high bandwidth to shared storage. It also has these characteristics:
> It is independent of the compute fabric to maximize both storage and application performance.
> It provides single-node bandwidth of at least 40 GBps to each DGX H100 system.
> Storage is provided over InfiniBand and leverages RDMA to provide maximum performance and minimize CPU overhead.
> It is flexible and can be scaled to meet specific capacity and bandwidth requirements.
> User-accessible management nodes provide access to shared storage.
Ethernet Fabrics
Multiple Ethernet fabrics are used to support management communications, Ethernet-based storage targets, Internet access, and other traditional TCP/IP-based services.
In-Band Management Network
> The in-band management network fabric is Ethernet-based and is used for node provisioning, data movement, Internet access, and other services that must be accessible by the users.
> The in-band management network connections for compute and management servers operate at 100 Gbps and are bonded for resiliency.
Out-of-Band Management Network
The OOB management network connects all the baseboard management controller (BMC) ports, as well as other devices that should be physically isolated from system users.
Storage Requirements
The DGX SuperPOD compute architecture must be paired with a high-performance, balanced storage system to maximize overall system performance. The DGX SuperPOD is designed to use two separate storage systems, high-performance storage (HPS) and user storage, optimized for the key operations of throughput and parallel I/O, as well as higher IOPS and metadata workloads.
High-Performance Storage
The HPS must provide:
> A high-performance, resilient, POSIX-style file system optimized for multi-threaded read and write operations across multiple nodes.
> Native InfiniBand support.
> Transparent caching of data in local system RAM.
> Transparent use of local disk for caching of larger datasets.
User Storage
User storage must:
> Be designed for high metadata performance, IOPS, and key enterprise features such as checkpointing. This is different from the HPS, which is optimized for parallel I/O and large capacity.
> Communicate over Ethernet to provide a secondary path to storage so that, in the event of a failure of the storage fabric or HPS, nodes can still be accessed and managed by administrators in parallel.
DGX SuperPOD Architecture
The DGX SuperPOD architecture is a combination of DGX systems, InfiniBand and Ethernet networking, management nodes, and storage.
Figure 2 shows the rack layout of a single SU. In this example, power consumption per rack exceeds 40 kW. The rack layout can be adjusted to meet local data center requirements, such as maximum power per rack and the placement of DGX systems relative to supporting equipment, to meet local needs for power and cooling distribution.
Figure 2. Complete single SU rack layout
Figure 3 shows a typical management rack configuration with InfiniBand and Ethernet switches, management servers, storage arrays, and UFM appliances.
Figure 3. Typical management rack
Network Fabrics
Several networks are deployed on the DGX SuperPOD. The compute fabric is used for inter-node communication by the applications. A separate storage fabric is used to isolate storage traffic. There are two Ethernet fabrics, for in-band and OOB management. Requirements for each network are detailed below, followed by the designs for each network.
Figure 4 shows the different ports on the back of the DGX H100 CPU tray and the connectivity provided. The InfiniBand compute fabric ports in the middle use a two-port transceiver to access all eight GPUs. Each pair of in-band Ethernet management and InfiniBand storage ports provides parallel pathways into the DGX H100 system for increased performance. The OOB port is used for BMC access. There is an additional LAN port next to the BMC port, but it is not used in the DGX SuperPOD.
Figure 4. DGX H100 network ports
Compute—InfiniBand Fabric
Figure 5 shows the compute fabric layout for the full 127-node DGX SuperPOD. Each group of 32 nodes is rail-aligned. Traffic per rail of the DGX H100 systems is always one hop away from the other 31 nodes in an SU. Traffic between nodes, or between rails, traverses the spine layer.
Figure 5. Compute InfiniBand fabric for the full 127-node DGX SuperPOD
Table 2 shows the number of cables and switches required for the compute fabric for different SU sizes.
Table 2. Compute fabric component count

SU Count | Cluster Size (# Nodes) | Cluster Size (# GPUs) | Leaf Switch Count | Spine Switch Count | Compute + UFM Node Cable Count | Spine-Leaf Cable Count
1 | 31¹ | 248 | 8 | 4 | 252 | 256
2 | 63 | 504 | 16 | 8 | 508 | 512
3 | 95 | 760 | 24 | 16 | 764 | 768
4 | 127 | 1016 | 32 | 16 | 1020 | 1024
1. This is a 32-node-per-SU design; however, one DGX system must be removed to accommodate UFM connectivity.
Building systems by SU provides the most efficient designs. However, if a different node count is required due to budgetary constraints, data center constraints, or other needs, the fabric should be designed to support the full SU, including leaf switches and leaf-spine cables, and leave the portion of the fabric unused where these nodes would be located. This will ensure optimal traffic routing and ensure that performance is consistent across all portions of the fabric.
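To extend Table 2 to other SU counts or sanity-check a partial build, the sketch below reproduces the node, GPU, switch, and cable counts from the table. The spine-switch counts are copied from Table 2 rather than derived, since spine sizing is a fat-tree design decision, and the four extra UFM cables are inferred from the difference between the node and cable counts in the table.

```python
# Reproduces the per-SU counts in Table 2. Spine counts are copied from the
# table rather than computed; the +4 UFM cables are inferred from the table.
SPINE_SWITCHES = {1: 4, 2: 8, 3: 16, 4: 16}

def compute_fabric_counts(num_su: int) -> dict:
    nodes = 32 * num_su - 1        # one DGX H100 is removed to make room for UFM
    return {
        "nodes": nodes,
        "gpus": nodes * 8,                     # eight GPUs per DGX H100
        "leaf_switches": 8 * num_su,           # one leaf switch per rail per SU
        "spine_switches": SPINE_SWITCHES[num_su],
        "node_cables": nodes * 8 + 4,          # compute connections plus UFM
        "spine_leaf_cables": 256 * num_su,     # 32 nodes x 8 rails per SU
    }

for su in range(1, 5):
    print(su, compute_fabric_counts(su))
```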
Storage—InfiniBand Fabric
The storage fabric employs an InfiniBand network fabric that is essential to delivering maximum bandwidth (Figure 6), because the per-node I/O of the DGX SuperPOD must exceed 40 GBps. High-bandwidth requirements combined with advanced fabric management features, such as congestion control and AR, provide significant benefits for the storage fabric.
Figure 6. InfiniBand storage fabric logical design
The storage fabric uses MQM9700-NS2F switches (Figure 7). The storage devices are connected at a 1:1 port-to-uplink ratio. The DGX H100 system connections are slightly oversubscribed, with a ratio near 4:3, with adjustments as needed to allow for more storage flexibility regarding cost and performance.
Figure 7. MQM9700-NS2F switch
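The arithmetic behind those ratios is straightforward; the sketch below illustrates it with hypothetical leaf-switch port splits. The port counts are examples only, not the DGX SuperPOD wiring plan.

```python
# Hypothetical leaf-switch port splits illustrating the ratios described above.
def oversubscription(downlink_ports: int, uplink_ports: int) -> str:
    ratio = downlink_ports / uplink_ports
    return f"{downlink_ports}:{uplink_ports} (oversubscription x{ratio:.2f})"

print("DGX-facing ports:    ", oversubscription(16, 12))  # near 4:3, as described
print("Storage-facing ports:", oversubscription(8, 8))    # 1:1, as described
```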
In-Band Management Network
The in-band management network provides several key functions:
> Connects all the services that manage the cluster.
> Enables access to the home file system and storage pool.
> Provides connectivity for in-cluster services such as Base Command Manager and Slurm, and to services outside of the cluster such as the NGC registry, code repositories, and data sources.
Figure 8 shows the logical layout of the in-band Ethernet network. The in-band network connects the compute nodes and management nodes. In addition, the OOB network is connected to the in-band network to provide high-speed interfaces from the management nodes, supporting parallel operations to devices connected to the OOB fabric, such as storage.
Figure 8. In-band Ethernet network
The in-band management network uses SN4600C switches (Figure 9).
Figure 9. SN4600C switch
Out-of-Band Management Network
Figure 10 shows the OOB Ethernet fabric. It connects the management ports of all devices, including DGX systems and management servers, storage, networking gear, rack PDUs, and all other devices. These ports are separated onto their own fabric because there is no use case where users need access to them; the fabric is secured using logical network separation.
Figure 10. Logical OOB management network layout
The OOB management network uses SN2201 switches (Figure 11).
Figure 11. SN2201 switch
Storage Architecture
Data, lots of data, is the key to developing accurate deep learning (DL) models. Data volume continues to grow exponentially, and the data used to train individual models continues to grow as well. Data format, not just volume, can be a key factor in the rate at which data is accessed. The DGX H100 system delivers up to nine times the performance of its predecessor; to achieve this in practice, storage system performance must scale commensurately.
The key I/O operation in DL training is re-read. It is not just that data is read; it must be reused again and again because of the iterative nature of DL training. Pure read performance is still important, as some model types can train in a fraction of an epoch (for example, some recommender models), and inference on existing models can be highly I/O intensive, much more so than training. Write performance can also be important. As DL models grow in size and time-to-train, writing checkpoints becomes necessary for fault tolerance. Checkpoint files can be terabytes in size and, while not written frequently, are typically written synchronously, which blocks forward progress of DL training.
Ideally, data is cached during the first read of the dataset so that it does not have to be retrieved across the network again. Shared file systems typically use RAM as the first layer of cache. Reading files from cache can be an order of magnitude faster than reading from remote storage. In addition, the DGX H100 system provides local NVMe storage that can also be used for caching or staging data.
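A minimal sketch of using that local NVMe as a staging area follows: copy the dataset from the shared file system once, then let subsequent epochs and jobs read it locally. The paths are illustrative assumptions; /raid is a common mount point for the local NVMe RAID on DGX systems, but confirm the layout of your deployment.

```python
# Stage a dataset from shared high-performance storage onto local NVMe so that
# repeated epochs re-read it locally. Paths below are illustrative examples.
import shutil
from pathlib import Path

SHARED_DATASET = Path("/lustre/datasets/example")  # hypothetical shared-FS path
LOCAL_STAGE = Path("/raid/cache/example")          # assumed local NVMe mount point

def stage_dataset(src: Path, dst: Path) -> Path:
    """Copy the dataset once; later epochs and jobs read the local copy."""
    if not dst.exists():
        shutil.copytree(src, dst)
    return dst

data_root = stage_dataset(SHARED_DATASET, LOCAL_STAGE)
print(f"Reading training data from {data_root}")
```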
The DGX SuperPOD is designed to support all workloads, but the storage performance required to maximize training performance can vary depending on the type of model and dataset. The guidelines in Table 3 and Table 4 are provided to help determine the I/O levels required for different types of models.
Table 3. Storage performance requirements

Performance Level | Work Description | Dataset Size
Good | Natural language processing (NLP) | Datasets generally fit within local cache
Better | Image processing with compressed images (for example, ImageNet) | Many to most datasets can fit within the local system's cache
Best | Training with 1080p, 4K, or uncompressed images; offline inference; ETL | Datasets are too large to fit into cache; massive first-epoch I/O requirements; workflows that only read the dataset once
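As a rough, hypothetical way to map a workload onto these tiers, the sketch below compares dataset size against the local caches. The RAM and NVMe capacities are placeholder values, not the specification of any particular configuration; substitute the figures for your own systems.

```python
# Rough heuristic mapping dataset size onto the Table 3 tiers. The cache sizes
# are placeholder assumptions; replace them with your actual configuration.
CACHE_RAM_TB = 2.0     # assumed system RAM usable as page cache
CACHE_NVME_TB = 30.0   # assumed local NVMe capacity usable for staging

def storage_tier(dataset_tb: float) -> str:
    if dataset_tb <= CACHE_RAM_TB:
        return "Good: dataset generally fits within the local RAM cache"
    if dataset_tb <= CACHE_NVME_TB:
        return "Better: dataset can be staged on local NVMe"
    return "Best: dataset exceeds local cache; size shared storage for full-rate reads"

print(storage_tier(0.15))   # for example, a compressed image dataset
print(storage_tier(40.0))   # for example, uncompressed video frames
```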
Table 4. Guidelines for storage performance

Performance Characteristic | Good (GBps) | Better (GBps) | Best (GBps)
Single-node read | 4 | 8 | 40
Single-node write | 2 | 4 | 20
Single SU aggregate system read | 15 | 40 | 125
Single SU aggregate system write | 7 | 20 | 62
4 SU aggregate system read | 60 | 160 | 500
4 SU aggregate system write | 30 | 80 | 250
Even for the best category above, it is desirable that the single-node read performance be closer to the maximum network performance of 80 GBps.
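To get a first data point against the single-node read guidance above, a simple timed read can be used. The sketch below is single-threaded Python, so it will understate what parallel I/O tools (for example, fio or IOR) can achieve; the file path is a placeholder, and the file should be larger than system RAM if a cold-read number is wanted.

```python
# Minimal single-node sequential-read timing to compare against Table 4.
# Single-threaded, so it understates what parallel I/O tools can achieve.
import time

CHUNK_BYTES = 64 * 1024 * 1024   # 64 MiB reads

def read_throughput_gbps(path: str) -> float:
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(CHUNK_BYTES):
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e9   # decimal GB/s

print(f"{read_throughput_gbps('/path/to/large-test-file'):.1f} GB/s")
```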
Note: As datasets get larger, they may no longer fit in cache on the local system. Pairing large datasets that do not fit in cache with very fast GPUs can create a situation where it is difficult to achieve maximum training performance. NVIDIA GPUDirect Storage® (GDS) provides a way to read data from the remote file system or local NVMe directly into GPU memory, providing higher sustained I/O performance with lower latency. Using the storage fabric on the DGX SuperPOD, a GDS-enabled application should be able to read data at over 40 GBps directly into the GPUs.
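A minimal sketch of a GDS read from Python follows, assuming the open-source kvikio library (Python bindings for cuFile) and CuPy are installed. The file path is a placeholder and the API details may differ between kvikio releases, so treat this as illustrative rather than as part of the DGX SuperPOD software stack.

```python
# Illustrative GPUDirect Storage read via kvikio: data moves from storage
# directly into GPU memory without staging through a host bounce buffer.
import cupy
import kvikio

gpu_buffer = cupy.empty(256 * 1024 * 1024, dtype=cupy.uint8)  # 256 MiB GPU buffer

f = kvikio.CuFile("/path/to/dataset-shard.bin", "r")          # placeholder path
nbytes = f.read(gpu_buffer)   # DMA read directly into the GPU buffer
f.close()

print(f"Read {nbytes} bytes into GPU memory")
```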
High-speed storage provides a shared view of an organization's data to all nodes. It must be optimized for small, random I/O patterns and provide high peak node performance and high aggregate file system performance to meet the variety of workloads an organization may encounter. High-speed storage should support both efficient multi-threaded reads and writes from a single system, although most DL workloads will be read-dominant.
Use cases in automotive and other computer-vision-related tasks, where 1080p images are used for training (and in some cases are uncompressed), involve datasets that easily exceed 30 TB in size. In these cases, 4 GBps of read performance per GPU is needed.
While NLP cases often do not require as much read performance for training, peak read and write performance is needed for creating and reading checkpoint files. This is a synchronous operation, and training stops during this phase. If you are looking for the best end-to-end training performance, do not ignore I/O operations for checkpoints.
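A back-of-envelope estimate of that checkpoint stall can be made from the Table 4 write guidance; the checkpoint size below is an example value, not a measured figure.

```python
# Estimate the synchronous checkpoint stall using the Table 4 write guidance.
CHECKPOINT_TB = 1.0     # example checkpoint size in terabytes
SU_WRITE_GBPS = 62.0    # "Best" single-SU aggregate write rate from Table 4

stall_seconds = CHECKPOINT_TB * 1000 / SU_WRITE_GBPS
print(f"~{stall_seconds:.0f} seconds of training stalled per checkpoint")  # ~16 s
```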
The preceding metrics assume a variety of workloads, datasets, and the need to train both locally and directly from the high-speed storage system. It is best to characterize workloads and organizational needs before finalizing performance and capacity requirements.
DGX SuperPOD Software
DGX SuperPOD is an integrated hardware and software solution. The included software (Figure 12) is optimized for AI from top to bottom: from the accelerated frameworks and workflow management through to system management and low-level operating system (OS) optimizations, every part of the stack is designed to maximize the performance and value of the DGX SuperPOD.
Figure 12.