Deep Learning/AI Lifecycle with Dell EMC and Bitfusion



Bhavesh Patel
Dell EMC Server Advanced Engineering

Abstract

This talk gives an overview of the end-to-end application life cycle of deep learning in the enterprise, along with numerous use cases, and summarizes studies done by Bitfusion and Dell on a high-performance heterogeneous elastic rack of Dell EMC PowerEdge C4130s with Nvidia GPUs.

Some of the use cases that will be talked about in detail will be the ability to bring on-demand GPU acceleration beyond the rack across the enterprise with easily attachable elastic GPUs for deep learning development, as well as the creation of a cost-effective, software-defined, high-performance elastic multi-GPU system by combining multiple Dell EMC C4130 servers at runtime for deep learning
training.

Deep Learning and AI are being adopted across a wide range of market segments

Industry/Function: AI Revolution
ROBOTICS: Computer Vision & Speech, Drones, Droids
ENTERTAINMENT: Interactive Virtual & Mixed Reality
AUTOMOTIVE: Self-Driving Cars, Co-Pilot Advisor
FINANCE: Predictive Price Analysis, Dynamic Decision Support
PHARMA: Drug Discovery, Protein Simulation
HEALTHCARE: Predictive Diagnosis, Wearable Intelligence
ENERGY: Geo-Seismic Resource Discovery
EDUCATION: Adaptive Learning Courses
SALES: Adaptive Product Recommendations
SUPPLY CHAIN: Dynamic Routing Optimization
CUSTOMER SERVICE: Bots And Fully-Automated Service
MAINTENANCE: Dynamic Risk Mitigation And Yield Optimization

...but few people have the time, knowledge, or resources to even get
started

PROBLEM: HARDWARE INFRASTRUCTURE LIMITATIONS
• Increased cost with dense servers
• TOR bottleneck, limited scalability
• Limited multi-tenancy on GPU servers (limited CPU and memory per user)
• Limited to 8-GPU applications
• Does not support GPU apps with high storage, CPU, and memory requirements

PROBLEM: SOFTWARE COMPLEXITY OVERLOAD
• Software Management: GPU Driver Management; Framework Library Installation; Deep Learning Framework Configuration; Package Manager; Jupyter Server / IDE Setup
• Data Management: Data Uploader; Shared Local File System; Data Volume Management; Data Integrations, Pipelining
• Model Management: Code Version Management; Hyperparameter Optimization; Experiment Tracking; Deployment Automation; Deployment Continuous Integration
• Workload Management: Job Scheduler; Log Management; User Group Management; Inference Autoscaling
• Infrastructure Management: Cloud Server Orchestration; GPU Hardware Setup; GPU Resource Allocation; Container Orchestration; Networking Direct Bypass; MPI / RDMA / RPI / gRPC; Monitoring

Need to Simplify
and Scale

SOLUTION 1/2: CONVERGED RACK SOLUTION
• Composable compute bundle
• Up to [...] GPUs per application
• GPU applications with varied storage, memory, and CPU requirements
• 30-50% less cost per GPU
• > {cores, memory} per GPU
• >> intra-rack networking bandwidth
• Less inter-rack load
• Composable: add as you go

SOLUTION 2/2: COMPLETE, STREAMLINED DEVELOPMENT

Develop on pre-installed, quickstart deep learning containers.
• Get to work quickly with workspaces with optimized, pre-configured drivers, frameworks, libraries, and notebooks.
• Start with CPUs, and attach Elastic GPUs on demand.
• All your code and data is saved automatically and is sharable with others.

Transition from development to training with multiple GPUs.
• Seamlessly scale out to more GPUs on a shared training cluster to train larger models quickly and cost-effectively.
• Support and manage multiple users, teams, and projects.
• Train multiple models in parallel for massive productivity improvements.

Push trained, finalized models into production.
• Deploy a trained neural network into production and perform real-time inference across different hardware.
• Manage multiple applications and inference endpoints corresponding to different trained models.

Dell EMC Deep Learning Optimized servers
• Stack layers: Vertical Segment; Applications; Open Source Frameworks; Optimized Libraries; Operating System; Processor/Accelerator; Compute Platform
• Compute platforms: C4130; R730; C6320P in C6300
• Accelerators: GPU; KNL Phi in C6320P Sled; NvLink-GPU

C4130 DEEP LEARNING Server
[Diagram labels. Front: (optional) Redundant Power Supplies; Dual SSD boot drives. Back: IDRAC NIC; 2x 1Gb NIC. Internal: Front; Power Supplies; GPU accelerators (4); CPU sockets (under heatsinks); 8 fans.]

GPU DEEP LEARNING RACK SOLUTION

Configuration Details

Features        R730                   C4130
CPU             E5-2669 v3 @ 2.1GHz    E5-2630 v3 @ 2.4GHz
Memory          4GB                    1TB/node; 64G DIMM
Storage         Intel PCIe NVME        Intel PCIe NVME
Networking IO   CX3 FDR InfiniBand     CX3 FDR InfiniBand
GPU             NA                     M40-24GB
TOR Switch      Mellanox SX6036 FDR Switch
Cables          FDR 56G DCA Cables

GPU DEEP LEARNING RACK SOLUTION
• Pre-Built App Containers
• GPU and Workspace Management
• Elastic GPUs across the Datacenter
• Software defined Scaled out GPU Servers

End to End Deep Learning Application Life Cycle: 1 Develop, 2 Train, 3 Deploy

[Diagram labels. GPU Nodes: C4130 #1, C4130 #2, C4130 #3, C4130 #4. Infiniband Switch. CPU Nodes: R730 #1, R730 #2.]

…but wait, ‘converged compute’ requires network attached GPUs...
[Diagram labels: R730, C4130]

BITFUSION CORE VIRTUALIZATION

GPU Device Virtualization
• Allows dynamic GPU attach on a per-application basis

Features
• APIs: CUDA, OpenCL
• Distribution: scale-out to remote GPUs
• Pooling: Oversubscribe GPUs
• Resource Provisioning: Fractional vGPUs
• High Availability: Automatic DMR
• Manageability: Remote nvidia-smi
• Distributed CUDA Unified Memory
• Native support for IB, GPUDirect RDMA
• Feature complete with CUDA
8.0

PUTTING IT ALL TOGETHER
[Diagram labels: CLIENT SERVER; GPU SERVER (x3)]
• Bitfusion Flex, managed containers
• Bitfusion Service Daemon
• Bitfusion Client Library

NATIVE VS. REMOTE GPUs
[Diagram labels. Native: CPU, GPU 0, GPU 1, PCIe. Remote: CPU, GPU 0, HCA, PCIe on one node; CPU, HCA, GPU 1, PCIe on the other.]
Completely transparent: all CUDA apps see local and remote GPUs as if directly connected.

Results

REMOTE GPUs LATENCY AND BANDWIDTH
• Data movement overheads are the primary scaling limiter
• Measurements done at application level (cudaMemcpy)
[Chart labels: Fast local GPU copies; PCIe intranode copies]

16 GPU virtual system: Naive implementation with TCP/IP
[Diagram labels: C4130; node 0, node 1, node 2, node 3]
• Fast local GPU copies
• Intranode copies via PCIe
• Low BW, high latency remote copies
• OS bypass needed to avoid primary TCP/IP overheads
• AI apps are very latency sensitive

16 GPU virtual system: Bitfusion optimized transport and runtime
• Same FDR x4 transport, but drop IPoIB
• Replace remote calls with native verbs
• Runtime selection of intranode RDMA vs. cudaMemcpy
• Multi-rail communications where available
[Chart labels: Remote =~ Native Local GPUs; Minimal NUMA effects]
• Runtime optimizations: pipelining, speculative execution, distributed caching, event coalescing, …
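
The latency and bandwidth points above can be illustrated with a first-order cost model in which one copy takes a fixed latency plus size divided by bandwidth. This is a minimal sketch under that assumption, not the measurement method used in the talk. The `Link` class and every latency and bandwidth figure below are hypothetical placeholders chosen only to show the shape of the trade-off; they are not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """First-order model of a copy path: fixed latency plus size / bandwidth."""
    name: str
    latency_s: float          # per-transfer startup cost in seconds (hypothetical)
    bandwidth_bytes_s: float  # sustained bandwidth in bytes/second (hypothetical)

    def transfer_time(self, size_bytes: int) -> float:
        """Estimated seconds to move size_bytes over this link."""
        return self.latency_s + size_bytes / self.bandwidth_bytes_s

    def effective_bandwidth(self, size_bytes: int) -> float:
        """Achieved bytes/second for a single transfer of size_bytes."""
        return size_bytes / self.transfer_time(size_bytes)


# Placeholder parameters, NOT measurements from the talk.
LINKS = [
    Link("local PCIe copy", latency_s=10e-6, bandwidth_bytes_s=12e9),
    Link("remote, TCP/IP", latency_s=100e-6, bandwidth_bytes_s=1e9),
    Link("remote, RDMA verbs", latency_s=5e-6, bandwidth_bytes_s=6e9),
]

if __name__ == "__main__":
    # Small copies are dominated by latency, large copies by bandwidth.
    for size in (4 * 1024, 1024 * 1024, 256 * 1024 * 1024):
        for link in LINKS:
            gbps = link.effective_bandwidth(size) / 1e9
            print(f"{size:>12d} B  {link.name:<20s} {gbps:8.3f} GB/s")
```

Under this model a small copy never gets close to the link's peak bandwidth, which is one way to read the slide's remark that AI apps are very latency sensitive.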

SLICE & DICE: MORE THAN ONE WAY TO GET GPUs
[Chart panels: Caffe GoogleNet; TensorFlow Pixel-CNN. Diagram labels: R730, C4130.]
• Native GPU performance with network attached GPUs
• Run time comparison (lower is better) → multiple ways to create a virtual GPU node, with native efficiency
(secs to train Caffe GoogleNet, batch size: 128)

TRAINING PERFORMANCE
• Continued strong scaling: Caffe GoogleNet
• Weak scaling: accelerate hyperparameter optimization (Caffe GoogleNet; TensorFlow 1.0 with Pixel-CNN)
[Chart labels: 74%, 73%, 55%, 53%, 86%; "PCIe host bridge limit"; axis ticks 1, 2, 4, 8, 16; series: native, remote; R730, C4130]
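
The percentage labels on the scaling charts read like parallel efficiencies, although the slides do not state the formula. The sketch below assumes the conventional definitions: strong-scaling efficiency is T1 / (N * TN) for a fixed total workload, and weak-scaling efficiency is T1 / TN when the work per GPU is held constant. The function names and the timings in the usage example are hypothetical and are not taken from the talk.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Efficiency for a fixed total workload: speedup (t1 / tn) divided by n."""
    if t1 <= 0 or tn <= 0 or n < 1:
        raise ValueError("times must be positive and n must be >= 1")
    return t1 / (n * tn)


def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Efficiency when work per GPU is fixed: the ideal run time stays at t1."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("times must be positive")
    return t1 / tn


if __name__ == "__main__":
    # Hypothetical run times in seconds, for illustration only.
    timings = {1: 800.0, 2: 420.0, 4: 230.0, 8: 135.0}
    t1 = timings[1]
    for n, tn in sorted(timings.items()):
        eff = strong_scaling_efficiency(t1, tn, n)
        print(f"{n:2d} GPUs: {tn:6.1f} s, strong-scaling efficiency {eff:.0%}")
```

Comparing the same efficiency for native and remote GPU runs at each GPU count is one way to express how close network attached GPUs come to local ones.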

Other PCIe GPU Configurations Available
• Currently testing: Config ‘G’

Further reading:
/techcenter/high-performance-computing/b/general_hpc/archive/2016/11/11/deep-learning-performance-with-p100-gpus
http:///techcenter/high-performance-computing/b/general_hpc/archive/2017/03/22/deep-learning-inference-on-p40-gpus

NvLink Configuration: Config ‘K’
• 4 P100-16GB SXM2 GPU
• 2 CPU
• PCIe switch
• 1 PCIe slot (EDR IB)
[Diagram labels: SXM2 #1, SXM2 #2, SXM2 #3, SXM2 #4]

NvLink Configuration: Config ‘L’
• 4 P100-16GB SXM2 GPU
• 2 CPU
• PCIe switch
• 1 PCIe slot (EDR IB)
• Memory: 256GB w/ 16GB @ 2133
• OS: Ubuntu 16.04
• CUDA: 8.1
[Diagram labels: SXM2 #1, SXM2 #2, SXM2 #3, SXM2 #4, PCIe Switch]

Software Solutions

Overview: Bright ML
Dell EMC has partnered with Bright Computing to offer their Bright ML package as the software stack on the Dell EMC Deep
learning hardware solution.

Bright ML Overview

Machine Learning in Seismic Imaging Using KNL and FPGA: Project #1
Bhavesh Patel, Server Advanced Engineering
Robert Dildy, Product Technologist Sr. Consultant, Engineering Solutions

Abstract

This paper is focused on how to apply Machine Learning to seismic imaging with the use of an FPGA as a co-accelerator. It will cover two hardware technologies, Intel KNL Phi and FPGA, and also address how to use Machine Learning for seismic imaging.

There are different types of accelerators, such as GPU and Intel Phi, but we are choosing to study how we can use the i-ABRA platform with KNL and FPGA to train the neural network using Seismic Imaging data and then do the inference.

Machine learning in a broader sense can be divided into two parts, namely Training and Inference.

Background

Seismic Imaging is a standard data processing technique used in creating an image of subsurface structures of the Earth from measurements recorded at the surface via seismic wave propagations captured from various sound energy sources.

There are certain challenges with seismic data interpretation, [...] starting to replace [...] for seismic interpretation.

There has been rapid growth in the use of computer vision technology, with several companies developing image recognition platforms. This technology is being used for automatic photo tagging and classification...
