What Every Technologist Should Know About AI and Deep Learning
Alex McDonald, Standards & Industry Associations, NetApp Inc.

The information is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. NetApp makes no warranties, expressed or implied, on future functionality and timeline. The development, release, and timing of any features or functionality described for NetApp's products remain at the sole discretion of NetApp. NetApp's strategy and possible future developments, products and/or platform directions and functionality are all subject to change without notice. NetApp has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein.
Why is AI & Deep Learning Important?
- AI and Deep Learning are disrupting every industry
- For decades, AI was all about improving algorithms
- Now the focus is on putting AI to practical use
- It is critical to leverage well-engineered systems

This talk will:
- Take you on a broad, coherent tour of Deep Learning systems
- Help you appreciate the role well-engineered systems play in the AI disruption
- Take you a step closer to being a unicorn: Systems + AI, something highly desirable and difficult to obtain

NetApp INSIGHT ©2019 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use Only
Agenda
- AI Primer
- AI Stacks Overview
- Deep Learning Process: Training, Inference
- Deep Learning Systems: Hardware, Software
- Datasets and Dataflow
- Future of Systems

Background: AI or ML or DL?
- AI: a program that imitates human intelligence
- ML: a program that learns with experience (i.e., data)
- DL: ML using more than one hidden layer of a neural network
Deep Learning 101: basic concepts and terminology
- Neuron: a computational unit
- DL model == type & structure
- More layers => the features in the dataset are captured better, and (normally) performance at the task is better
- Parameters/Weights: layers are made of neurons with weights
- Training: build a model from a dataset (forward propagation, backpropagation, repeat)
- Epoch: one pass over the entire dataset
- Batch: a chunk of data; batch size: how big that chunk is
- Preprocessing/preparation: getting the data ready for training
- Inference: using a trained model on new data (forward propagation only)
(Running example in the slides: an image classified as "mountain")

State-of-the-art DL is large scale
- 100s of layers, millions of parameters
- 100s of GBs to TBs of data
- Hours or days to train
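The vocabulary above (epoch, batch, forward propagation, backpropagation) can be made concrete with a tiny sketch. This is a hand-rolled, one-weight model in plain Python, an illustration of the loop structure only, not how a real framework such as TensorFlow implements it:

```python
import random

def train(data, epochs=50, batch_size=2, lr=0.1):
    """Fit y = w*x with gradient descent: forward, backward, repeat."""
    w = 0.0  # the single parameter/weight of this toy model
    for epoch in range(epochs):            # one epoch = one pass over the dataset
        random.shuffle(data)               # a preprocessing step, reshuffle per epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]  # a batch = a chunk of data
            # forward propagation: predictions feed the loss gradient
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            # backpropagation step: move the weight against the gradient
            w -= lr * grad
    return w

# toy dataset sampled from y = 3x
dataset = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w = train(dataset)
print(round(w, 2))  # converges close to 3.0
```

The nested loops are the part that matters: the outer loop counts epochs, the inner loop walks the dataset one batch at a time, and each iteration is one forward/backward round trip.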
AI Stack Overview

AI Stack Layers
From bottom to top: Modern Compute, Software, Platform, API-based service, with AI PaaS and end-to-end solutions at the top.

Modern Compute
- GPUs, TPUs, FPGAs
- Optimized hardware that provides a tremendous speed-up for training, and sometimes inference
- More easily available on the cloud, for rent

Software
- TensorFlow, PyTorch, Caffe2, MXNet, CNTK, Keras, Gluon
- A library that implements the algorithms and provides an execution engine and programming APIs
- Used to train and build sophisticated models, and to make predictions on new data with the trained model

Platform
- Laptop, cloud compute instances, H2O Deep Water, Spark DL pipelines
- Hardware-accelerated platforms, supporting common software frameworks, that run the training and/or inference of deep neural networks
- Typically optimized for a preferred software framework
- Can be hosted on-prem or in the cloud
- Also offered as a fully-managed service (PaaS) by cloud vendors: Amazon SageMaker, Google Cloud ML, Azure ML
API-based service
- Amazon Rekognition, Lex & Polly; Google Cloud API; Microsoft Cognitive Services
- Allows query-based service access to generalizable, state-of-the-art AI models for common tasks
- Example: send an image and get object tags as the result; send an mp3 and get the converted text as the result; and so on
- No dataset and no model training required from the user
- Per-call cost model
- Integrated with cloud storage and/or bundled into end-to-end solutions and AI consultancy offerings like IBM Watson AI, ML & Cognitive consulting, Amazon's ML Solutions Lab, and Google's Advanced Solutions Lab
Deep Learning Process: Training and Inference

DL Process and Data Lifecycle
The DL lifecycle is very unlike traditional systems software development. The stages: Gather Data → Data Analysis, Transformation, Validation → Model Training → Model Evaluation, Validation, Tuning → Model Serving, Monitoring.
- Gathering and curating quality datasets, and making them accessible across the org
- Diverse tools and flexible infrastructure are needed
- Evaluation criteria are critical but hard; comparing algorithms is not straightforward
- Tracking artifacts (dataset transformations, tuning history, model versions, validation results) is more important than tracking code
- Debugging, interpretability, and fairness tooling is limited
- Tension/friction: data security and privacy; IT
Deep Learning Training
Training: build a model from a dataset (forward propagation, backpropagation, repeat).
- Memory- and compute-bound: big datasets, complex math operations
- Highly parallelized and distributed, across cores and across machines
  - Partition the data, or the model, or both
  - Scale up before you scale out
  - Watch the communication-to-computation ratio
  - Speed vs. accuracy tradeoff
  - Federated learning
- Leans on enhancements to data quality
  - Augmentation, randomness, transformations
  - Efficiently fitting data in memory
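The data-partitioning point above can be sketched in plain Python. Each "worker" below computes a gradient on its own data shard, and a simple averaging function stands in for the collective all-reduce a real framework performs across GPUs or machines; the dataset and one-weight model are toy assumptions:

```python
def local_gradient(w, shard):
    """Each worker computes a gradient on its own shard (toy y = w*x model)."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for a collective all-reduce: average the workers' gradients."""
    return sum(values) / len(values)

# partition the dataset across 2 workers (data parallelism)
data = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5), (1.5, 4.5)]
shards = [data[0::2], data[1::2]]

w = 0.0
for step in range(100):
    grads = [local_gradient(w, s) for s in shards]  # runs in parallel for real
    w -= 0.05 * all_reduce_mean(grads)              # synchronized weight update

print(round(w, 2))  # converges close to 3.0
```

The communication-to-computation ratio mentioned above is visible here: every step does one exchange of gradients (the all-reduce) per batch of local compute, which is what makes interconnect bandwidth matter at scale.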
Deep Learning Training (continued)
- Supervision: relies on labeled data
  - Transfer learning: start from a pre-trained model and train only a few layers
  - Learning the label
- Involves a lot of hyperparameter tuning
  - Examples: number of layers, number of neurons, batch size, …
  - Multi-model training on the same dataset
  - Trial-and-error search, which is easier to automate
- Rise of AutoML: learn how to model the given data, with no modeling or tuning expertise required
  - Example: AmoebaNet beat ResNet at ImageNet classification
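The trial-and-error search described above is easy to sketch as a grid search. The search space and the `evaluate` function below are hypothetical stand-ins; in practice `evaluate` would train a model with that configuration and return its measured validation score:

```python
import itertools

# hypothetical search space over the kinds of knobs the slide lists
space = {
    "layers": [2, 4, 8],
    "neurons": [64, 128],
    "batch_size": [32, 64],
}

def evaluate(config):
    """Stand-in for train-then-validate: scores each setting deterministically
    so the example is runnable without training anything."""
    return 1.0 / (1 + abs(config["layers"] - 4)
                  + abs(config["neurons"] - 128) / 64
                  + abs(config["batch_size"] - 32) / 32)

best = None
for values in itertools.product(*space.values()):
    config = dict(zip(space.keys(), values))   # one candidate setting
    score = evaluate(config)
    if best is None or score > best[0]:
        best = (score, config)

print(best[1])  # best-scoring setting: 4 layers, 128 neurons, batch size 32
```

Because the loop body is independent per configuration, this kind of search is trivially parallelizable, which is one reason it automates so well and why AutoML systems build on it.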
Deep Learning Inference
Inference (also known as model serving, deployment, or prediction): use a trained model on new data.
- Computationally simpler: a single forward pass
- Typically a containerized RPC/web server with pre-installed DL software plus the neural network model
- Multiple inputs are batched for better throughput, but the batches are much smaller than in training; low latency is the goal
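The batching behavior of a serving process can be sketched as follows; `model_forward` is a hypothetical stand-in for the real network's single forward pass:

```python
def model_forward(batch):
    """Stand-in for one forward pass over a batch of inputs;
    a real server would run the neural network here."""
    return [x * 2 for x in batch]

def serve(requests, max_batch=4):
    """Group incoming requests into small batches for throughput,
    keeping batches much smaller than training batches to bound latency."""
    results = []
    for i in range(0, len(requests), max_batch):
        results.extend(model_forward(requests[i:i + max_batch]))
    return results

print(serve([1, 2, 3, 4, 5, 6]))  # [2, 4, 6, 8, 10, 12]
```

The `max_batch` cap is the throughput/latency dial: larger batches use the accelerator better, but the first request in a batch waits for the whole batch to finish.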
Deep Learning Inference (continued)
- DL models can be huge and may need hardware acceleration
- On-device/edge inference is gaining traction
  - Reasons: latency and privacy
  - Special model optimizations (such as pruning) and on-device hardware
- Portability and interoperability of models is important: train any way, deploy anywhere
  - Example: ONNX is a step towards standardization
Deep Learning Systems: Hardware Acceleration and Software Frameworks

Role of the CPU in AI
- CPUs are still used for ML training
- CPUs are common for inference, including certain DL inference
- CPUs struggle to handle DL training
- Data preprocessing tasks are well suited to CPUs
- Hybrid hardware, CPUs combined with accelerators like GPUs, is common
Hardware Acceleration for DL: GPU (Graphics Processing Unit)
- The de facto hardware for AI training; also used for large-scale inference
- GPU vs. CPU: many more cores, massive parallelization
- Modern GPU architecture features used for AI:
  - High-speed interconnect between CPUs and GPUs (NVLink)
  - Bypassing the CPU for communication (GPUDirect)
  - Efficient message passing (collective all-reduce)
- Available in the cloud (EC2 P* instances) and on-premise (DGX)
(Diagrams: data path without GPUDirect vs. with GPUDirect. GPU image courtesy of Daniel Whitenack.)
Hardware Acceleration for DL: ASIC (Application-Specific Integrated Circuit)
- ASICs are designed to speed up DL operations; for example, Google's TPU (Tensor Processing Unit)
- High performance, but less flexible, and economical only at large scale
- Special optimizations in hardware; for example, reduced precision and a matmul operator
- The design for inference is different from the design for training; for example, in 1st-generation TPUs the floating-point units were replaced by int8 units
(Image source: /nips17/assets/slides/dean-nips17.pdf)
Hardware Acceleration for DL: FPGA (Field-Programmable Gate Arrays)
- Designed to be reconfigurable
- Flexibility to change as neural networks and new algorithms evolve
- Offer much higher performance per watt than GPUs
- Cost effective, and excel at inference
- Reprogramming an FPGA is not easy: it requires low-level languages
- Available in the cloud (EC2 F1 instances)
Hardware Acceleration for On-device AI
- Primarily limited to inference only
- Special SoC designs with reduced die space
- Energy efficiency and memory efficiency are more critical
- Special optimizations to support specific tasks only; for example, speech-only or vision-only
- Examples: Apple's Neural Engine, Huawei's NPU
Software Frameworks
Frontend
- Abstracts the mathematical and algorithmic implementation details of neural networks
- Provides a high-level building-blocks API to define neural network models over multiple backends
- A high-level language library
Backend
- Hides hardware-specific programming APIs from the user
- Optimizes and parallelizes the training and inference process to work efficiently on the hardware
- Makes it easier to preprocess and prepare data for training
- Supports multi-GPU, multi-node execution
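The frontend/backend split can be illustrated with a toy layers API in plain Python. The `Dense` and `Sequential` names echo common framework vocabulary, but this is a hypothetical sketch, not any real framework's API: the class interfaces are the "frontend", and the arithmetic inside `forward` stands in for the hardware-specific "backend".

```python
class Dense:
    """Frontend building block: a layer the user composes declaratively."""
    def __init__(self, weight, bias):
        self.weight, self.bias = weight, bias

    def forward(self, x):
        # backend concern: how the math actually executes on hardware
        return self.weight * x + self.bias

class Sequential:
    """Frontend container: hides execution details behind a simple API."""
    def __init__(self, layers):
        self.layers = layers

    def predict(self, x):
        for layer in self.layers:
            x = layer.forward(x)   # chain layer outputs into the next layer
        return x

model = Sequential([Dense(2.0, 1.0), Dense(0.5, 0.0)])
print(model.predict(4.0))  # (2*4 + 1) * 0.5 = 4.5
```

A real backend would swap the scalar arithmetic for tensor kernels on a GPU or TPU without changing the frontend code the user writes, which is exactly the separation the slide describes.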
Dataset & Data Flow (using TensorFlow as the reference)

Dataset Transformation, ImageNet example: raw data vs. TFRecords
- Raw data is converted into a packed binary format for training called TFRecord (a one-time step)
- 1.2M image files are converted into 1024 TFRecords, with each TFRecord 100s of MB in size
(Figure: file-size histograms of the raw ImageNet data, sizes in KB, vs. the TFRecord ImageNet data, sizes in MB)
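The idea behind this packing step, turning millions of small files into a few large, sequentially readable files, can be sketched with length-prefixed records in plain Python. Note this only illustrates the concept: the real TFRecord format has its own layout, including CRC checksums.

```python
import io
import struct

def pack_records(records):
    """Pack many small byte-records into one length-prefixed binary blob,
    mimicking why TFRecords exist: large sequential reads instead of
    millions of small-file opens."""
    buf = io.BytesIO()
    for rec in records:
        buf.write(struct.pack("<Q", len(rec)))  # 8-byte little-endian length
        buf.write(rec)
    return buf.getvalue()

def unpack_records(blob):
    """Stream records back out by reading each length prefix."""
    out, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from("<Q", blob, offset)
        offset += 8
        out.append(blob[offset:offset + length])
        offset += length
    return out

images = [b"jpeg-bytes-1", b"jpeg-bytes-2", b"jpeg-bytes-3"]
blob = pack_records(images)
assert unpack_records(blob) == images  # round-trips losslessly
```

Packing is a one-time cost paid at dataset-preparation time; every subsequent training epoch then reads a handful of large files sequentially, which storage systems serve far more efficiently than 1.2M random small-file reads.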
TensorFlow Data Pipeline
- IO: read data from persistent storage
- Prepare: use CPU cores to parse and preprocess the data; preprocessing includes shuffling, data transformations, batching, etc.
- Train: load the transformed data onto the accelerator devices (GPUs, TPUs) and execute the DL model
(Diagram: TFRecords flow from storage over the network into host CPU/RAM, then over PCIe/NVLink to the GPUs)
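Conceptually, prefetching decouples these stages with a bounded buffer so that preparing the next batch overlaps training on the current one. A minimal sketch using plain Python threads follows; the `prepare` and `train` callables are toy stand-ins for the CPU-side and accelerator-side work:

```python
import queue
import threading

def pipeline(batches, prepare, train, depth=2):
    """Overlap the prepare phase with training, the way a prefetch stage
    does: a producer thread prepares batches ahead of time into a bounded
    queue while the consumer trains on earlier ones."""
    q = queue.Queue(maxsize=depth)  # depth = number of prefetched batches

    def producer():
        for b in batches:
            q.put(prepare(b))   # CPU-side parse/transform/batch work
        q.put(None)             # sentinel: no more data

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (batch := q.get()) is not None:
        results.append(train(batch))  # accelerator-side work
    return results

out = pipeline(range(5), prepare=lambda b: b * 10, train=lambda b: b + 1)
print(out)  # [1, 11, 21, 31, 41]
```

The bounded `maxsize` matters: it caps host memory spent on buffered batches while still keeping the trainer from ever stalling on an empty queue, which is the same trade-off the prefetch depth parameter controls in a real input pipeline.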
Compute Pipelining
- Without pipelining, the IO, prepare, and train phases run serially; with pipelining (using the prefetch API) they overlap
- The IO and prepare phases can also be parallelized: parallelize prepare, parallelize IO
(Image source: https:///guide/)