版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
2024人工智能AI技术教程
课程
讲义名称
备注
1
课程介绍
Overviewandsystem/AIbasics
2
人工智能系统概述
SystemperspectiveofSystemforAI
SystemforAI:ahistoricview;Fundamentalsofneuralnetworks;FundamentalsofSystemforAI
3
深度神经网络计算框架基础
ComputationframeworksforDNN
BackpropandAD,Tensor,DAG,ExecutiongraphPapersandsystems:PyTorch,TensorFlow
4
矩阵运算与计算机体系结构
ComputerarchitectureforMatrixcomputation
Matrixcomputation,CPU/SIMD,GPGPU,ASIC/TPU
Papersandsystems:Blas,TPU
5
分布式训练算法
Distributedtrainingalgorithms
Dataparallelism,modelparallelism,distributedSGDPapersandsystems:
6
分布式训练系统
Distributedtrainingsystems
MPI,parameterservers,all-reduce,RDMAPapersandsystems:Horovod
7
异构计算集群调度与资源管理系统
Schedulingandresourcemanagementsystem
RunningDNNjoboncluster:container,resourceallocation,schedulingPapersandsystems:KubeFlow,OpenPAI,Gandiva,HiveD
8
深度学习推导系统
Inferencesystems
Efficiency,latency,throughput,anddeployment
课程
讲义名称
备注
9
计算图编译优化
Computationgraphcompilationandoptimization
IR,sub-graphpatternmatch,Matrixmultiplicationandmemoryoptimization
Papersandsystems:XLA,MLIR,TVM,NNFusion
10
模型压缩和稀疏化处理
Efficiencyviacompressionandsparsity
Modelcompression,SparsityPruning
11
自动机器学习系统
AutoMLsystems
Hyperparametertuning,NAS
Papersandsystems:Hyperband,SMAC,ENAS,AutoKeras,NNI
12
强化学习系统
Reinforcementlearningsystems
TheoryofRL,systemsforRL
Papersandsystems:AC3,RLlib,AlphaZero
13
模型安全与隐私保护
SecurityandPrivacy
Federatedlearning,security,privacyPapersandsystems:DeepFake
14
用AI技术优化计算机系统
AIforsystems
AIfortraditionalsystemsproblems,forsystemalgorithms
Papersandsystems:LearnedIndexes,Learnedquerypath
课程
讲义名称
备注
Lab1(forweek1,2)
框架及工具入门示例
Asimplethroughoutend-to-endAIexample,froma
systemperspective
Understandthesystemsfromdebuggerinfoand
systemlogs
Lab2(forweek3)
定制一个新的张量运算
Customizeoperators
Designandimplementacustomizedoperator(bothforwardandbackward):inpython
Lab3(forweek4)
CUDA实现和优化CUDAimplementation
AddaCUDAimplementationforthecustomizedoperator
Lab4(forweek5,6)
AllReduce实现和优化
AllReduce
ImproveoneofAllReduceoperators’
implementationonHorovod
Lab5(forweek7,8)
配置Container来进行云上训练或推理准备
Configurecontainersforcustomizedtrainingandinference
Configurecontainers
Lab6
学习使用调度管理系统
Schedulingandresourcemanagementsystem
GetfamiliarwithOpenPAIorKubeFlow
Lab7
分布式训练任务练习
Distributedtraining
Trydifferentkindsofallreduceimplementations
Lab8
自动机器学习系统练习
AutoML
SearchforanewneuralnetworkNNstructureeforImage/NLPtasks
Lab9
强化学习系统练习
RLSystems
Configureandgetfamiliarwithoneofthefollowing
RLSystems:RLlib,…
Self-driving Surveillancedetection Translation Medicaldiagnostics Game
Personalassistant
DeepLearning
深度学习正在改变世界
Art
Imagerecognition Speechrecognition Naturallanguage Generativemodel Reinforcementlearning
cat
dog
honeybadger
𝑤1 𝑤2 𝑤3 𝑤4 𝑤5
CatDogRaccoon
loss
𝑑error
𝑑𝑤1
𝑑error
𝑑𝑤2
𝑑error
𝑑𝑤3
𝑑error
𝑑𝑤4
𝑑error
𝑑𝑤5
Errors
Dog
RDMA
海量的(标识)数据
深度学习算法的进步 语言、框架
计算能力
14Mimages
深度学习+系统的进步:编程语言、优化、计算机体系结构、并行计算以及分布式系统
E.g.,imageclassificationproblem
MNIST
ImageNet
WebImages
60Ksamples
16Msamples
BillionsofImages
10categories
1000categories
Openedcategories
TESTERRORRATE(%)
12
5
7.7
3.3
1.4
4.7
1.7
0.23
LeNet,convolution,max-pooling,softmax,1998
EfficientNet,
3.1%
NAS2019
AlexNet,16.4%
ReLU,Dropout,
2012
Inception,6.7%Batchnormalization,2015
ResNet,3.57%Residualway,2015
Imagerecognition
Speechrecognition
Naturallanguage
Reinforcementlearning
TPUv3
360Tops
V100
TPUv1
125Tops
90Tops
Performance(Op/Sec)
?
TPU
Dedicated
Hardware
GPU
CPU
Moore’slaw
5Kops
ENIAC
~500Gops
XeonE5
108x
105x
1960
1970 1980 1990 2000 2010
2019
CompilerBackend
TVM
TensorFlowXLA
LanguageFrontend
SwiftforTensorFlow
MxNetTensorFlowCNTK
PyTorch
Custompurposemachinelearningalgorithms
TheanoDisBeliefCaffe
Deeplearningframeworks
Algebra&
linearlibs
CPU
GPU
Densematmulengine
GPU
FPGA
SpecialAIaccelerators
TPU
GraphCore
OtherASICs
Custompurposemachinelearningalgorithms
TheanoDisBeliefCaffe
Deeplearningframeworksprovideeasierwaystoleveragevariouslibraries
MachineLearningLanguageandCompiler
PowerfulCompilerInfrastructure:
Codeoptimization,sparsityoptimization,hardwaretargeting
AFull-FeaturedProgrammingLanguageforML:Expressiveandflexible
Controlflow,recursion,sparsity
Algebra&
linearlibs
CPU
GPU
AIframeworkDensematmulengine
SIMDMIMD
SparsitySupport
ControlFlowandDynamicityAssociatedMemory
End-to-EndAIUserExperiences
Model,Algorithm,Pipeline,Experiment,Tool,LifeCycleManagement
Experience
ProgrammingInterfacesComputationgraph,(auto)Gradientcalculation
IR,Compilerinfrastructure
Frameworks
HardwareAPIs(GPU,CPU,FPGA,ASIC)
ResourceManagement/Scheduler
ScalableNetworkStack(RDMA,IB,NVLink)
DeepLearningRuntime:Optimizer,Planner,Executor
Runtime
Architecture
(singlenodeandCloud)
class3
class4
class5
class6
class7
class8
更广泛的AI系统生态
class12
机器学习新模式
(RL)
深度学习算法和框架
class11
class13
class10
自动机器学习
(AutoML)
安全与隐私
模型推导、压缩与优
化
广泛用途的高效新型通用AI算法
多种深度学习框架的支持与进化
深度神经网络编译架
构及优化
核心系统软硬件
深度学习任务运行和优 通用资源管理和调度系化环境 统
新型硬件及相关高性能网络和计算栈
(2)开始训练
定义网络结构
Fullyconnected
通常用作分类问题的最后几层
Convolutionalneuralnetwork
通常用作图像、语音等Locality强的数据
Recurrentneuralnetwork
通常用作序列及结构化的数据,比如文本信息、知识图
Transformerneuralnetwork
通常用作序列数据,比如文本信息
#ArecursiveTreeBankmodelinadozenlinesofJPLcode#Walkthetree,accumulatingembeddingvecs
#Wordembeddingmodelisusedattheleafnodetomapword#indexintohigh-dimensionalsemanticwordrepresentation.
#Getsemanticrepresentationsforleftandrightchildren.
#Acompositionfunctionisusedtolearnsemantic#representationforphraseattheinternalnode.
#Maptreeembeddingtosentiment
更多样化的结构
更强大的建模能力
更复杂的依赖关系
更细粒度的计算模式
ExecutionRuntime
CPU,GPU,RDMAdevices
Graphdefinition(IR)
xw
*b
+
y
Front-end
LanguageBinding:Python,Lua,R,C++
Optimization
Batching,Cache,Overlap
x
y
z
*
a
+
b
Σ
c
TensorFlow
Data-FlowGraph(DFG)
asIntermediateRepresentation
x
y
z
𝛻x
𝛻y
*
a
*𝐠
𝛻z
+
b
Σ
c
+𝐠
𝛻a
𝛻b
Σ𝐠
AddgradientbackpropagationtoData-FlowGraph(DFG)
TensorFlow
x y z
𝛻x 𝛻y
CPUcode
GPUcode
* *
a
+ +𝐠
b 𝛻b
Σ Σ𝐠
c
𝛻a
𝛻z
x
y
z
𝛻x
𝛻y
*
a
*𝐠
𝛻z
+
b
Σ
c
+𝐠
𝛻a
𝛻b
Σ𝐠
......
1
Operators
IDE
Programmingwith:VSCode,JupiterNotebook
Language
IntegratedwithmainstreamPL:PyTorchandTensorFlowinsidePython
Compiler
Intermediaterepresentation
Compilation
Optimization
Basicdatastructure:Tensor
Lexicalanalysis:Token
Usercontrolled:mini-batch
Basiccomputation:DAG
Parsing:AST
Dataparallelismandmodelparallelism
Advancefeatures:controlflow
Semanticanalysis:SymbolicAD
Loopnetsanalysis:pipelineparallelism,controlflow
GeneralIRs:MLIR
Codeoptimization
Dataflowanalysis:CSP,Arithmetic,Fusion
Codegeneration
Hardwaredependentoptimizations:matrixcomputation,layout
Resourceallocationandscheduler:memory,recomputation,
Runtimes
Singlenode:CuDNN
Multimode:Parameterservers,Allreducer
Computationclusterresourcemanagementandjobscheduler
Hardware
Hardwareaccelerators:CPU/GPU/ASIC/FPGA
Networkaccelerators:RDMA/IB/NVLink
Experience
Frameworks
Architecture
CompilerBackend
TVM
TensorFlowXLA
LanguageFrontend
SwiftforTensorFlow
MxNetTensorFlowCNTK
PyTorch
Deeplearningframeworks
SpecialAIaccelerators
TPU
GraphCore
OtherASICs
AIFrameworkDense
matmulengine
GPU
FPGA
import"tensorflow/core/framework/to";import"tensorflow/core/framework/op_to";import"tensorflow/core/framework/tensor_to
MachineLearningLanguageandCompiler
PowerfulCompilerInfrastructure:
Codeoptimization,sparsityoptimization,hardwaretargeting
AFull-FeaturedProgrammingLanguageforML:Expressiveandflexible
Controlflow,recursion,sparsity
SIMDMIMD
SparsitySupport
ControlFlowandDynamicityAssociatedMemory
//SyntacticallysimilartoLLVM:
func@testFunction(%arg0:i32){
%x=call@thingToCall(%arg0):(i32)->i32br^bb1
^bb1:
%y=addi%x,%x:i32
return%y:i32}
深度学习高度依赖数据规模和模型规模
8layers
1.4GFLOP
16%Error
2012
AlexNet
Image
152layers
22.6GFLOP
3.5%Error
2015
ResNet
Speech
提高训练速度可以加快深度学习模型的开发速度
大规模部署深度学习模型需要更快和更高效的推演速度
InferenceperformanceServinglatency
80GFLOP
7,000hrsofData
8%Error
2014
DeepSpeech1
465GFLOP
12,000hrsofData
5%Error
2015
DeepSpeech2
Differentarchitectures:CNN,
RNN,Transformer,…
Highcomputationresource
requirements:modelsize,…
Differentgoals:latency,
throughput,accuracy,…
Betransparenttovarioususerrequirements
Transparentlyapplyoverheterogeneoushardwareenvironment
Scale-out LocalEfficiency MemoryEffectiveness
系统、算法和硬件必须相互结合:
算法层面:模型的结构,是否可压缩、可稀疏化,batch的大小、学习算法
系统层面:各个层次的并行化,去重,Overlap,调度与资
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 智能制造产业链课程设计
- 2024事业单位员工晋升与职业发展合同3篇
- 2024年有机食品生产与销售独家授权合同
- 幼儿诗歌活课程设计
- 现代口腔修复材料的选择与应用
- 微机课程设计要求
- 未来校园主题课程设计
- 上海工程技术大学《金融营销》2023-2024学年第一学期期末试卷
- 幼儿园粘土课程设计
- 学前教育托管课程设计
- 重大火灾事故隐患检查表
- 部编版语文四年级上册普罗米修斯教学反思(两篇)
- 默纳克电梯故障代码(珍藏版)
- 中国台湾茂迪MT4090 LCR测试仪 数字式电桥
- 【课件】第三章+第四节+配合物与超分子高二化学人教版(2019)选择性必修2
- 高速铁路客运乘务的毕业四篇
- 生理学基础(第4版)第十一章 内分泌电子课件 中职 电子教案
- 换热器的传热系数K
- GB/T 20221-2006无压埋地排污、排水用硬聚氯乙烯(PVC-U)管材
- GA/T 1922-2021法庭科学疑似毒品中8种芬太尼类物质检验气相色谱和气相色谱-质谱法
- 五年级道德与法治上册全册知识点考点归纳整理及期末
评论
0/150
提交评论