2024 Artificial Intelligence (AI) Technology Tutorial
Course syllabus (lecture, title, notes)

1. Course introduction. Overview and system/AI basics.
2. Overview of AI systems (System perspective of System for AI). System for AI: a historic view; fundamentals of neural networks; fundamentals of System for AI.
3. Fundamentals of deep neural network computation frameworks (Computation frameworks for DNN). Backprop and AD, tensor, DAG, execution graph. Papers and systems: PyTorch, TensorFlow.
4. Matrix computation and computer architecture (Computer architecture for matrix computation). Matrix computation, CPU/SIMD, GPGPU, ASIC/TPU. Papers and systems: BLAS, TPU.
5. Distributed training algorithms. Data parallelism, model parallelism, distributed SGD. Papers and systems: (none listed).
6. Distributed training systems. MPI, parameter servers, all-reduce, RDMA. Papers and systems: Horovod.
7. Scheduling and resource management systems for heterogeneous computing clusters (Scheduling and resource management system). Running DNN jobs on a cluster: container, resource allocation, scheduling. Papers and systems: KubeFlow, OpenPAI, Gandiva, HiveD.
8. Deep learning inference systems (Inference systems). Efficiency, latency, throughput, and deployment.
9. Computation graph compilation and optimization. IR, sub-graph pattern matching, matrix multiplication and memory optimization. Papers and systems: XLA, MLIR, TVM, NNFusion.
10. Model compression and sparsification (Efficiency via compression and sparsity). Model compression, sparsity, pruning.
11. AutoML systems. Hyperparameter tuning, NAS. Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI.
12. Reinforcement learning systems. Theory of RL, systems for RL. Papers and systems: A3C, RLlib, AlphaZero.
13. Model security and privacy protection (Security and Privacy). Federated learning, security, privacy. Papers and systems: DeepFake.
14. Optimizing computer systems with AI (AI for systems). AI for traditional systems problems, for system algorithms. Papers and systems: Learned Indexes, Learned query path.

Labs (lab, title, notes)

Lab 1 (for weeks 1, 2). Introductory example for frameworks and tools. A simple end-to-end AI example, from a system perspective. Understand the systems from debugger info and system logs.
Lab 2 (for week 3). Customize a new tensor operator (Customize operators). Design and implement a customized operator (both forward and backward) in Python.
Lab 3 (for week 4). CUDA implementation and optimization. Add a CUDA implementation for the customized operator.
Lab 4 (for weeks 5, 6). AllReduce implementation and optimization. Improve the implementation of one of the AllReduce operators on Horovod.
Lab 5 (for weeks 7, 8). Configure containers for training or inference in the cloud (Configure containers for customized training and inference). Configure containers.
Lab 6. Learn to use a scheduling and management system (Scheduling and resource management system). Get familiar with OpenPAI or KubeFlow.
Lab 7. Distributed training exercise. Try different kinds of all-reduce implementations.
Lab 8. AutoML system exercise. Search for a new neural network (NN) structure for image/NLP tasks.
Lab 9. Reinforcement learning system exercise (RL systems). Configure and get familiar with one of the following RL systems: RLlib, …

Deep Learning

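Deep learning is changing the world

Applications: self-driving, personal assistant, surveillance detection, translation, medical diagnostics, game, art.
Areas: image recognition, speech recognition, natural language, generative model, reinforcement learning.

[Figure: image classification outputs labeled Cat, Dog, Raccoon Dog and cat, dog, honey badger; a network with weights w1 to w5 in which the error (loss) is propagated back as d error/d w5, d error/d w4, d error/d w3, d error/d w2, d error/d w1. Other labels: Errors, loss, RDMA.]

Computing power.
Massive (labeled) data: 14M images.
Advances in deep learning algorithms.
Languages and frameworks.
Deep learning + advances in systems: programming languages, optimization, computer architecture, parallel computing, and distributed systems.

Datasets (e.g., image classification problem):
MNIST: 60K samples, 10 categories.
ImageNet: 16M samples, 1000 categories.
Web Images: billions of images, open categories.

[Chart: test error rate (%) by model and year.]
LeNet, 1998: convolution, max-pooling, softmax.
AlexNet, 2012: 16.4%; ReLU, Dropout.
Inception, 2015: 6.7%; batch normalization.
ResNet, 2015: 3.57%; residual connections.
EfficientNet, 2019: 3.1%; NAS.

[Chart: performance (Op/Sec) over time, 1960 to 2019. CPU (Moore's law, 10^8x): ENIAC at 5 Kops to Xeon E5 at ~500 Gops. Dedicated hardware (GPU, TPU; 10^5x): TPUv1 90 Tops, V100 125 Tops, TPUv3 360 Tops, followed by "?".]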
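The 10^8x figure on the CPU curve can be sanity-checked from the two endpoints the chart gives. The short calculation below is only an arithmetic check, assuming Kops, Gops and Tops mean 10^3, 10^9 and 10^12 operations per second; the variable and function names are illustrative. The baseline for the chart's 10^5x figure for dedicated hardware cannot be recovered from the text, so the sketch only compares accelerators against the Xeon E5 point.

```python
import math

# Endpoints read from the chart (units are an assumption: K = 1e3, G = 1e9, T = 1e12).
ENIAC_OPS = 5e3          # 5 Kops
XEON_E5_OPS = 500e9      # ~500 Gops
ACCEL_OPS = {"TPUv1": 90e12, "V100": 125e12, "TPUv3": 360e12}


def speedup(new_ops, old_ops):
    """Ratio of two throughput figures given in operations per second."""
    return new_ops / old_ops


cpu_gain = speedup(XEON_E5_OPS, ENIAC_OPS)
print(f"CPU gain, ENIAC to Xeon E5: about 10^{math.log10(cpu_gain):.0f}x")

for name, ops in ACCEL_OPS.items():
    print(f"{name}: {speedup(ops, XEON_E5_OPS):.0f}x the Xeon E5 figure")
```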

Deep learning frameworks

Deep learning frameworks: MxNet, TensorFlow, CNTK, PyTorch.
Language frontend: Swift for TensorFlow.
Compiler backend: TVM, TensorFlow XLA.
Custom-purpose machine learning algorithms: Theano, DisBelief, Caffe.
Algebra and linear libraries: CPU, GPU.
Dense matmul engine: GPU, FPGA.
Special AI accelerators: TPU, GraphCore, other ASICs.
Deep learning frameworks provide easier ways to leverage various libraries.

Machine learning language and compiler

A full-featured programming language for ML: expressive and flexible; control flow, recursion, sparsity.
Powerful compiler infrastructure: code optimization, sparsity optimization, hardware targeting.
SIMD, MIMD; sparsity support; control flow and dynamicity; associated memory.

System stack (layers: Experience, Frameworks, Runtime, Architecture for single node and cloud)

End-to-end AI user experiences: model, algorithm, pipeline, experiment, tool, life cycle management.
Programming interfaces: computation graph, (auto) gradient calculation.
IR, compiler infrastructure.
Deep learning runtime: optimizer, planner, executor.
Resource management/scheduler.
Hardware APIs (GPU, CPU, FPGA, ASIC).
Scalable network stack (RDMA, IB, NVLink).
Related classes: 3, 4, 5, 6, 7, 8.

Broader AI system ecosystem: new machine learning paradigms (RL), automated machine learning (AutoML), security and privacy, model inference, compression and optimization.
Deep learning algorithms and frameworks: efficient new general-purpose AI algorithms for broad use, support for and evolution of multiple deep learning frameworks, deep neural network compilation architecture and optimization.
Core system software and hardware: runtime and optimization environment for deep learning jobs, general resource management and scheduling systems, new hardware with the associated high-performance network and computing stack.
Related classes: 12, 11, 13, 10.

(1) Define the network structure
(2) Start training

Fully connected: typically used as the last few layers in classification problems.
Convolutional neural network: typically used for data with strong locality, such as images and speech.
Recurrent neural network: typically used for sequential and structured data, such as text and knowledge graphs.
Transformer neural network: typically used for sequence data, such as text.

# A recursive TreeBank model in a dozen lines of JPL code
# Walk the tree, accumulating embedding vecs
# Word embedding model is used at the leaf node to map word
# index into high-dimensional semantic word representation.
# Map tree embedding to sentiment
# Get semantic representations for left and right children.
# A composition function is used to learn semantic
# representation for phrase at the internal node.

More diverse structures.
Stronger modeling capability.
More complex dependencies.
Finer-grained computation patterns.
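The code that the TreeBank comments above annotate did not survive in this copy of the slides. As a rough illustration of what those comments describe, here is a minimal pure-Python sketch of a recursive tree model. It is not the original JPL code: the vocabulary, the embedding size, the tanh composition function, the two-class softmax and all names (`EMBED`, `W_COMP`, `W_SENT`, `embed_tree`, `sentiment`) are assumptions made for this example, and the weights are random rather than trained.

```python
import math
import random

DIM = 4                                    # embedding size (illustrative)
VOCAB = {"not": 0, "very": 1, "good": 2}   # toy vocabulary (assumption)
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


EMBED = rand_matrix(len(VOCAB), DIM)   # word embedding table
W_COMP = rand_matrix(DIM, 2 * DIM)     # composition weights for internal nodes
W_SENT = rand_matrix(2, DIM)           # maps a tree embedding to 2 sentiment logits


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def embed_tree(tree):
    """Walk the tree bottom-up and return one embedding vector for it."""
    if isinstance(tree, str):
        # Leaf node: map the word index to its embedding vector.
        return EMBED[VOCAB[tree]]
    left, right = tree
    # Internal node: get the representations of the left and right children,
    l_vec, r_vec = embed_tree(left), embed_tree(right)
    # then compose them into a representation of the phrase.
    return [math.tanh(x) for x in matvec(W_COMP, l_vec + r_vec)]


def sentiment(tree):
    """Map the tree embedding to a probability distribution over 2 classes."""
    logits = matvec(W_SENT, embed_tree(tree))
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = sentiment(("not", ("very", "good")))
print(probs)
```

Because the recursion follows the shape of each input tree, every sentence produces a different computation structure, which is the kind of dynamic control flow the four points above refer to.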

Graph definition (IR): y = x * w + b
Front-end language binding: Python, Lua, R, C++.
Optimization: batching, cache, overlap.
Execution runtime: CPU, GPU, RDMA devices.

TensorFlow: Data-Flow Graph (DFG) as intermediate representation
[Figure: data-flow graph with nodes labeled x, y, z, a, b, c and operators *, +, Σ.]

TensorFlow: add gradient backpropagation to the Data-Flow Graph (DFG)
[Figure: the same graph extended with gradient nodes ∇x, ∇y, ∇z, ∇a, ∇b and gradient operators +g, *g, Σg.]
[Figure: the graph with gradient nodes, partitioned into CPU code and GPU code.]
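To make the idea of adding gradient backpropagation to a data-flow graph concrete, here is a minimal sketch of reverse-mode automatic differentiation over a small DAG. It is a simplification and not TensorFlow's implementation: values are scalars, the Σ reduction from the figure is omitted, and gradients are computed by walking the graph in reverse topological order instead of emitting new gradient nodes. The reading a = x * y, b = a + z and the names `Node`, `mul`, `add`, `backward` are assumptions for this example.

```python
class Node:
    """One vertex of the data-flow graph: a value plus how it was computed."""

    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents      # input nodes of this operator
        self.grad_fns = grad_fns    # per parent: upstream gradient -> parent gradient
        self.grad = 0.0


def mul(p, q):
    # d(p*q)/dp = q and d(p*q)/dq = p
    return Node(p.value * q.value, (p, q),
                (lambda g: g * q.value, lambda g: g * p.value))


def add(p, q):
    # d(p+q)/dp = d(p+q)/dq = 1
    return Node(p.value + q.value, (p, q), (lambda g: g, lambda g: g))


def topo_order(output):
    """Return the nodes reachable from output, inputs first."""
    order, seen = [], set()

    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for parent in n.parents:
            visit(parent)
        order.append(n)

    visit(output)
    return order


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node."""
    output.grad = 1.0
    for node in reversed(topo_order(output)):
        for parent, fn in zip(node.parents, node.grad_fns):
            parent.grad += fn(node.grad)


x, y, z = Node(2.0), Node(3.0), Node(4.0)
a = mul(x, y)      # a = x * y
b = add(a, z)      # b = a + z
backward(b)
print(x.grad, y.grad, z.grad)   # 3.0 2.0 1.0
```

Each operator carries its own local gradient rule, which mirrors the +g and *g gradient operators in the figure; a graph-based framework would add those rules to the graph as nodes so that the forward and backward passes can be optimized and placed on devices together.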

Layers: Experience, Frameworks, Architecture

IDE. Programming with: VSCode, Jupyter Notebook.
Language. Integrated with mainstream PL: PyTorch and TensorFlow inside Python.
Compiler, intermediate representation. Basic data structure: tensor. Basic computation: DAG. Advanced features: control flow. General IRs: MLIR.
Compiler, compilation. Lexical analysis: token. Parsing: AST. Semantic analysis: symbolic AD. Code optimization. Code generation.
Compiler, optimization. User controlled: mini-batch. Data parallelism and model parallelism. Loop nest analysis: pipeline parallelism, control flow. Data flow analysis: CSP, arithmetic, fusion. Hardware dependent optimizations: matrix computation, layout. Resource allocation and scheduler: memory, recomputation.
Runtimes. Single node: CuDNN. Multi-node: parameter servers, all-reduce. Computation cluster resource management and job scheduler.
Hardware. Hardware accelerators: CPU/GPU/ASIC/FPGA. Network accelerators: RDMA/IB/NVLink.
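The multi-node runtime row above names all-reduce, which the labs also revisit. As a hedged illustration of one common variant, the sketch below simulates a ring all-reduce (reduce-scatter followed by all-gather) inside a single Python process. It is not Horovod's or MPI's implementation: workers are plain lists, "sending" is a list copy, the buffer length is assumed to be divisible by the number of workers, and the function name `ring_allreduce` is made up for this example.

```python
def ring_allreduce(buffers):
    """Sum-reduce equal-length lists held by n simulated workers, in place."""
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes the length is divisible by the worker count"
    chunk = size // n

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % n
    # to worker (r + 1) % n, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [[buffers[r][i] for i in span(c)] for r, c in sends]
        for (r, c), data in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(span(c), data):
                buffers[dst][i] += v

    # Now worker r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: in step s, worker r forwards chunk (r + 1 - s) % n
    # to worker (r + 1) % n, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [[buffers[r][i] for i in span(c)] for r, c in sends]
        for (r, c), data in zip(sends, payloads):
            dst = (r + 1) % n
            for i, v in zip(span(c), data):
                buffers[dst][i] = v
    return buffers


grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(grads))
```

In each of the 2 * (n - 1) steps every worker sends only one chunk of size 1/n, which is why the ring layout keeps per-worker traffic roughly independent of the number of workers.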

import "tensorflow/core/framework/to";
import "tensorflow/core/framework/op_to";
import "tensorflow/core/framework/tensor_to

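// Syntactically similar to LLVM:
func @testFunction(%arg0: i32) {
  %x = call @thingToCall(%arg0) : (i32) -> i32
  br ^bb1
^bb1:
  %y = addi %x, %x : i32
  return %y : i32
}

Deep learning depends heavily on data scale and model scale.
Faster training speeds up the development of deep learning models.
Deploying deep learning models at scale requires faster and more efficient inference.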

Inference performance, serving latency

Image. 2012 AlexNet: 8 layers, 1.4 GFLOP, 16% error. 2015 ResNet: 152 layers, 22.6 GFLOP, 3.5% error.
Speech. 2014 Deep Speech 1: 80 GFLOP, 7,000 hrs of data, 8% error. 2015 Deep Speech 2: 465 GFLOP, 12,000 hrs of data, 5% error.

Different architectures: CNN, RNN, Transformer, …
High computation resource requirements: model size, …
Different goals: latency, throughput, accuracy, …
Transparently apply over heterogeneous hardware environments.
Scale-out, local efficiency, memory effectiveness.
Be transparent to various user requirements.
Systems, algorithms, and hardware must be designed together.
