Serengeti: Virtualize Your Big Data Applications (VMware slides)

© 2009 VMware Inc. All rights reserved

Serengeti: Virtualize Your Big Data Applications
蔺永华 (Lin Yonghua), VMware, Inc.

Agenda
• Today's big data system
• Why virtualize Hadoop?
• Serengeti introduction
• Common questions about virtualization
• Serengeti solution
• Deep insight into Serengeti
• Summary
• Q&A

Today's Big Data System
(Diagram: ETL; unstructured data (HDFS); real-time structured database; Big SQL; data-parallel batch processing; real-time streams; real-time processing (S4, Storm); analytics)


Challenges to using Hadoop in physical infrastructure
Deployment
• Difficult to deploy: takes several people several days or even months
• Difficult to tune cluster performance
Low efficiency
• Hadoop clusters are typically not 100% utilized across all hardware resources
• Difficult to share resources safely between different workloads
Single point of failure
• Single point of failure for the NameNode and JobTracker
• No HA for Hive, HCatalog, etc.

Why virtualize Hadoop? Get your Hadoop cluster in minutes
• Manual process (server preparation, OS installation, network configuration, Hadoop installation and configuration): takes days
• Automated by Serengeti on vSphere with best practices: a fully automated process, 10 minutes to get a Hadoop/HBase cluster from scratch
• 1/1000 of the human effort, with the least Hadoop operations knowledge required

Why virtualize Hadoop? Consolidate sprawling clusters
• Cluster sprawl: single-purpose clusters for various business applications (Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop, ...) lead to cluster sprawl
• Cluster consolidation: clusters share servers on one virtualization platform, with strong isolation
  - Simplify: single hardware infrastructure, unified operations
  - Optimize: shared resources = higher utilization; elastic resources = faster on-demand access
• 30% CAPEX down

Why virtualize Hadoop? Utilize all your resources to solve the priority problem
• 50%+ of resources are sitting idle while a high-priority job is burning up its cluster
• Utilize all resources from the pool on demand: dynamic elastic scaling on a shared resource pool
• 3X faster to get analytic results

vSphere High Availability (HA): protection against unplanned downtime
Overview
• Protection against host and VM failures
• Automatic failure detection (host, guest OS)
• Automatic virtual machine restart in minutes, on any available host in the cluster
• OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
(Diagram: HDFS (Hadoop Distributed File System) with the NameNode; MapReduce (job scheduling/execution system) with the JobTracker; HBase (key-value store); Pig (data flow); Hive (SQL) with the Hive MetaDB; HCatalog with the HCatalog MDB server; ZooKeeper (coordination); management server; BI reporting, ETL tools and RDBMS around the stack)

vSphere Fault Tolerance (FT) provides continuous protection
(Diagram: App/OS VMs on two VMware ESX hosts, one of which fails, comparing HA and FT)
Overview
• Identical primary and secondary VMs running in lockstep on separate hosts
• Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures
• Integrated with VMware HA/DRS
• No complex clustering or specialized hardware required
• Single common mechanism for all applications and operating systems
• Zero downtime for the NameNode, JobTracker and other components in Hadoop clusters


Serengeti: easy and rapid deployment and management
• Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is due in June
• A toolkit that leverages virtualization to simplify Hadoop deployment and operations:
  - Deploy a cluster in 10 minutes, fully automated
  - Customize Hadoop and HBase clusters
  - Automated cluster operations
  - Comes with ecosystem components
  - Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti


Common questions about virtualization
• Local disk: can local disk be used in a virtualized environment?
• Flexibility and scalability: how can resources be scheduled flexibly between the clusters and different applications mentioned above?
• Data stability: in a virtual environment, how can we distribute data across hosts and racks?
• Data locality: Hadoop schedules compute tasks near the data to reduce network IO for data reads and writes. Can a virtual environment achieve the same result?
• Performance: how does Hadoop perform in a virtual environment?


Can I use local disk easily?

Serengeti: extend the virtual storage architecture to include local disk
• Shared storage: SAN or NAS
  - Easy to provision
  - Automated cluster rebalancing
• Hybrid storage
  - SAN for boot images and other workloads
  - Local disk for Hadoop & HDFS
(Diagram: hosts running Hadoop VMs side by side with other VMs)

How to flexibly scale in/scale out?
How can resources be scheduled flexibly between clusters and different applications?

Evolution of Hadoop on VMs: data/compute separation
• Current Hadoop: combined storage/compute
• Hadoop in VM
  - VM lifecycle determined by the DataNode
  - Limited elasticity
• Separate storage
  - Separate compute from data
  - Remove the elasticity constraint imposed by the DataNode
  - Elastic compute
  - Raise utilization
• Separate compute clusters
  - Separate virtual compute
  - Compute cluster per tenant
  - Stronger VM-grade security and resource isolation
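The separate compute clusters described above can be pictured in terms of Hadoop 1.x site files: each tenant's compute-only cluster points `fs.default.name` at the same shared HDFS cluster but runs its own JobTracker (`mapred.job.tracker`). The following Python sketch only illustrates the shape of that configuration; the host names and ports are hypothetical, and it is not how Serengeti itself writes these files.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(props: dict) -> str:
    """Render a Hadoop *-site.xml <configuration> document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical address of the shared HDFS cluster (NameNode + DataNode VMs).
SHARED_NAMENODE = "hdfs-master.example.com:8020"


def compute_only_cluster(jobtracker: str) -> dict:
    """Site files for one tenant's compute-only cluster (JobTracker + TaskTracker VMs)."""
    return {
        # core-site.xml: the default file system is the shared HDFS cluster
        "core-site.xml": hadoop_site_xml({"fs.default.name": f"hdfs://{SHARED_NAMENODE}"}),
        # mapred-site.xml: each tenant gets its own JobTracker
        "mapred-site.xml": hadoop_site_xml({"mapred.job.tracker": jobtracker}),
    }


# Hypothetical tenants
tenant_a = compute_only_cluster("tenant-a-jt.example.com:8021")
tenant_b = compute_only_cluster("tenant-b-jt.example.com:8021")
print(tenant_a["core-site.xml"])
```

Both tenants share the data (identical core-site.xml) while their MapReduce daemons, and therefore their VMs, stay isolated.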

Serengeti node scale out / scale in
(Diagram: a NameNode host and a JobTracker, plus worker hosts that each run one data node VM (D) and several compute VMs (C))
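The scale-out/scale-in diagram can be read as adding or removing compute VMs (C) while leaving each host's data node VM (D) alone. A toy Python model of that idea, assuming compute VMs are spread evenly across hosts; the `Host` class, `scale_compute` function and VM names are hypothetical and not Serengeti's implementation:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str
    data_vm: str  # the host's DataNode VM ("D"); scaling never touches it
    compute_vms: List[str] = field(default_factory=list)  # running compute VMs ("C")


def scale_compute(hosts: List[Host], target: int) -> None:
    """Add or remove compute VMs until `target` are running in total, keeping
    per-host counts as even as possible. DataNode VMs are left alone, so no
    HDFS blocks have to move."""
    if target < 0:
        raise ValueError("target must be non-negative")
    running = sum(len(h.compute_vms) for h in hosts)
    while running < target:
        host = min(hosts, key=lambda h: len(h.compute_vms))  # least-loaded host
        host.compute_vms.append(f"{host.name}-c{len(host.compute_vms) + 1}")
        running += 1
    while running > target:
        host = max(hosts, key=lambda h: len(h.compute_vms))  # most-loaded host
        host.compute_vms.pop()
        running -= 1


hosts = [Host(f"host{i}", f"host{i}-d") for i in range(1, 5)]
scale_compute(hosts, 12)  # scale out: 3 compute VMs per host
scale_compute(hosts, 6)   # scale in: DataNode VMs untouched
print([(h.name, len(h.compute_vms)) for h in hosts])
```

Because only compute VMs change, scaling in or out never triggers HDFS re-replication, which is the point of separating data from compute.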

Serengeti ballooning enhancement for Java applications
(Diagram: a JVM running inside a guest OS on a host, shown in three configurations)

How to keep data stable?
How can data be accessed locally if the data node and the compute node are located in different VMs?

Distributed and data/compute associated VM placement
(Diagram across Rack1 and Rack2. Left, a cluster with DataNode and TaskTracker combined: master hosts run the NameNode and JobTracker, and each worker host runs a DataNode together with a TaskTracker. Right, a data/compute separated cluster: one HDFS cluster of DataNode hosts is shared by compute-only cluster 1 and compute-only cluster 2, whose TaskTrackers sit on the same hosts as the DataNodes.)
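When data and compute live in different VMs, locality can still be preserved by preferring a compute VM on the same physical host as a replica, then one in the same rack. A minimal Python sketch of that preference order, with hypothetical host, rack and VM names; the same-host level is roughly what HVE's node-group layer lets Hadoop express, but this is not Hadoop's scheduler code:

```python
from typing import Dict, Iterable, Optional, Tuple


def pick_compute_vm(
    replica_hosts: Iterable[str],
    idle_compute_vms: Dict[str, str],
    host_rack: Dict[str, str],
) -> Tuple[Optional[str], Optional[str]]:
    """Choose an idle compute VM for a task that reads one HDFS block.

    replica_hosts: physical hosts whose DataNode VMs hold a replica of the block
    idle_compute_vms: compute VM name -> physical host it runs on
    host_rack: physical host -> rack
    Preference order: same physical host, then same rack, then anywhere.
    """
    replica_hosts = set(replica_hosts)
    for vm, host in idle_compute_vms.items():
        if host in replica_hosts:
            return vm, "same-host"  # read goes VM to VM, not over the physical network
    replica_racks = {host_rack[h] for h in replica_hosts}
    for vm, host in idle_compute_vms.items():
        if host_rack[host] in replica_racks:
            return vm, "same-rack"
    for vm in idle_compute_vms:
        return vm, "off-rack"
    return None, None  # no idle compute VM


racks = {"h1": "rack1", "h2": "rack1", "h3": "rack2", "h4": "rack2"}  # hypothetical
print(pick_compute_vm({"h1", "h3"}, {"c-h2": "h2", "c-h3": "h3"}, racks))
```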

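Hadoop topology changes for virtualization
(Diagram: left, standard Hadoop topology awareness, a tree rooted at "/" with data centers D1 and D2, racks R1 to R4, and hosts H1 to H12; right, Serengeti HVE, which adds a node-group layer N1 to N8 between the racks and the hosts. A node group represents the VMs that share one physical host.)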
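With the node-group layer, the distance between two nodes can still be computed the way Hadoop's NetworkTopology does: count the hops from each node up to their closest common ancestor and add them. A standalone Python illustration (not Hadoop's API), using hypothetical HVE-style paths of the form /datacenter/rack/nodegroup/node:

```python
def network_distance(a: str, b: str) -> int:
    """Hop count between two leaves of a topology tree given as paths such as
    "/D1/R1/N1/H1": climb from each leaf to their closest common ancestor and
    add the two climbs together."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


# Four-layer HVE-style paths: /datacenter/rack/nodegroup/node
print(network_distance("/D1/R1/N1/H1", "/D1/R1/N1/H2"))  # same node group: 2
print(network_distance("/D1/R1/N1/H1", "/D1/R1/N2/H3"))  # same rack: 4
print(network_distance("/D1/R1/N1/H1", "/D1/R2/N3/H5"))  # same data center: 6
print(network_distance("/D1/R1/N1/H1", "/D2/R3/N5/H7"))  # different data centers: 8
```

In a three-layer tree, two VMs on the same physical host look as far apart as any two nodes in one rack (distance 2); the extra layer is what lets HDFS and MapReduce tell them apart.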

Hadoop Virtualization Extensions for Topology (HVE)
• Umbrella JIRA: HADOOP-8468
• Hadoop network topology extension
• Replica placement policy extension, replica removal policy extension, replica choosing policy extension
• Balancer policy extension
• Task scheduling policy extension
• Related JIRAs:
  - Hadoop Common: HADOOP-8469, HADOOP-8470, HADOOP-8472
  - HDFS: HDFS-3495, HDFS-3498
  - MapReduce: MAPREDUCE-4309, MAPREDUCE-4310
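The replica placement policy extension can be sketched as the usual three-replica rule with one extra constraint: no two replicas in the same node group, so the failure of one physical host cannot remove two copies of a block. A simplified Python illustration under that assumption; it ignores load, free space and the other checks a real HDFS placement policy performs, and the node paths are hypothetical:

```python
from typing import List


def node_group(path: str) -> str:
    """'/D1/R1/N1/H1' -> '/D1/R1/N1' (the node group, i.e. the physical host)."""
    return path.rsplit("/", 1)[0]


def rack(path: str) -> str:
    """'/D1/R1/N1/H1' -> '/D1/R1'."""
    return path.rsplit("/", 2)[0]


def choose_targets(writer: str, candidates: List[str], replicas: int = 3) -> List[str]:
    """1st replica on the writer's node, 2nd on a different rack, 3rd on the
    2nd replica's rack, and never two replicas in the same node group."""
    chosen = [writer]

    def allowed(p: str) -> bool:
        return p not in chosen and all(node_group(p) != node_group(c) for c in chosen)

    if len(chosen) < replicas:  # 2nd replica: a remote rack
        remote = [p for p in candidates if allowed(p) and rack(p) != rack(writer)]
        if remote:
            chosen.append(remote[0])
    if 2 <= len(chosen) < replicas:  # 3rd replica: 2nd replica's rack, other node group
        local = [p for p in candidates if allowed(p) and rack(p) == rack(chosen[1])]
        if local:
            chosen.append(local[0])
    for p in candidates:  # fallback: any node in a still-unused node group
        if len(chosen) >= replicas:
            break
        if allowed(p):
            chosen.append(p)
    return chosen


nodes = ["/D1/R1/N1/H1", "/D1/R1/N1/H2", "/D1/R1/N2/H3",
         "/D1/R2/N3/H4", "/D1/R2/N3/H5", "/D1/R2/N4/H6"]
targets = choose_targets("/D1/R1/N1/H1", nodes)
print(targets)
```

Note that H5 is skipped for the third replica even though it is in the right rack: it shares node group N3 (one physical host) with H4.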

Is there significant performance degradation in a virtualized environment? Is there any performance data?

Virtualized Hadoop performance
(Chart)

Native versus virtual platforms, 32 hosts, 16 disks/host
(Chart)


Serengeti architecture diagram
Components shown:
• Clients: UI client (Flex UI) and CLI client (Spring Shell)
• Serengeti Web Service: REST API; a Spring Batch job with UpdateMetaDB step, VM placement calculation, VM provision step, software management step and VHM step; Hibernate/DAO over vPostgres; VC adapter
• Ironfan service (Thrift service) with Ironfan progress report; Chef server with REST API and cookbooks
• RabbitMQ; VM runtime manager
• Virtualization platform: vCenter and hosts running Hadoop nodes, each with a Chef client and HA kit; package repository
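The Spring Batch steps in the diagram run one after another, each building on the state left by the previous one. The Python sketch below mimics only that ordering and stop-on-failure behavior; Serengeti's actual steps are Java Spring Batch steps, and the step bodies, host names and data shapes here are hypothetical:

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], None]]


def run_steps(steps: List[Step], ctx: Dict) -> List[str]:
    """Run named steps in order over a shared context; stop at the first
    failure and record it. Returns the names of the steps that completed."""
    done = []
    for name, step in steps:
        try:
            step(ctx)
        except Exception as exc:  # a real workflow would also roll back or retry
            ctx["error"] = f"{name} failed: {exc}"
            break
        done.append(name)
    return done


# Hypothetical step bodies standing in for the diagram's steps.
def update_meta_db(ctx):
    ctx["cluster"] = {"name": ctx["name"], "status": "PROVISIONING"}


def calculate_vm_placement(ctx):
    hosts = ctx["hosts"]  # round-robin workers over hosts
    ctx["placement"] = {f"{ctx['name']}-worker-{i}": hosts[i % len(hosts)]
                        for i in range(ctx["workers"])}


def provision_vms(ctx):
    ctx["vms"] = sorted(ctx["placement"])


def install_software(ctx):
    ctx["cluster"]["status"] = "RUNNING"


ctx = {"name": "myHadoop", "hosts": ["esx1", "esx2"], "workers": 3}
done = run_steps([
    ("UpdateMetaDB", update_meta_db),
    ("VMPlacement", calculate_vm_placement),
    ("VMProvision", provision_vms),
    ("SoftwareMgmt", install_software),
], ctx)
print(done, ctx["placement"])
```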

Customizing your Hadoop/HBase cluster with Serengeti
• Choice of distros
• Storage configuration: choice of shared storage or local disk
• Resource configuration
• High availability option
• Number of nodes

    …
    "distro": "apache",
    "groups": [
      {
        "name": "master",
        "roles": ["hadoop_namenode", "hadoop_jobtracker"],
        "storage": {"type": "SHARED", "sizeGB": 20},
        "instance_type": "MEDIUM",
        "instance_num": 1,
        "ha": true
      },
      {
        "name": "worker",
        "roles": ["hadoop_datanode", "hadoop_tasktracker"],
        "instance_type": "SMALL",
        "instance_num": 5,
        "ha": false
    …
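A spec like the one above can also be generated programmatically, for example to put the master on shared storage and the workers on local disk as in the hybrid storage layout. A Python sketch, assuming the field names shown on the slide and assuming "LOCAL" as the local-disk storage type; check the Serengeti documentation for the exact schema of your version:

```python
import json


def make_group(name, roles, instance_num, instance_type, storage_type, size_gb, ha=False):
    """Build one node group entry. Field names mirror the slide's spec fragment;
    "LOCAL" as a storage type is an assumption alongside the slide's "SHARED"."""
    if storage_type not in ("SHARED", "LOCAL"):
        raise ValueError(f"unknown storage type: {storage_type}")
    if instance_num < 1:
        raise ValueError("instance_num must be at least 1")
    return {
        "name": name,
        "roles": roles,
        "storage": {"type": storage_type, "sizeGB": size_gb},
        "instance_type": instance_type,
        "instance_num": instance_num,
        "ha": ha,
    }


spec = {
    "distro": "apache",
    "groups": [
        # Master on shared storage (SAN/NAS) so vSphere HA can restart it on another host
        make_group("master", ["hadoop_namenode", "hadoop_jobtracker"], 1, "MEDIUM", "SHARED", 20, ha=True),
        # Workers keep HDFS data on local disks; 50 GB is a hypothetical size
        make_group("worker", ["hadoop_datanode", "hadoop_tasktracker"], 5, "SMALL", "LOCAL", 50),
    ],
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```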

One command to scale out your cluster with Serengeti

    > cluster resize --name <clustername> --nodegroup worker --instanceNum <#>
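The resize command takes an absolute instance count, so scaling out comes down to picking a target number. A hypothetical Python sizing rule (one task slot per pending task, clamped to a minimum and maximum) that then formats the command with the syntax shown above; the slots-per-worker figure and the limits are illustrative assumptions:

```python
import math


def workers_needed(pending_tasks: int, slots_per_worker: int,
                   min_workers: int, max_workers: int) -> int:
    """Hypothetical sizing rule: one task slot per pending task, clamped to [min, max]."""
    needed = math.ceil(pending_tasks / slots_per_worker) if pending_tasks > 0 else 0
    return max(min_workers, min(max_workers, needed))


def resize_command(cluster: str, node_group: str, instances: int) -> str:
    """Format the resize command with the syntax shown on the slide."""
    return f"cluster resize --name {cluster} --nodegroup {node_group} --instanceNum {instances}"


n = workers_needed(pending_tasks=70, slots_per_worker=4, min_workers=5, max_workers=16)
print(resize_command("myHadoop", "worker", n))
```

Here 70 pending tasks at 4 slots per worker would call for 18 workers, which the hypothetical cap limits to 16.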

Configure/reconfigure Hadoop with ease by Serengeti
Modify Hadoop cluster configuration from Serengeti
• Use the "configuration" section of the JSON spec file
• Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, …perties
• Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at /common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": "300"
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
          …

    > cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json
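The "configuration" section is plain nested JSON, so it can be edited with a script before running `cluster config`. A Python sketch; Python's `json` module rejects the `//` comments used in the example above, so it assumes a spec file without comments, and the minimal spec shown is hypothetical:

```python
import json


def set_hadoop_attribute(spec: dict, site_file: str, key: str, value: str) -> dict:
    """Set spec["configuration"]["hadoop"][site_file][key] = value, creating
    the intermediate sections if the spec does not have them yet."""
    hadoop = spec.setdefault("configuration", {}).setdefault("hadoop", {})
    hadoop.setdefault(site_file, {})[key] = value
    return spec


# Hypothetical minimal spec; a real one would also contain "groups" and other fields.
spec = {"distro": "apache"}
set_hadoop_attribute(spec, "mapred-site.xml", "io.sort.mb", "300")
set_hadoop_attribute(spec, "hdfs-site.xml", "dfs.replication", "3")
print(json.dumps(spec, indent=2))  # save as the spec file passed to `cluster config`
```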

Freedom of choice and open source
Distributions
• Flexibility to choose from major distributions

    cluster create --name myHadoop --distro apache

Community projects
• Support for multiple projects
• Open architecture to welcome industry participation
• Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with NameNode federation and HA
Deploy a CDH4 Hadoop cluster
• NameNode federation
• NameNode HA
• MapReduce v1
• HBase, Pig, Hive, and Hive Server
CDH4 configurations: scale out, elasticity, JobTracker HA/FT
(Diagram: two NameNode groups, each with an active and a standby NameNode; a ZooKeeper group (three ZK servers) coordinates each NameNode group; a quorum-based metadata store; a shared pool of DataNodes sends block reports to both NameNode groups)
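For reference, NameNode federation with an HA pair per nameservice is expressed in Hadoop 2 (the HDFS line CDH4 is based on) through per-nameservice keys in hdfs-site.xml. A Python sketch that generates the core properties, with hypothetical nameservice IDs and host names; a working setup also needs the shared edits (quorum journal) directory, fencing, and the ZooKeeper quorum in core-site.xml, which are omitted here:

```python
def federated_ha_hdfs_site(nameservices: dict, port: int = 8020) -> dict:
    """nameservices: nameservice id -> list of (namenode id, host) pairs.
    Returns hdfs-site.xml properties for federation with an HA pair per nameservice."""
    props = {"dfs.nameservices": ",".join(nameservices)}
    for ns, namenodes in nameservices.items():
        props[f"dfs.ha.namenodes.{ns}"] = ",".join(nn for nn, _ in namenodes)
        for nn, host in namenodes:
            props[f"dfs.namenode.rpc-address.{ns}.{nn}"] = f"{host}:{port}"
        # Lets HDFS clients find whichever NameNode of the pair is currently active
        props[f"dfs.client.failover.proxy.provider.{ns}"] = (
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
    # ZooKeeper-based automatic failover; ha.zookeeper.quorum belongs in core-site.xml
    props["dfs.ha.automatic-failover.enabled"] = "true"
    return props


props = federated_ha_hdfs_site({
    "ns1": [("nn1", "nn-a1.example.com"), ("nn2", "nn-a2.example.com")],  # NameNode group 1
    "ns2": [("nn1", "nn-b1.example.com"), ("nn2", "nn-b2.example.com")],  # NameNode group 2
})
for key, value in props.items():
    print(f"{key} = {value}")
```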

Proactive monitoring and tuning with VCOPs
Proactively monitor through VCOPs
• Gain comprehensive visibility
• Eliminate manual processes with intelligent automation
• Proactively manage operations


VMware brings agility, efficiency, and elasticity to big data
Elasticity
• Enable full elasticity through separation of data and compute
• Scale Hadoop in/out with resource constraints
Agility
• Deploy, configure and monitor Hadoop clusters on the fly
• Dynamic reconfiguration of Hadoop to meet changing business demands
Efficiency
• Consolidate Hadoop to achieve higher utilization
• Pool resources to allow for increased performance and priority job processing

Serengeti Resources
• Download and try Serengeti: VMware Hadoop site, /hadoop
