Serengeti - Virtualize Your Big Data Applications
蔺永华 (Lin Yonghua), VMware, Inc.
© 2009 VMware Inc. All rights reserved

Agenda
• Today's big data system
• Why virtualize Hadoop?
• Serengeti introduction
• Common questions about virtualization
• Serengeti solution
• Deep insight into Serengeti
• Summary
• Q&A

Today's Big Data System

[Diagram] Components shown:
• ETL
• Real Time Streams
• Unstructured Data (HDFS)
• Real Time Structured Database
• Big SQL
• Data Parallel Batch Processing
• Real-Time Processing (s4, storm)
• Analytics

Challenges of Using Hadoop on Physical Infrastructure

Deployment
• Difficult to deploy: takes several people several days, or even months
• Difficult to tune cluster performance

Low Efficiency
• Hadoop clusters typically do not fully utilize all hardware resources
• Difficult to share resources safely between different workloads

Single Point of Failure
• Single point of failure for the NameNode and JobTracker
• No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get Your Hadoop Cluster in Minutes

Manual process (takes days)
• Server preparation
• OS installation
• Network configuration
• Hadoop installation and configuration

Automated by Serengeti on vSphere with best practices
• Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
• 1/1000 of the human effort, with minimal Hadoop operations knowledge required

Why Virtualize Hadoop? - Consolidate Sprawling Clusters

Cluster sprawl
• Single-purpose clusters for various business applications lead to cluster sprawl

Cluster consolidation
• Clusters share servers with strong isolation

Simplify
• Single hardware infrastructure
• Unified operations

Optimize
• Shared resources = higher utilization
• Elastic resources = faster on-demand access

[Diagram labels: Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop, Virtualization Platform]

30% CAPEX down

Why Virtualize Hadoop? - Utilize All Your Resources to Solve the Priority Problem

• 50%+ of resources sit idle while a high-priority job burns up its own cluster
• Utilize all resources from the pool on demand
• Dynamic elastic scaling on a shared resource pool
• 3X faster to get analytic results

vSphere High Availability (HA) - Protection Against Unplanned Downtime

Overview
• Protection against host and VM failures
• Automatic failure detection (host, guest OS)
• Automatic virtual machine restart in minutes, on any available host in the cluster
• OS and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack

[Diagram] Components shown:
• ETL Tools, BI Reporting, RDBMS
• Pig (Data Flow), Hive (SQL), HCatalog
• MapReduce (Job Scheduling/Execution System)
• HBase (Key-Value store)
• HDFS (Hadoop Distributed File System)
• Zookeeper (Coordination)
• Management Server
• Jobtracker, Namenode, Hive MetaDB, HCatalog MDB, Server

vSphere Fault Tolerance (FT) Provides Continuous Protection

Overview
• Single identical VMs running in lockstep on separate hosts
• Zero downtime, zero data loss failover for all virtual machines in case of hardware failures
• Integrated with VMware HA/DRS
• No complex clustering or specialized hardware required
• Single common mechanism for all applications and operating systems

Zero downtime for NameNode, JobTracker and other components in Hadoop clusters

Serengeti

Easy and rapid deployment and management

• Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is planned for June
• Toolkit that leverages virtualization to simplify Hadoop deployment and operations
  - Deploy a cluster in 10 minutes, fully automated
  - Customize Hadoop and HBase clusters
  - Automated cluster operation
• Comes with ecosystem components
• Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common Questions About Virtualization

Local disk
• Can local disks be used in a virtualization environment?

Flexibility and scalability
• How can resources be scheduled flexibly between clusters and different applications, as mentioned above?

Data stability
• In a virtual environment, how can we distribute data across hosts and racks?

Data locality
• Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?

Performance
• How does Hadoop perform in a virtual environment?

Can I use local disk easily?

Extend Virtual Storage Architecture to Include Local Disk

Shared Storage: SAN or NAS
• Easy to provision
• Automated cluster rebalancing

Hybrid Storage
• SAN for boot images and other workloads
• Local disk for Hadoop and HDFS

[Diagram labels: Serengeti; hosts running Hadoop VMs alongside other VMs]

How to scale in and scale out flexibly?
How to schedule resources flexibly between clusters and different applications?

Evolution of Hadoop on VMs - Data/Compute Separation

Current Hadoop: combined storage/compute
• Each slave node holds both compute tasks (T1, T2) and storage

Hadoop in VM
• VM lifecycle determined by the Datanode
• Limited elasticity

Separate Storage
• Separate compute from data
• Remove the elasticity constraint imposed by the Datanode
• Elastic compute
• Raise utilization

Separate Compute Clusters
• Separate virtual compute
• Compute cluster per tenant
• Stronger VM-grade security and resource isolation

Serengeti Node Scale Out / Scale In

[Diagram] NameNode, JobTracker, and hosts that each run one D node and several C nodes

Serengeti Ballooning Enhancement for Java Applications

[Diagram labels: JVM, guest OS, host]

How to ensure data stability?
How to access data locally if the data node and the compute node are located in different VMs?

Distributed and Data/Compute Associated VM Placement

[Diagram] Placement layouts across Rack1 and Rack2:
• Datanode and tasktracker combined cluster: master host and worker hosts
• Data/compute separated cluster: master host, with data nodes and tasktrackers placed together on each host
• Compute-only clusters: compute-only cluster 1 and compute-only cluster 2 (each with a job tracker) on top of an HDFS cluster (name node, data nodes)

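The placement idea on this slide (spread data nodes across hosts and racks, and keep compute VMs on the same host as the data they read) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Serengeti's placement algorithm: the function name, the VM naming scheme and the rack and host names are all hypothetical.

```python
def place_cluster(hosts_by_rack, compute_per_host):
    """Put one data node VM on every host and pair it with compute VMs.

    hosts_by_rack maps a rack name to the hosts in that rack.
    Returns a mapping from "/rack/host" to the VMs placed on that host.
    """
    placement = {}
    for rack, hosts in hosts_by_rack.items():
        for host in hosts:
            # One data node per host keeps HDFS replicas on distinct hardware.
            vms = [f"datanode@{host}"]
            # Compute VMs share the host so tasks can read data locally.
            vms += [f"compute{i}@{host}" for i in range(1, compute_per_host + 1)]
            placement[f"/{rack}/{host}"] = vms
    return placement


# Hypothetical two-rack layout.
layout = {"rack1": ["host1", "host2"], "rack2": ["host3"]}
for location, vms in place_cluster(layout, compute_per_host=2).items():
    print(location, vms)
```

Because every host carries exactly one data node in this sketch, losing a host removes at most one copy of any block, and each compute VM has a data node on its own host.
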
Hadoop Topology Changes for Virtualization

Hadoop Topology Awareness - Serengeti HVE

[Diagram] Two topology trees rooted at "/":
• Without HVE: D1, D2 > R1-R4 > H1-H12
• With HVE: D1, D2 > R1-R4 > N1-N8 > H1-H12 (an extra node group layer)

Hadoop Virtualization Extensions for Topology (HVE)

• Hadoop Common: Hadoop Network Topology Extension
• HDFS: Balancer Policy Extension, Replica Choosing Policy Extension, Replica Placement Policy Extension, Replica Removal Policy Extension
• MapReduce: Task Scheduling Policy Extension

JIRA issues: HADOOP-8468 (umbrella JIRA), HADOOP-8469, HADOOP-8470, HADOOP-8472, HDFS-3495, HDFS-3498, MAPREDUCE-4309, MAPREDUCE-4310

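To make the extra topology layer concrete, here is a minimal Python sketch of a tree distance over four-layer paths. It assumes paths of the form /datacenter/rack/nodegroup/node, matching the D, R, N and H layers in the diagram; the path strings and function names are hypothetical, and the code is not taken from the Hadoop patches listed above.

```python
def split_path(path):
    """Split a topology path such as '/d1/r1/n1/h1' into its components."""
    return [part for part in path.split("/") if part]


def distance(path_a, path_b):
    """Count tree edges between two leaves: hops up to the closest
    common ancestor plus hops back down."""
    a, b = split_path(path_a), split_path(path_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def locality(path_a, path_b):
    """Name the closest layer that two nodes share."""
    levels = {0: "same node", 2: "same node group", 4: "same rack", 6: "same data center"}
    return levels.get(distance(path_a, path_b), "different data centers")


print(locality("/d1/r1/n1/h1", "/d1/r1/n1/h2"))
print(locality("/d1/r1/n1/h1", "/d1/r1/n2/h3"))
```

With the node group layer in place, two VMs on the same physical host are closer to each other than two VMs that merely share a rack, which is the distinction that replica placement and task scheduling need in a virtual environment.
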
Is there significant performance degradation in a virtualized environment?
Is there any performance data?

Virtualized Hadoop Performance

[Chart] Native versus Virtual Platforms, 32 hosts, 16 disks/host

Serengeti architecture diagram

[Diagram] Components shown:
• Clients: UI client (Flex UI), CLI client (Spring Shell)
• Serengeti Web Service: REST API, Spring Batch, Hibernate/DAO, vPostgres
• Spring Batch steps: update MetaDB, VM placement calculation, VM provision, software management, VHM
• VC adapter, vCenter
• Ironfan service: Thrift service, Ironfan progress report
• Chef server: REST API, cookbooks
• Package repository
• RabbitMQ
• VM runtime manager
• Hadoop nodes (Chef client, HA kit) on the hosts of the virtualization platform

Customizing your Hadoop/HBase cluster with Serengeti

• Choice of distros
• Storage configuration: choice of shared storage or local disk
• Resource configuration
• High availability option
• Number of nodes

    …
    "distro": "apache",
    "groups": [
      {
        "name": "master",
        "roles": ["hadoop_namenode", "hadoop_jobtracker"],
        "storage": {"type": "SHARED", "sizeGB": 20},
        "instance_type": "MEDIUM",
        "instance_num": 1,
        "ha": true
      },
      {
        "name": "worker",
        "roles": ["hadoop_datanode", "hadoop_tasktracker"],
        "instance_type": "SMALL",
        "instance_num": 5,
        "ha": false
    …

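A spec like this is easy to sanity-check before it is handed to a tool. The Python sketch below loads a completed version of the slide's fragment (the elided parts are simply left out) and runs a few checks. The field names come from the slide; the checks themselves and the function names are assumptions made for illustration, not rules taken from Serengeti.

```python
import json

SPEC_TEXT = """
{
  "distro": "apache",
  "groups": [
    {"name": "master",
     "roles": ["hadoop_namenode", "hadoop_jobtracker"],
     "storage": {"type": "SHARED", "sizeGB": 20},
     "instance_type": "MEDIUM", "instance_num": 1, "ha": true},
    {"name": "worker",
     "roles": ["hadoop_datanode", "hadoop_tasktracker"],
     "instance_type": "SMALL", "instance_num": 5, "ha": false}
  ]
}
"""


def check_spec(spec):
    """Return a list of problems found in a cluster spec (empty if none)."""
    problems = []
    if not spec.get("distro"):
        problems.append("missing distro")
    groups = spec.get("groups", [])
    if not groups:
        problems.append("no node groups defined")
    for group in groups:
        name = group.get("name", "<unnamed>")
        if not group.get("roles"):
            problems.append(f"group {name}: no roles")
        if int(group.get("instance_num", 0)) < 1:
            problems.append(f"group {name}: instance_num must be at least 1")
    return problems


def total_nodes(spec):
    """Sum instance_num over all node groups."""
    return sum(group["instance_num"] for group in spec["groups"])


spec = json.loads(SPEC_TEXT)
print(check_spec(spec))   # prints []
print(total_nodes(spec))  # prints 6
```

Returning a list of problems rather than raising on the first one lets a user fix the whole spec file in a single pass.
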
One command to scale out your cluster with Serengeti

    > cluster resize --name <clustername> --nodegroup worker --instanceNum <#>

Configure/reconfigure Hadoop with ease by Serengeti

Modify the Hadoop cluster configuration from Serengeti
• Use the "configuration" section of the JSON spec file
• Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, perties
• Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at /common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": "300"
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": "",
    …

    > cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json

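Each *-site.xml entry in the "configuration" section is a flat map of property names to values, and Hadoop's own site files store the same information as property elements. The Python sketch below shows that mapping for the io.sort.mb setting from the slide. How Serengeti itself applies the settings is not described here, so the function name and the rendering step are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render name/value pairs in the Hadoop *-site.xml property layout."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# The "configuration" section from the slide, without the comment lines.
configuration = {
    "hadoop": {
        "core-site.xml": {},
        "hdfs-site.xml": {},
        "mapred-site.xml": {"io.sort.mb": "300"},
    }
}

for file_name, properties in configuration["hadoop"].items():
    # Only the XML site files with at least one override are rendered.
    if file_name.endswith(".xml") and properties:
        print(file_name)
        print(render_site_xml(properties))
```

Entries such as hadoop-env.sh are environment settings rather than XML properties, so a real tool would need a separate rendering path for them.
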
Freedom of Choice and Open Source

[Diagram labels: Community Projects, Distributions]

• Flexibility to choose from major distributions

    cluster create --name myHadoop --distro apache

• Support for multiple projects
• Open architecture to welcome industry participation
• Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with Namenode Federation and HA

Deploy a CDH4 Hadoop cluster
• Name Node Federation
• Name Node HA
• MapReduce v1
• HBase, Pig, Hive, and Hive Server

Also listed: CDH4 configurations, scale out, elasticity, JobTracker HA/FT

[Diagram] A Zookeeper group (three ZK nodes) coordinates Namenode Group 1 and Namenode Group 2, each with an active and a standby Namenode; a quorum-based metadata store; Datanodes send block reports to the Namenodes

Proactive monitoring and tuning with VCOPs

• Proactive monitoring through VCOPs
• Gain comprehensive visibility
• Eliminate manual processes with intelligent automation
• Proactively manage operations

VMware Brings Agility, Efficiency, and Elasticity to Big Data

Agility
• Deploy, configure and monitor Hadoop clusters on the fly
• Dynamically reconfigure Hadoop to meet changing business demands

Efficiency
• Consolidate Hadoop to achieve higher utilization
• Pool resources to allow for increased performance and priority job processing

Elasticity
• Enable full elasticity through separation of data and compute
• Scale Hadoop in and out under resource constraints

Serengeti Resources

Download and try Serengeti
• VMware Hadoop site
• /hadoop