Hadoop 2.6.0 Distributed Deployment Reference Manual

Contents

1. Environment Description
   1.1 Installation Environment
   1.2 Hadoop Cluster Environment
2. Base Environment Installation and Configuration
   2.1 Adding the hadoop User
   2.2 Installing JDK 1.7
   2.3 Configuring Passwordless SSH Login
   2.4 Editing the hosts Mapping File
3. Hadoop Installation and Configuration
   3.1 Common Installation and Configuration
   3.2 Per-Node Configuration
4. Formatting and Starting the Cluster
   4.1 Formatting the Cluster HDFS File System
   4.2 Starting the Hadoop Cluster
Appendix 1: Key Configuration Reference
   1) core-site.xml
   2) hdfs-site.xml
   3) mapred-site.xml
   4) yarn-site.xml
   5) hadoop-env.sh
   6) slaves
Appendix 2: Detailed Configuration Reference
   1) core-site.xml
   2) hdfs-site.xml
   3) mapred-site.xml
   4) yarn-site.xml
   5) hadoop-env.sh
   6) slaves
Appendix 3: Detailed Configuration Parameter Reference
   * conf/core-site.xml
   * conf/hdfs-site.xml
     o Configurations for NameNode
     o Configurations for DataNode
   * conf/yarn-site.xml
     o Configurations for ResourceManager and NodeManager
     o Configurations for ResourceManager
     o Configurations for NodeManager
     o Configurations for History Server (Needs to be moved elsewhere)
   * conf/mapred-site.xml
     o Configurations for MapReduce Applications
     o Configurations for MapReduce JobHistory Server

1. Environment Description

1.1 Installation Environment

In this example the operating system is CentOS 7.0, the JDK is Oracle HotSpot 1.7, the Hadoop version is Apache Hadoop 2.6.0, and the operating user is hadoop.

1.2 Hadoop Cluster Environment

The cluster nodes are as follows:

    Hostname            IP Address    Role
    ResourceManager                   ResourceManager & MR JobHistory Server
    NameNode                          NameNode
    SecondaryNameNode                 SecondaryNameNode
    DataNode01                        DataNode & NodeManager
    DataNode02                        DataNode & NodeManager
    DataNode03                        DataNode & NodeManager
    DataNode04                        DataNode & NodeManager
    DataNode05                        DataNode & NodeManager

Note: in the table above, "&" joins multiple roles; for example the host "ResourceManager" carries two roles, ResourceManager and MR JobHistory Server.

2. Base Environment Installation and Configuration

2.1 Adding the hadoop User

    useradd hadoop

The user "hadoop" is the user that installs and operates the Hadoop cluster.

2.2 Installing JDK 1.7

CentOS 7 ships with OpenJDK 1.7; in this example it is replaced with Oracle HotSpot 1.7, installed by unpacking the binary package into /opt/.

Check the JDK RPM packages that are currently installed:

    rpm -qa | grep jdk
    java-1.7.0-openjdk-<version>.el7.x86_64
    java-1.7.0-openjdk-headless-<version>.el7.x86_64

Remove the bundled JDK:

    rpm -e --nodeps java-1.7.0-openjdk-<version>.el7.x86_64
    rpm -e --nodeps java-1.7.0-openjdk-headless-<version>.el7.x86_64

Install the chosen JDK: change into the directory containing the installation package and unpack it there.

Configure the environment variables: edit ~/.bashrc or /etc/profile and append the following lines:

    #JAVA
    export JAVA_HOME=/opt/jdk1.7
    export PATH=$PATH:$JAVA_HOME/bin
    export CLASSPATH=$JAVA_HOME/lib
    export CLASSPATH=$CLASSPATH:$JAVA_HOME/jre/lib
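The manual does not show the unpack step itself; the sketch below is one way to do it, assuming the Oracle JDK tarball has already been copied to /opt and is named jdk-7u80-linux-x64.tar.gz (the file name and the unpacked directory name depend on the release you downloaded and are assumptions here).

    # Run as root; the tarball and directory names below are assumptions.
    cd /opt
    tar -xzf jdk-7u80-linux-x64.tar.gz      # unpacks to a directory such as jdk1.7.0_80
    mv jdk1.7.0_80 jdk1.7                   # rename so that JAVA_HOME=/opt/jdk1.7 matches the manual

    # After appending the export lines above to ~/.bashrc or /etc/profile:
    source /etc/profile
    java -version                           # should now report the Oracle HotSpot 1.7 JVM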

2.3 Configuring Passwordless SSH Login

Passwordless SSH login must work between the eight hosts listed in the table in section 1.2.

Switch to the hadoop user's home directory and generate an RSA key pair:

    ssh-keygen -t rsa

Create the public-key authentication file authorized_keys and append the contents of the generated ~/.ssh/id_rsa.pub to it:

    more ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Then restrict the permissions of the ~/.ssh directory and the authorized_keys file:

    chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys

Repeat these steps on every node, and copy each node's ~/.ssh/id_rsa.pub public key into the authorized_keys file of every other host.

The whole sequence can also be run as a single command line:

    rm -rf ~/.ssh; ssh-keygen -t rsa; chmod 700 ~/.ssh; more ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys;

Note: on CentOS 6 the key pair can also be generated with DSA (ssh-keygen -t dsa); on CentOS 7 only the RSA method works for this setup, otherwise passwordless login succeeds only to the local machine and not to the other hosts.
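Copying the public key to the other hosts can also be done with ssh-copy-id, which appends the local key to the remote authorized_keys file and sets the permissions; a sketch using the eight hostnames from section 1.2 (each target prompts for the hadoop password once):

    # Run as the hadoop user on each node, after ssh-keygen -t rsa.
    for host in ResourceManager NameNode SecondaryNameNode \
                DataNode01 DataNode02 DataNode03 DataNode04 DataNode05; do
        ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$host"
    done

    # Verify that no password is requested:
    ssh hadoop@NameNode hostname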

2.4 Editing the hosts Mapping File

Edit /etc/hosts on every node and add one entry per host, mapping each node's IP address to its hostname:

    ResourceManager
    NameNode
    SecondaryNameNode
    DataNode01
    DataNode02
    DataNode03
    DataNode04
    DataNode05
    NodeManager01
    NodeManager02
    NodeManager03
    NodeManager04
    NodeManager05
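The listing above gives only the hostnames; each /etc/hosts entry pairs an IP address with one or more names. A minimal sketch follows, with made-up 192.168.10.x addresses (placeholders only) and with the NodeManagerXX names assumed to be aliases of the corresponding DataNode hosts, matching the roles in section 1.2:

    # /etc/hosts fragment: every address below is a placeholder; substitute the real ones.
    192.168.10.10   ResourceManager
    192.168.10.11   NameNode
    192.168.10.12   SecondaryNameNode
    192.168.10.21   DataNode01   NodeManager01
    192.168.10.22   DataNode02   NodeManager02
    192.168.10.23   DataNode03   NodeManager03
    192.168.10.24   DataNode04   NodeManager04
    192.168.10.25   DataNode05   NodeManager05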

3. Hadoop Installation and Configuration

3.1 Common Installation and Configuration

The following steps are identical on every node and must be repeated on each of them.

Copy the Hadoop package (hadoop-2.6.0.tar) to /opt and unpack it:

    tar -xvf hadoop-2.6.0.tar

The unpacked hadoop-2.6.0 directory (/opt/hadoop-2.6.0) is the Hadoop installation root.

Change the owner of the installation directory hadoop-2.6.0 to the hadoop user:

    chown -R hadoop.hadoop /opt/hadoop-2.6.0

Add the environment variables:

    #hadoop
    export HADOOP_HOME=/opt/hadoop-2.6.0
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin

3.2 Per-Node Configuration

Unpack the configuration file archive (Hadoop配置文件参考.zip) and copy its contents into the $HADOOP_HOME/etc/hadoop directory on every node, confirming when asked whether to overwrite existing files.

Note: for the configuration parameter settings of each node, refer to Appendix 1 or Appendix 2 below.
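Distributing the prepared configuration files can be scripted over the passwordless SSH set up in section 2.3; a sketch, assuming the unpacked files sit in a staging directory /tmp/hadoop-conf (an assumed path) on the machine you are working from:

    # Push the prepared configuration to $HADOOP_HOME/etc/hadoop on every node.
    # /tmp/hadoop-conf is an assumed staging directory holding the unpacked files.
    for host in ResourceManager NameNode SecondaryNameNode \
                DataNode01 DataNode02 DataNode03 DataNode04 DataNode05; do
        scp /tmp/hadoop-conf/* hadoop@"$host":/opt/hadoop-2.6.0/etc/hadoop/
    done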

4. Formatting and Starting the Cluster

4.1 Formatting the Cluster HDFS File System

After installation, log in to the NameNode node and format the cluster HDFS file system:

    hdfs namenode -format

Note: if the HDFS file system has been formatted before, first empty the NameNode's dfs.namenode.name.dir directory and each DataNode's dfs.datanode.data.dir directory (in this example /home/hadoop/hadoopdata) before formatting again.

4.2 Starting the Hadoop Cluster

Log in to the following hosts and run the corresponding commands:

Log in to ResourceManager and run start-yarn.sh to start the cluster resource management system (YARN).
Log in to NameNode and run start-dfs.sh to start the cluster HDFS file system.

Then run the jps command on each node and check that the expected Java processes are present:

    ResourceManager node:     ResourceManager
    NameNode node:            NameNode
    SecondaryNameNode node:   SecondaryNameNode
    each DataNode node:       DataNode & NodeManager

If all of the above processes are running, the Hadoop cluster has started successfully.
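Collected into one place, the format, start and check steps look like the sketch below (it assumes the PATH additions from section 3.1, so that the scripts in $HADOOP_HOME/bin and $HADOOP_HOME/sbin resolve):

    # On the NameNode node: format HDFS (first start only) and bring up the HDFS daemons.
    hdfs namenode -format
    start-dfs.sh

    # On the ResourceManager node: bring up YARN.
    start-yarn.sh

    # On every node: list the running Java processes and compare with the list above.
    jps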

Appendix 1: Key Configuration Reference

1) core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://NameNode:9000</value>
        <description>NameNode URI</description>
      </property>
    </configuration>

The property fs.defaultFS is the NameNode address, in the form hdfs://hostname(or IP):port.

2) hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoopdata/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoopdata/hdfs/datanode</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>SecondaryNameNode:50090</value>
      </property>
    </configuration>

The property dfs.namenode.name.dir is the local file system directory where the NameNode stores the namespace and edit-log metadata; its default is /tmp/hadoop-username/dfs/name. The property dfs.datanode.data.dir is the local file system directory (in the form file:/local-directory) where a DataNode stores HDFS blocks; its default is /tmp/hadoop-username/dfs/data. The property dfs.namenode.secondary.http-address is the SecondaryNameNode host and port (it can be left unset if no separate SecondaryNameNode role is required).
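A quick way to confirm that these values, rather than the /tmp defaults, are in effect on a node is hdfs getconf; a sketch:

    # Print the values resolved from core-site.xml / hdfs-site.xml on this node.
    hdfs getconf -confKey fs.defaultFS
    hdfs getconf -confKey dfs.namenode.name.dir
    hdfs getconf -confKey dfs.datanode.data.dir
    hdfs getconf -confKey dfs.namenode.secondary.http-address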

3) mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Execution framework set to Hadoop YARN.</description>
      </property>
    </configuration>

The property mapreduce.framework.name selects the runtime framework used to execute MapReduce jobs; the default is local and it must be changed to yarn.

4) yarn-site.xml

    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>ResourceManager</value>
        <description>ResourceManager host</description>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for Map Reduce applications.</description>
      </property>
    </configuration>

The property yarn.resourcemanager.hostname specifies the ResourceManager host address; the property yarn.nodemanager.aux-services names the shuffle service used by MapReduce applications.

5) hadoop-env.sh

JAVA_HOME must point to the current Java installation directory:

    export JAVA_HOME=/opt/jdk1.7

6) slaves

The master nodes of the cluster (NameNode and ResourceManager) must list the slave nodes they manage.

The slaves file on the NameNode node contains:

    DataNode01
    DataNode02
    DataNode03
    DataNode04
    DataNode05

The slaves file on the ResourceManager node contains:

    NodeManager01
    NodeManager02
    NodeManager03
    NodeManager04
    NodeManager05
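With the key configuration above in place and the cluster started as in section 4, a short MapReduce job is a convenient end-to-end check; a sketch using the examples jar shipped with the distribution (the path assumes the /opt/hadoop-2.6.0 installation root from section 3.1):

    # Estimate pi with 5 map tasks x 10 samples each; exercises HDFS, YARN and the shuffle service.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 5 10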

Appendix 2: Detailed Configuration Reference

Note: the parameters marked in red in the original document are the ones that must be configured explicitly; all other entries below use their default values.

1) core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://NameNode:9000</value>
        <description>NameNode URI</description>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
        <description>Size of read/write buffer used in SequenceFiles. The default value is 131072.</description>
      </property>
    </configuration>

The property fs.defaultFS is the NameNode address, in the form hdfs://hostname(or IP):port.

2) hdfs-site.xml

    <configuration>
      <!-- Configurations for NameNode: -->
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoopdata/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>SecondaryNameNode:50090</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoopdata/hdfs/datanode</value>
      </property>
    </configuration>

The property dfs.namenode.name.dir is the local file system directory where the NameNode stores the namespace and edit-log metadata; its default is /tmp/hadoop-username/dfs/name. The property dfs.datanode.data.dir is the local file system directory (in the form file:/local-directory) where a DataNode stores HDFS blocks; its default is /tmp/hadoop-username/dfs/data. The property dfs.namenode.secondary.http-address is the SecondaryNameNode host and port (it can be left unset if no separate SecondaryNameNode role is required).
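Once the HDFS daemons are up, hdfs dfsadmin -report shows whether all five DataNodes have registered and how much capacity each contributes; the replication factor of 3 set above can then be checked on a small test file. A sketch:

    # Summary of live DataNodes, configured capacity and remaining space.
    hdfs dfsadmin -report

    # Write a small file and confirm its replication factor (second column of the listing).
    hdfs dfs -mkdir -p /tmp
    hdfs dfs -put /etc/hosts /tmp/hosts.test
    hdfs dfs -ls /tmp/hosts.test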

3) mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Execution framework set to Hadoop YARN.</description>
      </property>
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for maps.</description>
      </property>
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
        <description>Larger heap-size for child jvms of maps.</description>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for reduces.</description>
      </property>
      <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
      </property>
      <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
      </property>
      <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
        <description>More streams merged at once while sorting files.</description>
      </property>
      <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>5</value>
        <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
      </property>
      <!-- Configurations for MapReduce JobHistory Server: -->
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>ResourceManager:10020</value>
        <description>MapReduce JobHistory Server host:port. Default port is 10020.</description>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>ResourceManager:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port. Default port is 19888.</description>
      </property>
      <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/mr-history/tmp</value>
        <description>Directory where history files are written by MapReduce jobs. Default is /mr-history/tmp.</description>
      </property>
      <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/mr-history/done</value>
        <description>Directory where history files are managed by the MR JobHistory Server. Default value is /mr-history/done.</description>
      </property>
    </configuration>

The property mapreduce.framework.name selects the runtime framework used to execute MapReduce jobs; the default is local and it must be changed to yarn.
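Section 1.2 places the MR JobHistory Server on the ResourceManager host, but section 4.2 does not show how to start it; in Hadoop 2.6 it is normally started with the mr-jobhistory-daemon.sh script from $HADOOP_HOME/sbin. A sketch, using the ports configured above:

    # On the ResourceManager host, as the hadoop user:
    mr-jobhistory-daemon.sh start historyserver

    # jps should now list a JobHistoryServer process, and the web UI should answer on port 19888:
    jps
    curl -s http://ResourceManager:19888/ | head -n 5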

4) yarn-site.xml

    <configuration>
      <property>
        <name>yarn.acl.enable</name>
        <value>false</value>
        <description>Enable ACLs? Defaults to false. The optional values are true or false.</description>
      </property>
      <property>
        <name>yarn.admin.acl</name>
        <value>*</value>
        <description>ACL to set admins on the cluster. ACLs are of for comma-separated-users space comma-separated-groups. Defaults to special value of * which means anyone. Special value of just space means no one has access.</description>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
        <description>Configuration to enable or disable log aggregation.</description>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>ResourceManager:8032</value>
        <description>ResourceManager host:port for clients to submit jobs. NOTES: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>ResourceManager:8030</value>
        <description>ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. NOTES: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>ResourceManager:8031</value>
        <description>ResourceManager host:port for NodeManagers. NOTES: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>ResourceManager:8033</value>
        <description>ResourceManager host:port for administrative commands. NOTES: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>ResourceManager:8088</value>
        <description>ResourceManager web-ui host:port. NOTES: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>ResourceManager</value>
        <description>ResourceManager host.</description>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>ResourceManager Scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. The default value is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.</description>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>Minimum limit of memory to allocate to each container request at the ResourceManager. NOTES: In MBs.</description>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
        <description>Maximum limit of memory to allocate to each container request at the ResourceManager. NOTES: In MBs.</description>
      </property>
      <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
        <description>How long to keep aggregation logs before deleting them. -1 disables. Be careful, set this too small and you will spam the name node.</description>
      </property>
      <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
        <description>Time between checks for aggregated log retention. If set to 0 or a negative value then the value is computed as one-tenth of the aggregated log retention time. Be careful, set this too small and you will spam the name node.</description>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
        <description>Resource i.e. available physical memory, in MB, for given NodeManager. The default value is 8192. NOTES: Defines total available resources on the NodeManager to be made available to running containers.</description>
      </property>
      <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
        <description>Maximum ratio by which virtual memory usage of tasks may exceed physical memory. The default value is 2.1. NOTES: The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.</description>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nm-local-dir</value>
        <description>Comma-separated list of paths on the local filesystem where intermediate data is written. The default value is ${hadoop.tmp.dir}/nm-local-dir. NOTES: Multiple paths help spread disk i/o.</description>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${yarn.log.dir}/userlogs</value>
        <description>Comma-separated list of paths on the local filesystem where logs are written. The default value is ${yarn.log.dir}/userlogs. NOTES: Multiple paths help spread disk i/o.</description>
      </property>
      <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
        <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled. The default value is 10800.</description>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/logs</value>
        <description>HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled. The default value is /logs or /tmp/logs.</description>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
        <description>Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled.</description>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for Map Reduce applications.</description>
      </property>
    </configuration>

The property yarn.resourcemanager.hostname specifies the ResourceManager host address; the property yarn.nodemanager.aux-services names the shuffle service used by MapReduce applications.
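After start-yarn.sh, the memory settings above can be cross-checked against what the ResourceManager actually sees; a sketch (the node ID shown is only an example, substitute one printed by the first command):

    # List the NodeManagers that have registered with the ResourceManager.
    yarn node -list

    # Per-node detail, including the memory derived from yarn.nodemanager.resource.memory-mb.
    yarn node -status DataNode01:45454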

5) hadoop-env.sh

JAVA_HOME must point to the current Java installation directory:

    export JAVA_HOME=/opt/jdk1.7

6) slaves

The master nodes of the cluster (NameNode and ResourceManager) must list the slave nodes they manage.

The slaves file on the NameNode node contains:

    DataNode01
    DataNode02
    DataNode03
    DataNode04
    DataNode05

The slaves file on the ResourceManager node contains:

    NodeManager01
    NodeManager02
    NodeManager03
    NodeManager04
    NodeManager05

Appendix 3: Detailed Configuration Parameter Reference

Configuring the Hadoop Daemons in Non-Secure Mode

This section deals with important parameters to be specified in the given configuration files:

* conf/core-site.xml

fs.defaultFS
    Value: NameNode URI
    Notes: hdfs://host:port/
io.file.buffer.size
    Value: 131072
    Notes: Size of read/write buffer used in SequenceFiles.

* conf/hdfs-site.xml

o Configurations for NameNode:

dfs.namenode.name.dir
    Value: Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.
    Notes: If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.namenode.hosts / dfs.namenode.hosts.exclude
    Value: List of permitted/excluded DataNodes.
    Notes: If necessary, use these files to control the list of allowable datanodes.
dfs.blocksize
    Value: 268435456
    Notes: HDFS blocksize of 256MB for large file-systems.
dfs.namenode.handler.count
    Value: 100
    Notes: More NameNode server threads to handle RPCs from large number of DataNodes.

o Configurations for DataNode:

dfs.datanode.data.dir
    Value: Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.
    Notes: If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.

* conf/yarn-site.xml

o Configurations for ResourceManager and NodeManager:

yarn.acl.enable
    Value: true / false
    Notes: Enable ACLs? Defaults to false.
yarn.admin.acl
    Value: Admin ACL
    Notes: ACL to set admins on the cluster. ACLs are of for comma-separated-users space comma-separated-groups. Defaults to special value of * which means anyone. Special value of just space means no one has access.
yarn.log-aggregation-enable
    Value: false
    Notes: Configuration to enable or disable log aggregation.

o Configurations for ResourceManager:

yarn.resourcemanager.address
    Value: ResourceManager host:port for clients to submit jobs.
    Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.scheduler.address
    Value: ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources.
    Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.resource-tracker.address
    Value: ResourceManager host:port for NodeManagers.
    Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.admin.address
    Value: ResourceManager host:port for administrative commands.
    Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.webapp.address
    Value: ResourceManager web-ui host:port.
    Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.hostname
    Value: ResourceManager host.
    Notes: host. Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components.
yarn.resourcemanager.scheduler.class
    Value: ResourceManager Scheduler class.
    Notes: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler.
yarn.scheduler.minimum-allocation-mb
    Value: Minimum limit of memory to allocate to each container request at the ResourceManager.
    Notes: In MBs.
yarn.scheduler.maximum-allocation-mb
    Value: Maximum limit of memory to allocate to each container request at the ResourceManager.
    Notes: In MBs.
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path
    Value: List of permitted/excluded NodeManagers.
    Notes: If necessary, use these files to control the list of allowable NodeManagers.

o Configurations for NodeManager:

yarn.nodemanager.resource.memory-mb
    Value: Resource i.e. available physical memory, in MB, for given NodeManager.
    Notes: Defines total available resources on the NodeManager to be made available to running containers.
yarn.nodemanager.vmem-pmem-ratio
    Value: Maximum ratio by which virtual memory usage of tasks may exceed physical memory.
    Notes: The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
yarn.nodemanager.local-dirs
    Value: Comma-separated list of paths on the local filesystem where intermediate data is written.
    Notes: Multiple paths help spread disk i/o.
yarn.nodemanager.log-dirs
    Value: Comma-separated list of paths on the local filesystem where logs are written.
    Notes: Multiple paths help spread disk i/o.
yarn.nodemanager.log.retain-seconds
    Value: 10800
    Notes: Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.
yarn.nodemanager.remote-app-log-dir
    Value: /logs
    Notes: HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled.
yarn.nodemanager.remote-app-log-dir-suffix
    Value: logs
    Notes: Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled.
