Setting up a Hadoop environment on CentOS 6.6 in a virtual machine

I. Hadoop installation

1. Directory layout after unpacking
Before configuring anything, create the following folders on the local filesystem:
  mkdir -p /home/hadoop/tmp /home/dfs/data /home/dfs/name

4. Enter the Hadoop configuration directory:
  [root@master hadoop-2.6.0]# cd etc/hadoop/

4.1 Configure hadoop-env.sh — set JAVA_HOME:
  # The java implementation to use.
  export JAVA_HOME=/opt/jdk1.8.0_66

4.2 Configure yarn-env.sh — set JAVA_HOME:
  # some Java parameters
  export JAVA_HOME=/opt/jdk1.8.0_66

4.3 Configure the slaves file — add the slave nodes:
  slave1
  slave2

4.4 Configure core-site.xml — add the core Hadoop settings (the HDFS port is 9000; the temp directory, e.g. file:/home/spark/opt/hadoop-2.6.0/tmp). The properties to set are:
  fs.defaultFS = hdfs://master.hadoop:9000
  io.file.buffer.size  (no value given)
  hadoop.tmp.dir = file:/home/hadoop/tmp  (a base for other temporary directories)
  hadoop.proxyuser.root.hosts = *
  hadoop.proxyuser.root.groups = *
  ha.zookeeper.quorum = mast1:2181,mast2:2181,mast3:2181
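For reference, the property list above can be written out as a core-site.xml fragment. This is a minimal sketch using the values shown; the io.file.buffer.size value is an assumption (a commonly used size), since none is given above:

  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master.hadoop:9000</value>
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value> <!-- assumed; no value given above -->
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>file:/home/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>
    <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>mast1:2181,mast2:2181,mast3:2181</value>
    </property>
  </configuration>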

4.5 Configure hdfs-site.xml — add the HDFS settings (NameNode and DataNode ports and directory locations):
  dfs.namenode.secondary.http-address = master.hadoop:9001
  dfs.namenode.name.dir = file:/home/dfs/name
  dfs.datanode.data.dir = file:/home/dfs/data
  dfs.re… = …
  … = …node.ha.ConfiguredFailoverProxyProvider
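The two truncated entries above look like dfs.replication and an HA client failover proxy provider (org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider). A minimal hdfs-site.xml sketch of the recoverable settings follows; dfs.replication = 2 is an assumption (two DataNodes), and the HA failover entry is left out because the surrounding HA settings are not available here:

  <configuration>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>master.hadoop:9001</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/home/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/home/dfs/data</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value> <!-- assumed: two DataNodes (slave1, slave2) -->
    </property>
  </configuration>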

4.6 Configure mapred-site.xml — add the MapReduce settings (use the YARN framework, and set the JobHistory server address and web address):
  mapreduce.framework.name = yarn
  mapreduce.jobhistory.address = master.hadoop:10020
  mapreduce.jobhistory.webapp.address = master.hadoop:19888
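The same settings as an XML fragment (a sketch based on the values above):

  <configuration>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>master.hadoop:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>master.hadoop:19888</value>
    </property>
  </configuration>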

4.7 Configure yarn-site.xml — enable the YARN services:
  yarn.nodemanager.aux-services = mapreduce_shuffle
  yarn.nodemanager.aux-services.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
  yarn.resourcemanager.address = master.hadoop:8032
  yarn.resourcemanager.scheduler.address = master.hadoop:8030
  yarn.resourcemanager.resource-tracker.address = master.hadoop:8035
  yarn.resourcemanager.admin.address = master.hadoop:8033
  yarn.resourcemanager.webapp.address = master.hadoop:8088
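The same settings as a yarn-site.xml fragment (a sketch based on the values above):

  <configuration>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>master.hadoop:8032</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>master.hadoop:8030</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>master.hadoop:8035</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>master.hadoop:8033</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>master.hadoop:8088</value>
    </property>
  </configuration>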

5. Copy the configured Hadoop directory to the other slave machines:
  $ scp -r hadoop-2.6.0/ root@slave1:/opt/

Verification

1. Format the NameNode:
  ./bin/hdfs namenode -format

2. To re-format the HDFS filesystem: delete the directory holding the HDFS namespace metadata on the NameNode, the physical block storage directories on the DataNodes, and the local Hadoop temp folder on the NameNode, then run the command again (see the sketch below):
  hadoop namenode -format
Formatting is then complete. Note: all previous data is wiped and a brand-new HDFS is created.

3. Check the cluster status with jps, on the master and on each slave. If HDFS was re-formatted during installation, the corresponding data and name folders must also be deleted on the slaves, otherwise the DataNodes on the slaves will not start.
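To make the re-format procedure in step 2 concrete, here is a minimal shell sketch; the paths match the directories created at the start of this guide, and it assumes the cluster has been stopped first and is run from the hadoop-2.6.0 directory:

  # stop HDFS first
  ./sbin/stop-dfs.sh
  # remove old namespace metadata, block storage and temp data (on every node)
  rm -rf /home/dfs/name/* /home/dfs/data/* /home/hadoop/tmp/*
  # re-format and restart
  ./bin/hdfs namenode -format
  ./sbin/start-dfs.sh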

Check the DataNode report:
  $ ./bin/hdfs dfsadmin -report
  Live datanodes (1):
  Name: 6:50010 (S1PA222)
  Hostname: S1PA209
  Decommission Status : Normal
  Configured Capacity: (48.52 GB)
  DFS Used: (804 KB)
  Non DFS Used: (5.92 GB)
  DFS Remaining: (42.61 GB)
  DFS Used%: 0.00%
  DFS Remaining%: 87.81%
  Configured Cache Capacity: 0 (0 B)
  Cache Used: 0 (0 B)
  Cache Remaining: 0 (0 B)
  Cache Used%: 100.00%
  Cache Remaining%: 0.00%
  Xceivers: 1
  Last contact: Mon Jan 05 16:44:50 CST 2015

This report corresponds to the whole cluster.

4. Fixing the native library problem
If you are on Hadoop 2.6 you can download the following archive: /sequenceiq/sequenceiq-bin/hadoop-native-64-2.6.0.tar
After downloading, extract it into Hadoop's native directory, overwriting the existing files:
  tar -xf hadoop-native-64-2.6.0.tar -C hadoop/lib/native/

7. View HDFS in the browser: http://<namenode-host>:50070/
8. View the ResourceManager: http://<resourcemanager-host>:8088/
9. Run the wordcount example
9.1 Create an input directory:
  spark@S1PA11 hadoop-2.6.0$ mkdir input
9.2 Create f1 and f2 in input and write some content (see the sketch below):
  spark@S1PA11 hadoop-2.6.0$ cat input/f1
  Hello world
  bye jj
  spark@S1PA11 hadoop-2.6.0$ cat input/f2
  Hello Hadoop
  bye Hadoop
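One way to create the two files with the contents shown above (a minimal sketch; the exact line breaks are an assumption):

  printf 'Hello world\nbye jj\n'      > input/f1
  printf 'Hello Hadoop\nbye Hadoop\n' > input/f2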

9.3 Create the /tmp/input directory in HDFS:
  spark@S1PA11 hadoop-2.6.0$ ./bin/hadoop fs -mkdir /tmp
  15/01/05 16:53:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  spark@S1PA11 hadoop-2.6.0$ ./bin/hadoop fs -mkdir /tmp/input
  15/01/05 16:54:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
9.4 Copy the f1 and f2 files into the HDFS /tmp/input directory:
  spark@S1PA11 hadoop-2.6.0$ ./bin/hadoop fs -put input/ /tmp
  15/01/05 16:56:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
9.5 Check that f1 and f2 are now in HDFS:
  spark@S1PA11 hadoop-2.6.0$ ./bin/hadoop fs -ls /tmp/input/
  15/01/05 16:57:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Found 2 items
  -rw-r--r--   3 spark supergroup   20 2015-01-04 19:09 /tmp/input/f1
  -rw-r--r--   3 spark supergroup   25 2015-01-04 19:09 /tmp/input/f2

9.6 Run the wordcount program:
  spark@S1PA11 hadoop-2.6.0$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/input /output
  15/01/05 17:00:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  15/01/05 17:00:09 INFO client.RMProxy: Connecting to ResourceManager at S1PA11/7:8032
  15/01/05 17:00:11 INFO input.FileInputFormat: Total input paths to process : 2
  15/01/05 17:00:11 INFO mapreduce.JobSubmitter: number of splits:2
  15/01/05 17:00:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_52_0001
  15/01/05 17:00:12 INFO impl.YarnClientImpl: Submitted application application_52_0001
  15/01/05 17:00:12 INFO mapreduce.Job: The url to track the job: http://S1PA11:8088/proxy/application_52_0001/
  15/01/05 17:00:12 INFO mapreduce.Job: Running job: job_52_0001
9.7 View the result:
  spark@S1PA11 hadoop-2.6.0$ ./bin/hadoop fs -cat /output/part-r-00000
  15/01/05 17:06:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

II. Hive installation

1. Embedded mode
(1) Edit /opt/apache-hive-1.2.1-bin/conf/hive-env.sh.
(2) Edit hive-site.xml (copy /home/lin/hadoop/apache-hive-1.2.1-bin/conf/hive-default.xml.template to hive-site.xml) and change the following parameters (hive.metastore.warehouse.dir is a directory on HDFS, hive.exec.scratchdir is a temporary directory on HDFS):
  hive.metastore.warehouse.dir = /home/hive/warehouse
  hive.exec.scratchdir = /tmp/hive
  hive.server2.logging.operation.log.location = /tmp/hive/operation_logs  (top-level directory where operation logs are stored if logging is enabled)
  hive.exec.local.scratchdir = /tmp/hive  (local scratch space for Hive jobs)
  hive.downloaded.resources.dir = /tmp/hive/resources  (temporary local directory for resources added in the remote file system)
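A minimal hive-site.xml sketch with the directories above (property names and values as listed; descriptions trimmed):

  <configuration>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/home/hive/warehouse</value>
    </property>
    <property>
      <name>hive.exec.scratchdir</name>
      <value>/tmp/hive</value>
    </property>
    <property>
      <name>hive.server2.logging.operation.log.location</name>
      <value>/tmp/hive/operation_logs</value>
    </property>
    <property>
      <name>hive.exec.local.scratchdir</name>
      <value>/tmp/hive</value>
    </property>
    <property>
      <name>hive.downloaded.resources.dir</name>
      <value>/tmp/hive/resources</value>
    </property>
  </configuration>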

(3) Configure Hive's log4j. In the conf directory:
  cp hive-log4j.properties.template hive-log4j.properties
and change the EventCounter appender:
  #log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
  log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
Otherwise you will see warnings such as:
  WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
  WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
(4) Start Hive with bin/hive. On startup you may hit this error:

  ERROR Terminal initialization failed; falling back to unsupported
  java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
    at jline.TerminalFactory.create(TerminalFactory.java:101)
    at jline.TerminalFactory.get(TerminalFactory.java:158)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The problem turned out to be jline: Hadoop ships jline 0.9.94 while Hive ships jline 2.12. Replacing the jline 0.9.94 under Hadoop's /home/lin/hadoop/hadoop-2.6.1/share/hadoop/yarn/lib with the jline 2.12 from /home/lin/hadoop/apache-hive-1.2.1-bin/lib fixes it (see the sketch below).
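A shell sketch of that jar swap, assuming the paths above and the usual jar file names (jline-0.9.94.jar, jline-2.12.jar); adjust the versions to whatever is actually in your lib directories:

  # set aside the old jline bundled with Hadoop's YARN libraries
  mv /home/lin/hadoop/hadoop-2.6.1/share/hadoop/yarn/lib/jline-0.9.94.jar \
     /home/lin/hadoop/hadoop-2.6.1/share/hadoop/yarn/lib/jline-0.9.94.jar.bak
  # copy in the newer jline that ships with Hive
  cp /home/lin/hadoop/apache-hive-1.2.1-bin/lib/jline-2.12.jar \
     /home/lin/hadoop/hadoop-2.6.1/share/hadoop/yarn/lib/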

2. Standalone mode
Create an account for Hive in MySQL and grant it sufficient privileges (for example a hive account with all privileges). Log in to MySQL with that account and create a database, for example named hive, to hold Hive's metadata. Install the MySQL client locally. Configure hive-site.xml to point at the local MySQL database, including the connection protocol, account and password. Copy mysql-connector-java.jar into Hive's lib directory. Start Hive; if you can get into the shell, the setup worked.

(1) Install MySQL and start the service. Online installation:
  sudo apt-get install cmake
  sudo apt-get install libncurses5-dev
  1. sudo apt-get install mysql-server
  2. sudo apt-get install mysql-client
  3. sudo apt-get install libmysqlclient-dev
Check whether the installation succeeded:
  sudo netstat -tap | grep mysql
If the command shows a MySQL socket in the LISTEN state, the installation was successful.

Start and set up MySQL.
Start the MySQL service:
  sudo service mysqld start
Enable it at boot:
  sudo chkconfig mysqld on
Set the root login password:
  mysqladmin -uroot password hadoop
Log in to MySQL as root:
  mysql -uroot -phadoop
  mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
  Query OK, 0 rows affected (0.00 sec)
  mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' WITH GRANT OPTION;
  Query OK, 0 rows affected (0.00 sec)
  mysql> flush privileges;
  Query OK, 0 rows affected (0.00 sec)
Leave MySQL with exit, then verify the hive user:
  mysql -uhive -phive
  show databases;
Leave MySQL again with exit.

(2) Continuing from the embedded-mode configuration, add the JDBC driver and database address to hive-site.xml:
  javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver  (driver class name for a JDBC metastore)
  javax.jdo.option.ConnectionURL = jdbc:mysql://66:3306/hive?createDatabaseIfNotExist=true  (JDBC connect string for a JDBC metastore)
  javax.jdo.option.ConnectionUserName = hive  (username to use against the metastore database)
  javax.jdo.option.ConnectionPassword = hive  (password to use against the metastore database)
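The same four properties as a hive-site.xml fragment; <mysql-host> is a placeholder for the host address, which is truncated above:

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://<mysql-host>:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>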

(3) Copy the JDBC driver jar: copy mysql-connector-java-5.1.24-bin to /home/lin/hadoop/apache-hive-1.2.1-bin/lib.
(4) Enter Hive.

3. Remote mode
On top of the standalone-mode configuration, add:
  hive.metastore.local = true  (controls whether to connect to a remote metastore server or open a new metastore server in the Hive Client JVM)

Start and test Hive
After starting Hadoop, run the hive command. To test, type show databases;:
  hive> show databases;
  OK
  default
  Time taken: 0.907 seconds, Fetched: 1 row(s)

4. Problems encountered, summarized
It is best to create the metastore database first and set its encoding to latin1. Otherwise, once the metastore tables exist you may hit problems such as drop table hanging or "create table: too long" errors; Hive's UTF-8 support is poor here. After setting the encoding to latin1, however, table and column comments can no longer display Chinese. To fix that, change the character set of the affected metastore tables:
(1) Change the column-comment and table-comment fields:
  alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
  alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
(2) Change the partition-comment fields:
  alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
  alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
(3) Change the index comments:
  alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;

For the metastore connection, set:
  jdbc:mysql://…:3306/metastore_hive_db?createDatabaseIfNotExist=true&characterEncoding=UTF-8
This has no effect on tables that have already been created, so it is best to change the encoding format at installation time.

III. ZooKeeper cluster setup

1. Edit the configuration file
The cluster is managed with ZooKeeper, and the ZooKeeper steps here are the standard ones. Edit zoo.cfg:

  # the directory where the snapshot is stored.
  # do not use /tmp for storage, /tmp here is just
  # example sakes.
  dataDir=/home/zookeeper/data
At the end of the file add:
  server.1=master.hadoop:2888:3888
  server.2=slave1:2888:3888
  server.3=slave2:2888:3888

2. Create the data folder and myid
  mkdir /home/zookeeper/data
  cd /home/zookeeper/data
  echo 1 > myid
  $ scp -r /home/zookeeper/data/myid root@slave1:/home/zookeeper/data/
Change myid to 2 on slave1.
  $ scp -r /home/zookeeper/data/myid root@slave2:/home/zookeeper/data/
Change myid to 3 on slave2.
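Putting the pieces together, a minimal zoo.cfg sketch for this three-node ensemble, plus the per-node myid values; the timing settings and clientPort are not given above and are shown as the usual defaults (assumptions):

  # conf/zoo.cfg -- tickTime/initLimit/syncLimit/clientPort assumed (common defaults)
  tickTime=2000
  initLimit=10
  syncLimit=5
  clientPort=2181
  dataDir=/home/zookeeper/data
  server.1=master.hadoop:2888:3888
  server.2=slave1:2888:3888
  server.3=slave2:2888:3888

  # /home/zookeeper/data/myid on each node:
  #   master.hadoop -> 1, slave1 -> 2, slave2 -> 3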

3. Distribute to the slave nodes
  $ scp -r zookeeper-3.4.6/ root@slave1:/opt/
  $ scp -r zookeeper-3.4.6/ root@slave2:/opt/

4. Start and test
  ./bin/zkServer.sh start
  ./bin/zkServer.sh status
  Mode: follower
If you get the error "Error contacting service. It is probably not running", check the zookeeper.out file under zookeeper-3.4.6.

IV. HBase cluster

1. Configuration
(1) hbase-env.sh — change:
  export HBASE_MANAGES_ZK=false
  export JAVA_HOME=/opt/jdk1.8.0_66
This means HBase does not use its bundled ZooKeeper but an external one (the ZooKeeper ensemble we set up above).
(2) hbase-site.xml — set the following properties:
  hbase.rootdir = hdfs://master.hadoop:9000/hbase  (the directory shared by region servers)
  hbase.zookeeper.property.clientPort = 2181  (property from ZooKeeper's config zoo.cfg; the port at which clients connect)
  zookeeper.session.timeout  (no value given)
  hbase.zookeeper.quorum = master.hadoop,slave1,slave2
  hbase.tmp.dir = /home/hbase/data
  hbase.cluster.distributed = true
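The same list as an hbase-site.xml sketch; the zookeeper.session.timeout value is filled in as an assumption, since none is given above:

  <configuration>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://master.hadoop:9000/hbase</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>
    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value> <!-- assumed (milliseconds); no value given above -->
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>master.hadoop,slave1,slave2</value>
    </property>
    <property>
      <name>hbase.tmp.dir</name>
      <value>/home/hbase/data</value>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
  </configuration>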

(3) regionservers — list the region server hosts:
  master.hadoop
  slave1
  slave2
(4) Explanation of some of the parameters
hbase.zookeeper.property.clientPort: the ZooKeeper client connection port.
zookeeper.session.timeout: the connection timeout between a RegionServer and ZooKeeper. When the timeout expires, ZooKeeper removes the RegionServer from the RS cluster list; once the HMaster receives the removal notice, it rebalances the regions that server was responsible for so that the other live RegionServers take them over.
hbase.zookeeper.quorum: defaults to localhost; lists the servers in the ZooKeeper ensemble.

2. Starting and stopping
  bin/start-hbase.sh
  bin/stop-hbase.sh
3. Testing
(1) Check in the browser: open http://master.hadoop:16030
(2) HBase shell test (a fuller session sketch follows below):
  (1) run the hbase shell command
  (2) create the testtable table: create 'testtable', 'colfaml'
  (3) put some data
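A minimal hbase shell session along those lines; the row key, column qualifier and value are illustrative only:

  $ bin/hbase shell
  hbase> create 'testtable', 'colfaml'
  hbase> put 'testtable', 'row1', 'colfaml:greeting', 'hello'
  hbase> scan 'testtable'
  hbase> get 'testtable', 'row1'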

4. Notes
An HBase cluster depends on a ZooKeeper ensemble. Every node in the HBase cluster, and every client that needs to access HBase, must be able to reach that ZooKeeper ensemble. HBase ships with its own ZooKeeper, but to make ZooKeeper available to other applications as well it is better to use a separately installed ZooKeeper ensemble. A ZooKeeper ensemble is normally configured with an odd number of nodes, and the Hadoop cluster, the ZooKeeper ensemble and the HBase cluster are three mutually independent clusters: they do not need to be deployed on the same physical nodes and communicate with each other over the network.
Note that if you want to stop HBase from starting its bundled ZooKeeper, the export HBASE_MANAGES_ZK=false setting above is not enough on its own; hbase.cluster.distributed in hbase-site.xml must also be true, otherwise at startup you will hit the error "Could not start ZK at requested port of 2181". This happens because HBase tries to start its bundled ZooKeeper while our own ZooKeeper is already running, and both default to port 2181. Also, stop-hbase.sh sometimes runs for a long time without finishing; the most likely cause is that ZooKeeper was shut down beforehand. Finally, HBase does not need MapReduce, so it is enough to start HDFS with start-dfs.sh, then start ZooKeeper on each of its nodes, and finally start HBase with start-hbase.sh.
2) When stopping the HBase cluster the following error appears:
  stopping hbase
  cat: /tmp/hbase-mango-master.pid: No such file or directory

The reason is that by default the pid file is kept under /tmp, and files in /tmp are easily lost. The fix is to change the pid file location in hbase-env.sh:
  # The directory where pid files are stored. /tmp by default.
  export HBASE_PID_DIR=/var/hadoop/pids

V. Spark installation and configuration

Installing and configuring Scala
1. Download Scala
Download and unpack the Scala package (details omitted). Download link: /download/2.10.4.html

2. Configure the Scala environment variable:
  export SCALA_HOME=/usr/local/scala/scala-2.10.4
3. Test the Scala environment: type scala to enter the Scala REPL, then test with 12*12 and press Enter.

Installing and configuring Spark 1.6.0
1. Download Spark 1.6.0 — choose the build matching your Hadoop version. Download link: /downloads.html
2. Configure the Spark environment variables.
3. Configure Spark:
  cp spark-env.sh.template spark-env.sh
  vim spark-env.sh
  cp slaves.template slaves
  vim slaves
Add the worker nodes (see the sketch below).
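The contents of spark-env.sh and slaves are not shown above; the following is a sketch of typical entries for this cluster, reusing the paths and hostnames established earlier (all of these values are assumptions):

  # conf/spark-env.sh (assumed entries)
  export JAVA_HOME=/opt/jdk1.8.0_66
  export SCALA_HOME=/usr/local/scala/scala-2.10.4
  export HADOOP_CONF_DIR=/opt/hadoop-2.6.0/etc/hadoop
  export SPARK_MASTER_IP=master.hadoop

  # conf/slaves (assumed: one worker hostname per line)
  master.hadoop
  slave1
  slave2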

4. Start Spark and check the cluster
  cd /usr/local/spark/spark-1.6.0-bin-hadoop2.6
  sbin/start-all.sh
Check the processes with jps: there are now additional Master and Worker processes.
Start spark-shell and run a test:
  val file = sc.textFile("hdfs://master.hadoop:9000/tmp/input/f1")
  val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  count.collect
Then check the Spark UI.

VI. Kafka installation

1) Download Kafka
  $ wget /apache-mirror/kafka//kafka_2.9.2-.tgz
For installation and configuration details, see the earlier article: /Linux/2014-09/.htm
2) Configure $KAFKA_HOME/config/server.properties
We install 3 brokers, one on each of the 3 VMs: master.hadoop, slave1, slave2. Edit the file on every node:
  $ vi $KAFKA_HOME/config/server.properties
  broker.id=0   (increment on each node)
  port=9092
  host.name=master.hadoop   (each node's own host name)
  num.partitions=2
  zookeeper.connect=master.hadoop:2181,slave1:2181,slave2:2181
3) Start the ZooKeeper service. On master.hadoop, slave1 and slave2 run:
  $ zkServer.sh start
4) Start the Kafka service. On master.hadoop, slave1 and slave2 run:
  $ bin/kafka-s…
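A per-broker server.properties sketch consistent with the values above, shown for slave1 as an example (broker.id and host.name change per node). The text breaks off at the start command; for reference, the stock script shipped with Kafka is bin/kafka-server-start.sh:

  # $KAFKA_HOME/config/server.properties on slave1 (example)
  # broker.id is 0 on master.hadoop, 1 on slave1, 2 on slave2
  broker.id=1
  port=9092
  # each node's own host name
  host.name=slave1
  num.partitions=2
  zookeeper.connect=master.hadoop:2181,slave1:2181,slave2:2181

  # start the broker on each node (standard Kafka start script)
  $ bin/kafka-server-start.sh config/server.properties &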
