EasyHadoop Hands-On Manual (EasyHadoop实战手册)
Contents

EasyHadoop cluster deployment
    Document overview
    Background: layout of the Hadoop test cluster
    RedHat Linux base environment
        # Linux installation (VMware virtual machine)
        # Configure the cluster hosts list
        # Download and install the Java JDK
        # Create the user account and the Hadoop deployment and data directories
    Hadoop standalone installation and configuration
        # Download and unpack the Hadoop files
        # Configure environment variables in hadoop-env.sh
        # Hadoop Common configuration: core-site.xml
        # HDFS NameNode and DataNode configuration: hdfs-site.xml
        # Standalone startup, execution and troubleshooting
        # Run the Hadoop pi example to check the installation
        # Common installation and deployment errors
    Hadoop cluster configuration and installation
        # Check the Linux base environment on each node (see "RedHat Linux base environment")
        # Configure passphraseless login from the master to the nodes
        # Check that the master can log in to every node by key as the hadoop user
        # Configure the cluster host lists used by stop-all.sh and start-all.sh
        # Run the Hadoop pi example to check the cluster
Hive warehouse deployment
    How Hive works
        # Hive warehouse workflow
        # Hive internal structure
    Hive deployment and installation
        # Install the Hadoop cluster (see the EasyHadoop deployment chapters)
        # Unpack the Hive package and configure the JDBC connection address
        # Start the Hive Thrift server
        # Start the built-in Hive UI
    Basic usage of the Hive CLI
    Basic HQL syntax (create, load, query and drop tables)
    Building a simple data mart with MySQL
        # The two MySQL storage engines
        # Create a table and analyze the data with the Hive CLI
        # Write HQL in a shell script, export the data with the Hive CLI, and load it into MySQL
        # Schedule the daily run with crontab
    Presenting the data with FineReport
        # Problems and limitations of FineReport

Document overview

This is a Hadoop deployment document. It gives the methods and steps for both a standalone Hadoop installation and a Hadoop cluster installation, with the goal of making deployment simpler ("Easy"). It covers CentOS 5 and RedHat 5.2 in 32-bit and 64-bit variants, as well as Ubuntu.

Terminology:
    NameNode    — the HDFS metadata master server; it keeps the file metadata for the DataNodes.
    JobTracker  — the Hadoop Map/Reduce scheduler; it communicates with the TaskTrackers to assign compute tasks and tracks their progress.
    DataNode    — a Hadoop data node; it stores the data blocks.
    TaskTracker — runs the Map and Reduce tasks.

Deployment path on Linux: /opt/modules/hadoop/hadoop-1.0.3. (The original shows a diagram of the VM network here: NAT networking, the master running NameNode and JobTracker, n node machines running DataNode and TaskTracker, reached over SSH, with HDFS and job traffic between them.)

RedHat Linux base environment

# Linux installation (VMware virtual machine)
Install the system in VMware with NAT networking. As root, add a nightly time synchronization:
    crontab -e
    0 1 * * * /usr/sbin/ntpdate ...     (the time-server address did not survive in the source)

# Set the hostname
    hostname master
    vi /etc/sysconfig/network           # persist the hostname (HOSTNAME=master)

# Configure the network
Assign the IP address with setup (the NIC in the VM is an Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]) and restart networking:
    /sbin/service network restart
Host-name resolution must work; if it does not, Hadoop fails at startup with errors such as:
    ... MetricsSystem,sub=Stats registered.
    2012-07-18 02:47:26,533 ERROR org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
    Error getting localhost name. Using 'localhost'...

# Configure the cluster hosts list
    vi /etc/hosts
Add a line with the IP address and hostname of every machine in the cluster (master, node1, node2, ...).

# Download and install the Java JDK
    wget ...        (the JDK download URL did not survive in the source)
Make the installer executable with chmod and run it, then append the environment to /etc/profile:
    vi /etc/profile
    export JAVA_HOME=<JDK path>
    export HADOOP_HOME=<Hadoop path>
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

# Create the hadoop user account
    /usr/sbin/groupadd hadoop
    /usr/sbin/useradd hadoop -g hadoop

# Generate the SSH key for the hadoop user
    su hadoop
    cd
    ssh-keygen -q -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
    cd .ssh
    cat id_rsa.pub > authorized_keys
    chmod go-rwx authorized_keys
The public key is the content of id_rsa.pub. In the cluster setup, the master's id_rsa.pub is appended to authorized_keys on every node.

# Create the Hadoop deployment and data directories
    mkdir -p /opt/modules/hadoop/       # deployment directory
    mkdir -p /opt/data/hadoop/          # data directory
    chown -R hadoop:hadoop /opt/modules/hadoop/
    chown -R hadoop:hadoop /opt/data/hadoop/

Check the interface and the environment:
    [hadoop@master root]$ ifconfig
    eth0    Link encap:Ethernet  HWaddr ...
            inet addr:...  inet6 addr: fe80::20c:29ff:fe7a:de12/64
            UP BROADCAST RUNNING
            RX packets:14 errors:0 dropped:0 overruns:0 frame:0
            TX packets:821 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:1591 (1.5 KiB)  TX bytes:81925 (80.0 KiB)
            Interrupt:67 Base address:...
    ssh master            # must log in without a password prompt
    echo $HADOOP_HOME     # must print the deployment path

Hadoop standalone installation and configuration

# Download and unpack the Hadoop files
    cd /opt/modules/hadoop/
    wget ...        (the Apache mirror URL for hadoop-1.0.3 did not survive in the source)
    # if already downloaded, copy the file into the installation directory instead:
    cp hadoop-1.0.3.tar.gz /opt/modules/hadoop/
    tar -xzvf hadoop-1.0.3.tar.gz
    cd /opt/modules/hadoop/

# Configure hadoop-env.sh
Set the daemon heap and the compression library path. HADOOP_HEAPSIZE defaults to 1000 MB; 512 MB is configured here because the test machines are small:
    export HADOOP_HEAPSIZE=512
    # plus the path to the (LZO) compression libraries; the exact export was lost in the source

# Hadoop Common configuration: core-site.xml
The properties described by the original (standard Hadoop 1.0 names):
    fs.default.name       — NameNode server address and port, in hdfs://host:port form
    fs.checkpoint.dir     — where the secondary NameNode keeps its checkpoint data
    fs.checkpoint.period  — the editlog is merged into the fsimage every 30 minutes
    fs.checkpoint.size    — a merge is also triggered once the editlog reaches 32 MB
    fs.trash.interval     — the Hadoop trash auto-purge interval, in minutes; set to one day

# HDFS NameNode and DataNode configuration: hdfs-site.xml
    dfs.name.dir               — where the NameNode saves its image files
    dfs.data.dir               — HDFS data file paths; several partitions and disks can be listed, separated by commas
    dfs.http.address           — host and port of the NameNode web view
    dfs.secondary.http.address — host and port of the secondary NameNode web view
    dfs.replication            — number of block replicas: 3
    dfs.datanode.du.reserved   — space each DataNode keeps free per volume so disks are not written completely full: 1 GB, in bytes
    dfs.block.size             — HDFS block size: 128 MB

# MapReduce configuration: mapred-site.xml
    mapred.local.dir                        — where MapReduce writes its intermediate files; one entry per disk is possible
    mapred.system.dir                       — the MapReduce system control files
    mapred.tasktracker.map.tasks.maximum    — maximum map slots per TaskTracker: 3
    mapred.tasktracker.reduce.tasks.maximum — maximum reduce slots per TaskTracker
    io.sort.mb                              — sort memory, 100 MB; it must stay below the heap given in mapred.child.java.opts
Budget the child JVM heap from the memory that the operating system, the DataNode and the TaskTracker leave free.
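Before creating the directories and formatting HDFS in the next step, it helps to verify the environment in one pass. The following pre-flight check is a sketch that is not part of the original manual; it assumes only the paths and the hadoop account configured above:

    #!/bin/sh
    # pre-flight check for the standalone installation described above
    su - hadoop -c '
        java -version                     # the JDK from /etc/profile must answer
        echo "HADOOP_HOME=$HADOOP_HOME"   # should print /opt/modules/hadoop/hadoop-1.0.3
        ssh -o BatchMode=yes master exit \
            && echo "passphraseless ssh: OK" \
            || echo "passphraseless ssh: still prompting"
        ls -ld /opt/data/hadoop           # must be owned by hadoop:hadoop
    '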
# Create the Hadoop mapred and HDFS name/data directories
As root:
    mkdir -p /opt/data/hadoop/
    chown -R hadoop:hadoop /opt/data/hadoop/
As the hadoop user:
    su hadoop
    mkdir -p /opt/data/hadoop/mapred/mrsystem
    mkdir -p /opt/data/hadoop/hdfs/name
    mkdir -p /opt/data/hadoop/hdfs/data

# Format the NameNode
    su hadoop
    /opt/modules/hadoop/hadoop-1.0.3/bin/hadoop namenode -format

# Start the master node
    /opt/modules/hadoop/hadoop-1.0.3/bin/hadoop-daemon.sh start namenode
    /opt/modules/hadoop/hadoop-1.0.3/bin/hadoop-daemon.sh start secondarynamenode
Start the DataNode and TaskTracker in the same way; replace start with stop to shut a service down. Watch the logs while starting:
    tail -f /opt/modules/hadoop/hadoop-1.0.3/logs/*
There must be exactly one NameNode and one DataNode process, and one JobTracker and one TaskTracker process. Each service can also be handled individually:
    hadoop-daemon.sh start xxxx     # xxxx = namenode | datanode | jobtracker | tasktracker

# Run the Hadoop pi example to check the installation
    hadoop jar hadoop-examples-1.0.3.jar pi 10 100
Expected output:
    12/07/15 10:50:48 INFO mapred.FileInputFormat: Total input paths to process : 10
    12/07/15 10:50:48 INFO mapred.JobClient: Running job: ...
    12/07/15 10:50:49 INFO mapred.JobClient:  map 0% reduce 0%
    ...
    12/07/15 10:52:22 INFO mapred.JobClient:  map 100% reduce 100%
    12/07/15 10:52:28 INFO mapred.JobClient: Job Finished in 100.608 seconds
    Estimated value of Pi is ...
followed by the job counters (virtual memory bytes, map output, and so on).

# Basic HDFS operations
    hadoop fs -ls /
    hadoop fs -mkdir /data/
    hadoop fs -put xxx.log /data/

# Run a MapReduce job
    hadoop jar hadoop-examples-1.0.3.jar pi 10 100

# Common installation and deployment errors
Wrong IP entries in /etc/hosts. Too many or too few task slots configured in mapred-site.xml, which lowers efficiency or causes out-of-memory errors. If a service will not start, check the corresponding log:
    tail -n 100 $HADOOP_HOME/logs/*namenode*      # NameNode service log
    tail -n 100 $HADOOP_HOME/logs/*datanode*      # DataNode service log
    tail -n 100 $HADOOP_HOME/logs/*jobtracker*    # JobTracker service log

Hadoop cluster configuration and installation

# Check the Linux base environment on each node; see the "RedHat Linux base environment" section.

# Configure passphraseless login from the master to the nodes
As the hadoop user on the master:
    su hadoop
    cd
    ssh-keygen -q -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
    cd /home/hadoop/.ssh
    [hadoop@master .ssh]$ cat id_rsa.pub
    ssh-rsa ...yko/TtGNWVOtESBT8/Ya3wBzZd+Ef2ppsWuBbMOhvwB++gqlIfmM5UtYJkfYuUMr6SuQAJ1W6n+gA3VHRWIS2stlEVQ+F...
Append the public key to authorized_keys and restrict the permissions:
    cat id_rsa.pub > authorized_keys
    chmod go-rwx authorized_keys
Copy authorized_keys to node1 (type the hadoop password when asked), then fix the permissions there:
    scp authorized_keys node1:/home/hadoop/.ssh/
    ssh node1
    chmod go-rwx /home/hadoop/.ssh/authorized_keys

# Check key login from the master as the hadoop user
Logging in without a password prompt counts as success:
    ssh master
    exit
    ssh node1
    exit

# Configure the server lists used by stop-all.sh and start-all.sh
On the master, put the hostname of the secondary NameNode into conf/masters and the hostnames of the DataNode/TaskTracker machines into conf/slaves.

# Copy Hadoop from the master to the node1 and node2 servers
    su hadoop
    scp -r /opt/modules/hadoop/hadoop-1.0.3 node1:/opt/modules/hadoop/
Log in to node1 and create the data directories:
    ssh node1
    mkdir -p /data/hadoop/mapred/mrsystem
    mkdir -p /data/hadoop/hdfs/name
    mkdir -p /data/hadoop/hdfs/data
    chmod go-w /data/hadoop/hdfs/data     # the DataNode refuses group/world-writable directories

# Start the cluster and check it
Start the NameNode and JobTracker on the master and the DataNodes and TaskTrackers on the nodes, then:
    hadoop fs -ls /
    hadoop fs -mkdir /data/
Run the pi example again; the output matches the standalone run shown above (Job Finished in 100.608 seconds).
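After starting the cluster, each machine should be running exactly the daemons listed above. A small loop makes that check repeatable (a sketch, not from the original; it assumes the passphraseless ssh configured above and that jps from the installed JDK is on the PATH):

    #!/bin/sh
    # list the Hadoop java processes on every machine in the cluster
    for host in master node1 node2; do
        echo "== $host =="
        ssh hadoop@$host jps
    done
    # expected on master: NameNode, SecondaryNameNode, JobTracker
    # expected on nodes:  DataNode, TaskTracker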
Automated installation

To speed up installation and deployment of the server cluster, an automated install is used; a sample follows. Adapt the values that the original marks in red to the local environment. The packages referenced here are for 64-bit servers; for 32-bit installs they must be replaced.

# Master auto-install script — save it into a file and execute it:
    yum -y install lrzsz gcc gcc-c++ libstdc++-devel ntp          # install gcc and tools
    echo "0 1 * * * root /usr/sbin/ntpdate ..." >> /etc/crontab   # nightly time sync (server address omitted in the source)
    /usr/sbin/groupadd hadoop                                     # add the hadoop group
    /usr/sbin/useradd hadoop -g hadoop                            # add the hadoop user
    mkdir -p /opt/modules/hadoop/
    mkdir -p /opt/data/hadoop/
    # rewrite /etc/hosts: the localhost entries plus one line per cluster machine
    echo -e "127.0.0.1\tlocalhost.localdomain localhost\n::1\tlocalhost6.localdomain6 localhost6" > /etc/hosts
    # detect this machine's IP and substitute it into the "collect" hosts entry
    IP=`/sbin/ifconfig eth0 | grep "inet addr" | awk -F":" '{print $2}' | awk -F" " '{print $1}'`
    sed -i "s/^\tcollect/${IP}\tcollect/g" /etc/hosts
    echo "----------------env init finish and prepare su hadoop"
    # create the hadoop user's key and authorize the master's key
    # ($HADOOP is the hadoop home directory; its definition did not survive in the source)
    sudo -u hadoop ssh-keygen -q -t rsa -N "" -f $HADOOP/.ssh/id_rsa
    cd $HADOOP/.ssh && cat master-id_rsa.pub >> authorized_keys   # the master public key, distributed beforehand
    chmod go-rwx $HADOOP/.ssh/authorized_keys                     # fix the file permissions

# Unpack Hadoop and install the packages (script continued)
    cd $HADOOP/hadoop
    tar zxvf hadoop_gz.tar.gz
    rpm -ivh jdk-6u21-linux-amd64.rpm
    rpm -ivh lrzsz-0.12.20-19.x86_64.rpm
    tar xzvf lzo-2.06.tar.gz
    cd lzo-2.06 && ./configure --enable-shared && make && make install
    cp /usr/local/lib/liblzo2.* /usr/lib/
    cd ..
    tar xzvf lzop-1.03.tar.gz
    cd lzop-1.03
    ./configure && make && make install && cd ..
    chown -R hadoop:hadoop /opt/modules/hadoop/
    chown -R hadoop:hadoop /home/hadoop

# Download the related LZO packages
    wget ...        (the package URLs did not survive in the source)

# Build LZO
    cd lzo-2.06
    ./configure && make && make install
    cd ../lzop-1.03
    ./configure && make && make install
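The script above prepares one machine at a time. A wrapper along the following lines can push it to every node; this is a sketch rather than part of the original, and the file name /root/node_init.sh and the node list are assumptions:

    #!/bin/sh
    # copy the init script to each node and run it there as root
    for host in node1 node2; do
        scp /root/node_init.sh root@$host:/root/ \
            && ssh root@$host "sh /root/node_init.sh"
    done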
# mapred-site.xml and capacity-scheduler.xml for Hive streaming

capacity-scheduler.xml is the configuration file for the capacity scheduler. Various scheduling parameters can be configured per queue; the properties follow a naming convention of the form mapred.capacity-scheduler.queue.<queue-name>.<property>. The parameters described in the file:

    maximum-system-jobs — maximum number of jobs in the system which can be initialized concurrently.
    capacity — percentage of the number of slots in the cluster that are available for jobs in this queue.
    maximum-capacity — a limit beyond which a queue cannot use the capacity of the cluster; a means to limit how much excess capacity a queue can use. By default there is no limit (-1 implies a queue can use the complete capacity of the cluster). The maximum-capacity of a queue can only be greater than or equal to its minimum capacity. The property can be used to curtail jobs which are long running in nature from occupying more than a certain percentage of the cluster, which, in the absence of pre-emption, could lead to the capacity guarantees of other queues being affected. Note that maximum-capacity is a percentage, so the absolute maximum changes with the cluster's capacity: if nodes or racks are added, the maximum capacity in absolute terms increases accordingly.
    supports-priority — if true, the priorities of jobs are taken into account in scheduling decisions.
    minimum-user-limit-percent — each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is competition for them. This user limit can vary between a minimum and a maximum value; the former depends on the number of users who have submitted jobs, the latter is set by this property. For example, suppose the value is 25: if two users have submitted jobs to a queue, no user can use more than 50% of the queue resources; with a third user, no single user can use more than 33%; with four or more users, no more than 25%. A value of 100 means no user limits are imposed.
    user-limit-factor — the multiple of the queue capacity which can be configured to allow a single user to acquire more slots.
    maximum-initialized-active-tasks — maximum number of tasks, across all jobs in the queue, which can be initialized concurrently; once the queue's jobs exceed this limit they are queued on disk.
    maximum-initialized-active-tasks-per-user — the same limit per user, across all of the user's jobs in the queue.
    init-accept-jobs-factor — the multiple of (maximum-system-jobs * queue-capacity) used to determine the number of jobs which are accepted by the scheduler.

The file also provides the default values applied to every queue which does not set the corresponding property itself: default-supports-priority, default-minimum-user-limit-percent, default-user-limit-factor, the default maximum initialized active tasks per queue and per user, and default-init-accept-jobs-factor.

Capacity-scheduler job-initialization parameters:
    init-poll-interval — the amount of time in milliseconds which is used to poll the job queues for jobs to initialize.
    init-worker-threads — number of worker threads used by the initialization poller to initialize jobs in a set of queues. If the number equals the number of job queues, a single thread initializes the jobs in one queue; if it is smaller, a thread gets a set of queues assigned; if it is greater, the number of threads equals the number of job queues.

# Rack awareness: core-site.xml and RackAware.py
core-site.xml points the topology script (topology.script.file.name) at RackAware.py, which maps a host name to its rack. The script as far as it survives (the dictionary held one entry per host; the lookup default is reconstructed):

    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    import sys

    rack = {"hadoopnode-101": "rack1",
            # ... one entry per host ...
           }

    if __name__ == "__main__":
        print "/" + rack.get(sys.argv[1], "rack0")

# Notes on the Hadoop configuration parameters
core-site.xml carries the common configuration; hdfs-site.xml and mapred-site.xml configure HDFS and MapReduce. The full parameter references are core-default.html, hdfs-default.html and mapred-default.html. Points repeated by the original:
    - HEAPSIZE sets the daemon heap.
    - The secondary NameNode takes checkpoints of the NameNode: editlog and fsimage are merged every 30 minutes, or whenever the editlog reaches 32 MB; the trash interval is 24 hours.
    - LZO is used for compression.
    - The NameNode path for persisting metadata and the transaction log may name several directories; mounting one of them over NFS allows a fast recovery when the host dies.
    - The DataNode block directories may also name several paths (one per disk).
    - The secondary NameNode HTTP address is used to fetch the image and editlog, giving a backup of the metadata.
    - HDFS file append is supported.
    - JobTracker RPC address; map tasks per TaskTracker: 4; reduce slots sized so that map slots + reduce slots = CPU cores × 2, with the child JVM heap chosen to match.
    - mapred.reduce.parallel.copies, default 5: the number of parallel copies the reducers use to fetch map output.

# Cluster host plan
The original lays out a host-planning table at this point (DataNode/TaskTracker host names, numbered node IDs with one or two CPU cores each, a Nagios monitoring host, and a Hive MetaStore DB server running MySQL, on SATA disks); only these row labels survive.

Hive warehouse deployment

How Hive works

Hive is an open-source data-warehouse system; HQL statements are used to operate on the data in the Hadoop cluster. MySQL is the open-source database system in which Hive keeps its metadata and which serves as the data mart. FineReport presents the results. The sample data are the internationally known Alexa site rankings and Sogou data resources. Compared with traditional data-warehouse technology, Hadoop and the classic tools are complementary rather than competing.

# Hive warehouse workflow
(The original shows a flow diagram here; its surviving labels are WEB, Nginx, http, Load, Hive, Excel and "users": web logs are collected, loaded into the cluster over HTTP, analyzed with Hive, and handed to the users.)

# Hive internal structure
(Another diagram: the Hive CLI and the Hive JDBC driver in front of Hive, with MySQL behind it as the metastore.)

Hive deployment and installation

# Install the Hadoop cluster — see the EasyHadoop cluster deployment chapters.

# Build MySQL from source
The configure flags recorded by the original (deduplicated; a few further flags are truncated in the source):
    ./configure '--with-plugin-innodb_plugin' '--without-ndbcluster' \
        '--with-archive-storage-engine' '--with-csv-storage-engine' \
        '--without-plugin-ftexample' '--with-partition' '--with-big-tables' \
        '--with-zlib-dir=bundled' '--enable-shared' 'CC=gcc' 'CFLAGS=-O2 -g -pipe' \
        'LDFLAGS=' 'CXX=gcc' 'CXXFLAGS=-O2 -g -pipe -felide-constructors -fno-exceptions -fno-rtti' \
        '--with-fast-mutexes' '--with-mysqld-user=mysql' \
        '--with-unix-socket-path=/var/lib/mysql/mysql.sock'
    make && make install

# Or install MySQL from the RPM packages
    rpm -ivh MySQL-server-....rpm
    rpm -ivh MySQL-client-....rpm       (package names truncated in the source)

# Start MySQL
    [root@master ~]# /sbin/service mysql restart
    Shutting down MySQL..  [ OK ]
    Starting MySQL.        [ OK ]
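Before wiring Hive to MySQL it is worth confirming that the freshly installed server answers and will come back after a reboot. A minimal check (not part of the original; it assumes a fresh install where root still has no password):

    # confirm the server answers locally
    mysql -uroot -e "SELECT VERSION();"
    # make sure MySQL starts at boot (the RPM installs the init script as "mysql")
    /sbin/chkconfig mysql on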
# Grant MySQL privileges for Hive
    mysql -uroot -p
Create a dedicated account for Hive (the usual myuser/mypassword grant pattern). Without the grant, connecting from another host fails:
    [root@master ~]# mysql -uhive -phive -h ...
    ERROR 1130 (HY000): Host '::ffff:...' is not allowed to connect to this MySQL server
Grant access as root in the mysql shell:
    GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY 'hive' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'hive'@'192.168.1.%' IDENTIFIED BY 'hive' WITH GRANT OPTION;

# Unpack the Hive package and configure the JDBC connection address
    mkdir -p /opt/modules/hive/
    cd /opt/modules/
    tar -xzvf hive-0.9.0.tar.gz

# Hive environment
    export HADOOP_HEAPSIZE=64       # smaller than the default
    vi /opt/modules/hive/hive-0.9.0/conf/hive-site.xml
Set the four metastore properties:
    javax.jdo.option.ConnectionURL        — JDBC connect string for a JDBC metastore
    javax.jdo.option.ConnectionDriverName — Driver class name for a JDBC metastore
    javax.jdo.option.ConnectionUserName   — username to use against metastore database
    javax.jdo.option.ConnectionPassword   — password to use against metastore database

# Add the MySQL Hive user name and password and create the Hive warehouse database
Log in to mysql and run:
    create database hive;

# Start the Hive Thrift server
    /opt/modules/hive/hive-0.9.0/bin/hive --service hiveserver 10001
    [root@hadoop-231 bin]# netstat -nap | grep 10001
    tcp ... 10001 ... LISTEN ...

# Start the built-in Hive UI
(Listed in the contents; the body of this step did not survive in the source.)

Basic usage of the Hive CLI

# Interactive shell
    [root@hadoop-231 bin]# ./hive
    hive> show databases;
    Time taken: 3.103 seconds

# Running HQL from a file
    vi <file>.sql
    use test;
    select * from test_text limit ...;
Then:
    [root@hadoop-231 bin]# ./hive -f <file>.sql
    Time taken: 3.306 seconds

# Running HQL from the command line
    ./hive -e "select * from test.test_text limit 30"

Basic HQL syntax (create, load, query and drop tables)

# In the Hive CLI
    CREATE database alexa;      -- create the database
    use alexa;

# Create a partitioned table (the field delimiter is truncated in the source; '\t' is assumed below)
    CREATE TABLE uid (uid STRING)
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    collection items terminated by "\n"
    STORED AS TEXTFILE;

# Create the Alexa top-100w table
    CREATE TABLE top100w (id STRING COMMENT '...', url STRING COMMENT '...')
    COMMENT 'alexa top100w'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    collection items terminated by "\n"
    STORED AS TEXTFILE;

# Load data
    LOAD DATA LOCAL INPATH ...                                    -- from the local filesystem
    load data inpath '/data/uid.txt' overwrite into table uid;    -- from HDFS

# Query
    SELECT * FROM alexa.top100w limit 10;
    select * from alexa.top100w where ...;

# External table over an HDFS directory
    CREATE EXTERNAL TABLE top100w (id STRING COMMENT '...', url STRING COMMENT '...')
    COMMENT 'alexa top100w'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    collection items terminated by "\n"
    STORED AS TEXTFILE
    LOCATION '/data/dw/alexa/top100w';

# Put the data file into the table directory (put / copyFromLocal)
    hadoop fs -mkdir /data/dw/alexa/top100w/
    hadoop fs -put /root/top-1m.csv /data/dw/alexa/top100w/
    Select * from alexa.top100w limit 10;

# Simple queries
    select a from top100w limit ...;
where, group by, order by and union all are supported, as are:
    INSERT OVERWRITE TABLE ...
    INSERT OVERWRITE DIRECTORY '/user/...' SELECT ...
Hive has no IN subqueries; rewrite them as a LEFT SEMI JOIN:
    SELECT a.key, a.value FROM a WHERE a.key in (SELECT b.key FROM B)
    ->
    SELECT a.key, a.val FROM a LEFT SEMI JOIN b on (a.key = b.key)
The original closes with a pointer to the Hive function reference (the URL did not survive); functions can also be listed in the CLI with SHOW FUNCTIONS and inspected with DESCRIBE FUNCTION <name>.
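The table of contents also promises a daily job that writes HQL in a shell script, exports the result with the Hive CLI, loads it into MySQL, and is scheduled with crontab; the body of that section did not survive. The sketch below only illustrates that flow; the script path, target table and credentials are all assumptions:

    #!/bin/sh
    # /opt/scripts/daily_report.sh -- export yesterday's aggregate from Hive into the MySQL mart
    DT=`date -d "1 day ago" +%Y-%m-%d`
    OUT=/tmp/report_$DT.txt

    # run the HQL with the Hive CLI; hive -e prints tab-separated rows to stdout
    /opt/modules/hive/hive-0.9.0/bin/hive -e \
        "select dt, count(uid) from alexa.uid where dt='$DT' group by dt" > $OUT

    # load the exported rows into MySQL (requires LOCAL INFILE to be enabled)
    mysql -uhive -phive --local-infile=1 -e \
        "LOAD DATA LOCAL INFILE '$OUT' INTO TABLE mart.daily_report FIELDS TERMINATED BY '\t';"

    # crontab entry, to run at 02:00 every day:
    # 0 2 * * * /opt/scripts/daily_report.sh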
