Week 08 Data Warehouse: Hive from Beginner to Expert, Appendix 3: Basic Usage | Tutorial

Uploader: 洞*** · IP location: Beijing · Uploaded: 2023-05-21 · Format: DOCX · Pages: 11 · Size: 137.83 KB


慕课网 (imooc) · 3 A Quick Look at Hive

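徐老师 (Teacher Xu) · Updated 2020-08-…

Using Hive

Hive can be operated from the Shell command line or through JDBC code. Let's look at the command line first.

The command line itself offers two clients. The first is the hive command in the bin directory, which Hive has supported from the very beginning. Later the beeline command appeared: it connects to Hive through the HiveServer2 service and is a lightweight client tool, which is why it later became the recommended one. Which one you use is largely a matter of habit. People who have done big data development for many years are often used to the hive command, and switching to beeline feels a bit awkward to them. A HiveQL statement gives exactly the same result whichever client runs it, so for our purposes it does not matter which one we use.

1: First, the hive command. It connects directly:

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/hive
Hive-on-MR is deprecated in Hive 2 and may not be available in the future …
Hive Session ID = 32d36bcb-21b8-488f-8c13-…
```

This message says that Hive-on-MR has been deprecated since Hive 2 and may no longer be available in later versions, and that another execution engine, such as Spark or Tez, should be used instead. If you really want the MapReduce engine, use a Hive 1.x release.

Once the prompt starts with hive>, we are in the Hive command line and can type HiveQL:

```
hive> show tables;
OK
Time taken: 0.18 seconds
```

Create a table:

```
hive> create table t1(id int,name string);
OK
Time taken: 1.064 seconds
```

List the tables again:

```
hive> show tables;
OK
t1
Time taken: 0.18 seconds
```

Insert data into the table. Note that this time a MapReduce job is launched:

```
hive> insert into t1(id,name) values(…);
Query ID = root_20200506162317_02b2802a-5640-4656-88e6-…
Total jobs = …
Launching Job 1 out of …
Number of reduce tasks determined at compile time: …
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1588737504319_0001, Tracking URL = …
Kill Command = /data/soft/hadoop-3.2.0/bin/mapred job -kill …
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: …
2020-05-06 16:23:36,954 Stage-1 map = 0%,  reduce = …
2020-05-06 16:23:47,357 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU …
2020-05-06 16:23:56,917 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU …
MapReduce Total cumulative CPU time: 5 seconds 160 msec
Ended Job = …
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://bigdata01:9000/user/hive/warehouse/t1/.hive-…
Loading data to table default.t1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: …  Cumulative CPU: 5.16 sec  HDFS Read: …
Total MapReduce CPU Time Spent: 5 seconds 160 msec
OK
Time taken: 41.534 seconds
```

Query the data. Why is no MapReduce job launched this time? Because the computation is so simple that no MapReduce job is needed: Hive gets the result by reading the table's data file directly.

```
hive> select * from t1;
OK
1 …
Time taken: 2.438 seconds, Fetched: 1 row(s)
```

Drop the table:

```
hive> drop table t1;
OK
Time taken: 0.89 seconds
```

Type quit; to leave the hive command line, or simply press Ctrl+C.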
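The select above runs without MapReduce because Hive just reads the table's data file. As a rough illustration of that, assuming t1 was created with Hive's default text storage (fields separated by the Ctrl-A character \001, NULL stored as \N), each line of a file under hdfs://bigdata01:9000/user/hive/warehouse/t1 could be split into column values like this. HiveTextRow is a hypothetical helper written for this illustration, not part of Hive, and it works on a single line of text rather than on the HDFS file itself.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: splits one line of a Hive TEXTFILE table into column values.
 * Assumes Hive's default text-format settings: fields separated by Ctrl-A (0x01)
 * and SQL NULL written as the two characters backslash-N.
 * Illustration only; this is not how Hive itself is implemented.
 */
final class HiveTextRow {
    /** Default field delimiter of Hive text tables (Ctrl-A, 0x01). */
    static final String FIELD_DELIMITER = "\u0001";
    /** Default NULL marker of Hive text tables (backslash followed by N). */
    static final String NULL_MARKER = "\\N";

    private HiveTextRow() {}

    static List<String> parse(String line) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] raw = line.split(FIELD_DELIMITER, -1);
        String[] values = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            // Map the NULL marker back to a Java null; everything else stays a string.
            values[i] = NULL_MARKER.equals(raw[i]) ? null : raw[i];
        }
        return Arrays.asList(values);
    }
}
```

Typed conversion (for example turning the id column into an int) is left out on purpose; the point is only that a plain scan of the file is enough to answer select *.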
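2: Next, the second way, beeline. It goes through HiveServer2, so start the HiveServer2 service first:

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/hiveserver2
which: no hbase in (.:/data/soft/jdk1.8/bin:/data/soft/hadoop-…
2020-05-06 16:43:11: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/soft/apache-hive-3.1.2-…
SLF4J: Found binding in [jar:file:/data/soft/hadoop-…
SLF4J: See … for an explanation.
SLF4J: Actual binding is of type …
Hive Session ID = 008af6a0-4f7a-47f0-b45a-…
Hive Session ID = 670a0c62-7744-4949-a25f-…
Hive Session ID = 7aa43b1a-eafb-4848-9d29-…
Hive Session ID = a5c20828-7f39-4ed6-ba5e-…
```

Note: after HiveServer2 starts, it prints several Hive Session ID lines at the bottom. Be sure to wait until they have been printed before connecting.

By default HiveServer2 listens on port 10000 of the local machine, so the command looks like this:

```
bin/beeline -u jdbc:hive2://localhost:10000
```

If you connect before HiveServer2 has actually finished starting, you get a message like this:

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
20/05/06 16:44:21 [main]: WARN jdbc.HiveConnection: Failed to connect to …
Could not open connection to the HS2 server. Please check the server URI …
Error: Could not open transport with JDBC Uri: …
Beeline version 3.1.2 by Apache Hive
```

Wait until HiveServer2 is really up and connect again; this time the connection succeeds:

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/soft/apache-hive-3.1.2-…
SLF4J: Found binding in [jar:file:/data/soft/hadoop-…
SLF4J: See … for an explanation.
SLF4J: Actual binding is of type …
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: …
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000>
```

Next, try creating a table:

```
0: jdbc:hive2://localhost:10000> create table t1(id int,name string);
No rows affected (2.459 seconds)
```

Insert data:

```
0: jdbc:hive2://localhost:10000> insert into t1(id,name) values(…);
INFO: Semantic Analysis Completed (retrial = …
INFO: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, …
INFO: Completed compiling command(queryId=root_20200506172404_54b9bfc4-…
INFO: Concurrency mode is disabled, not creating a lock manager
INFO: Executing command(queryId=root_20200506172404_54b9bfc4-d67f-43d1-…
WARN: Hive-on-MR is deprecated in Hive 2 and may not be available in the …
INFO: Query ID = root_20200506172404_54b9bfc4-d67f-43d1-…
INFO: Total jobs = …
INFO: Launching Job 1 out of …
INFO: Starting task [Stage-1:MAPRED] in serial mode
INFO: Number of reduce tasks determined at compile time: …
INFO: In order to change the average load for a reducer (in bytes):
INFO:   set hive.exec.reducers.bytes.per.reducer=<number>
INFO: In order to limit the maximum number of reducers:
INFO:   set hive.exec.reducers.max=<number>
INFO: In order to set a constant number of reducers:
INFO:   set mapreduce.job.reduces=<number>
INFO: Cleaning up the staging area /tmp/hadoop-…
ERROR: Job Submission failed with exception …
	at …
	at java.security.AccessController.doPrivileged(Native Method)
	at …
org.apache.hadoop.security.AccessControlException: Permission denied: …
```

The insert fails: the error says the user has no write permission on /tmp/hadoop-yarn. There are two ways to fix this.

Either give /tmp/hadoop-yarn in HDFS 777 permissions so that the user can write to it. The simplest way is to apply 777 recursively to /tmp and everything under it:

```
hdfs dfs -chmod -R 777 /tmp
```

Or, when starting beeline, specify a user that has permission on this directory:

```
bin/beeline -u jdbc:hive2://localhost:10000 -n …
```

Here I simply specify a user that has the required permissions:

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000 -n …
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: …
Beeline version 3.1.2 by Apache Hive
```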
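The advice to wait for HiveServer2 before connecting can also be automated, for example in a startup script that launches beeline or a JDBC job. The sketch below polls until something accepts TCP connections on the HiveServer2 port (10000 by default). PortWaiter is a hypothetical helper, not part of Hive or Beeline, and treating "the port is open" as "HiveServer2 is ready" is an assumption: it is a cheap proxy, not a guarantee that the service can already run queries.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: waits until something accepts TCP connections on host:port,
 * e.g. HiveServer2 on its default port 10000, before beeline or JDBC connects.
 * An open port is only a proxy for "HiveServer2 is ready"; no query is run.
 */
final class PortWaiter {
    private PortWaiter() {}

    static boolean waitForPort(String host, int port, long timeoutMillis) throws InterruptedException {
        // nanoTime is monotonic, so the deadline is not affected by wall-clock changes.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() - deadline < 0) {
            try (Socket socket = new Socket()) {
                // A short connect timeout keeps one attempt from eating the whole budget.
                socket.connect(new InetSocketAddress(host, port), 500);
                return true; // something is listening on the port
            } catch (IOException notListeningYet) {
                Thread.sleep(200); // back off briefly, then retry
            }
        }
        return false;
    }
}
```

For example, calling PortWaiter.waitForPort("localhost", 10000, 120_000) before starting beeline turns the "Could not open connection to the HS2 server" failure into a bounded wait.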

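```
0: jdbc:hive2://localhost:10000> insert into t1(id,name) values(…);
INFO: Compiling command(queryId=root_20200506174646_0087c68d-289a-4fb8-…
INFO: Concurrency mode is disabled, not creating a lock manager
INFO: Semantic Analysis Completed (retrial = …
INFO: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, …
INFO: Completed compiling command(queryId=root_20200506174646_0087c68d-…
INFO: Concurrency mode is disabled, not creating a lock manager
INFO: Executing command(queryId=root_20200506174646_0087c68d-289a-4fb8-…
WARN: Hive-on-MR is deprecated in Hive 2 and may not be available in the …
INFO: Query ID = root_20200506174646_0087c68d-289a-4fb8-89d4-…
INFO: Total jobs = …
INFO: Launching Job 1 out of …
INFO: Starting task [Stage-1:MAPRED] in serial mode
INFO: Number of reduce tasks determined at compile time: …
INFO: In order to change the average load for a reducer (in bytes):
INFO:   set hive.exec.reducers.bytes.per.reducer=<number>
INFO: In order to limit the maximum number of reducers:
INFO:   set hive.exec.reducers.max=<number>
INFO: In order to set a constant number of reducers:
INFO:   set mapreduce.job.reduces=<number>
INFO: number of …
INFO: Submitting tokens for job: …
INFO: Executing with tokens: …
INFO: The url to track the job: …
INFO: Starting Job = job_1588756653704_0002, Tracking URL = …
INFO: Kill Command = /data/soft/hadoop-3.2.0/bin/mapred job -kill …
INFO: Hadoop job information for Stage-1: number of mappers: 1; number of reducers: …
INFO: 2020-05-06 17:47:01,940 Stage-1 map = 0%,  reduce = …
INFO: 2020-05-06 17:47:09,397 Stage-1 map = 100%,  reduce = 0%, …
INFO: 2020-05-06 17:47:18,642 Stage-1 map = 100%,  reduce = 100%, …
INFO: MapReduce Total cumulative CPU time: 3 seconds 240 msec
INFO: Ended Job = …
INFO: Starting task [Stage-7:CONDITIONAL] in serial mode
INFO: Stage-4 is selected by condition resolver.
INFO: Stage-3 is filtered out by condition resolver.
INFO: Stage-5 is filtered out by condition resolver.
INFO: Starting task [Stage-4:MOVE] in serial mode
INFO: Moving data to directory …
INFO: Starting task [Stage-0:MOVE] in serial mode
INFO: Loading data to table default.t1 from …
INFO: Starting task [Stage-2:STATS] in serial mode
INFO: MapReduce Jobs Launched:
INFO: Stage-Stage-1: Map: 1  Reduce: …  Cumulative CPU: 3.24 sec  HDFS …
INFO: Total MapReduce CPU Time Spent: 3 seconds 240 msec
INFO: Completed executing command(queryId=root_20200506174646_0087c68d-…
INFO: OK
INFO: Concurrency mode is disabled, not creating a lock manager
No rows affected (35.069 seconds)
```

Query the data:

```
0: jdbc:hive2://localhost:10000> select * from t1;
INFO: Compiling command(queryId=root_20200506174821_360087a8-eb30-49d3-…
INFO: Concurrency mode is disabled, not creating a lock manager
INFO: Semantic Analysis Completed (retrial = …
INFO: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t1.id, …
INFO: Completed compiling command(queryId=root_20200506174821_360087a8-…
INFO: Concurrency mode is disabled, not creating a lock manager
INFO: Executing command(queryId=root_20200506174821_360087a8-eb30-49d3-…
INFO: Completed executing command(queryId=root_20200506174821_360087a8-…
INFO: OK
INFO: Concurrency mode is disabled, not creating a lock manager
+--------+----------+
| t1.id  | t1.name  |
+--------+----------+
| 1      | …        |
+--------+----------+
1 row selected (1.211 seconds)
0: jdbc:hive2://localhost:10000>
```

Querying from the bin/hive command line works too; both clients operate on the same data:

```
hive> select * from t1;
OK
1 …
Time taken: 2.438 seconds, Fetched: 1 row(s)
```

Note: when giving beeline the HiveServer2 address, you can also use this machine's internal IP address instead of localhost:

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/beeline -u …
Connecting to …
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: …
Beeline version 3.1.2 by Apache Hive
0: …
```

To leave the beeline client, just press Ctrl+C.

From here on I will keep using the hive command, since I am used to it. (If you use it too, people may take you for a veteran!) But keep in mind that beeline is the client recommended today.

At work, when a command has to be run every day, you will want to put the SQL into a script and run that. With the usage shown so far, however, every run opens an interactive session, so there seems to be no way to put the commands into a script. Note that hive accepts the -e option: with it, the hive command can go into a script and be scheduled, because each hive invocation with -e runs the given statement and then exits.

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/hive -e "select * from t1"
Hive Session ID = efadf29a-4ed7-4aba-84c8-…
Logging initialized using configuration in jar:file:/data/soft/apache-…
Hive Session ID = 65b9718b-4030-4c0f-a557-…
OK
1 …
Time taken: 3.263 seconds, Fetched: 1 row(s)
```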
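Because -e makes hive (and beeline, which also accepts -u, -n and -e) non-interactive, a scheduled Java job could launch them with ProcessBuilder instead of a shell script. The sketch below only builds the argument lists; HiveCommands is a hypothetical helper, and the install directory, JDBC URL and user name are parameters rather than values taken from this tutorial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: builds command lines that run one HiveQL statement
 * non-interactively with -e, ready to hand to ProcessBuilder.
 */
final class HiveCommands {
    private HiveCommands() {}

    /** hiveHome/bin/hive -e "<sql>" */
    static List<String> hiveE(String hiveHome, String sql) {
        return Arrays.asList(hiveHome + "/bin/hive", "-e", sql);
    }

    /** hiveHome/bin/beeline -u <jdbcUrl> [-n <user>] -e "<sql>" */
    static List<String> beelineE(String hiveHome, String jdbcUrl, String user, String sql) {
        List<String> cmd = new ArrayList<>(Arrays.asList(hiveHome + "/bin/beeline", "-u", jdbcUrl));
        if (user != null) {
            // -n is only added when a user name is given
            cmd.add("-n");
            cmd.add(user);
        }
        cmd.add("-e");
        cmd.add(sql);
        return cmd;
    }
}
```

A usage sketch: new ProcessBuilder(HiveCommands.hiveE("/data/soft/apache-hive-3.1.2-bin", "select * from t1")).inheritIO().start().waitFor(). Since each argument is passed to the process separately, the SQL needs no extra shell quoting.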
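beeline can do the same; it is also followed by an -e option:

```
[root@bigdata04 apache-hive-3.1.2-bin]# bin/beeline -u … -e "select * from t1"
Connecting to jdbc:hive2://192.168.182.…
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: …
INFO: Compiling command(queryId=root_20200506191420_9132f97a-af65-4f5a-…
INFO: Concurrency mode is disabled, not creating a lock manager
INFO: Semantic Analysis Completed (retrial = …
INFO: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t1.id, …
INFO: Completed compiling command(queryId=root_20200506191420_9132f97a-…
INFO: Concurrency mode is disabled, not creating a lock manager
INFO: Executing command(queryId=root_20200506191420_9132f97a-af65-4f5a-…
INFO: Completed executing command(queryId=root_20200506191420_9132f97a-…
INFO: OK
INFO: Concurrency mode is disabled, not creating a lock manager
+--------+----------+
| t1.id  | t1.name  |
+--------+----------+
| 1      | …        |
+--------+----------+
1 row selected (0.307 seconds)
Beeline version 3.1.2 by Apache Hive
Closing: 0: …
```

Now add Hive to the PATH environment variable, so that hive or beeline can be run directly from anywhere:

```
[root@bigdata04 apache-hive-3.1.2-bin]# vi …
export …
export HADOOP_HOME=/data/soft/hadoop-3.2.0
export HIVE_HOME=/data/soft/apache-hive-3.1.2-bin
export …
[root@bigdata04 apache-hive-3.1.2-bin]# source …
```

The JDBC approach also connects through the HiveServer2 service. We already started HiveServer2 above, so we can use it directly.

Create a Maven project and add the hive-jdbc dependency to pom.xml:

```xml
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>…</version>
</dependency>
```

Write the code: create a package and a class named HiveJdbcDemo with the following code:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Operating Hive through JDBC.
 * Note: the hiveserver2 service must be started first.
 */
public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 address (assumed value): use the IP of the machine running hiveserver2;
        // localhost only works when this program runs on that machine
        String jdbcUrl = "jdbc:hive2://localhost:10000";
        // The SQL to run
        String sql = "select * from t1";
        // Open the JDBC connection. The user is root, i.e. a Linux user name;
        // the password can be any value ("any" here)
        try (Connection conn = DriverManager.getConnection(jdbcUrl, "root", "any");
             // Create a Statement
             Statement stmt = conn.createStatement();
             // Run the query
             ResultSet res = stmt.executeQuery(sql)) {
            // Loop over the result rows: column 1 is id, column 2 is name
            while (res.next()) {
                System.out.println(res.getInt(1) + "\t" + res.getString(2));
            }
        }
    }
}
```

Run the code. The query result is printed, but so is a pile of red warnings:

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/.m2/org/apache/logging/log4j/log4j-…
SLF4J: Found binding in [jar:file:/D:/.m2/org/slf4j/slf4j-…
SLF4J: See … for an explanation.
SLF4J: Actual binding is of type …
ERROR StatusLogger No log4j2 configuration file found. Using default …
```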
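The "Class path contains multiple SLF4J bindings" warning can also be checked from inside the program. That wording comes from SLF4J 1.x, which locates its binding through the resource org/slf4j/impl/StaticLoggerBinder.class and warns when more than one copy is on the class path. A minimal sketch that lists those copies with ClassLoader.getResources; Slf4jBindings is a hypothetical helper, and the approach does not apply to SLF4J 2.x, which discovers providers differently.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper: lists every SLF4J 1.x binding visible to a class loader.
 * SLF4J 1.x looks up org/slf4j/impl/StaticLoggerBinder.class and reports
 * "Class path contains multiple SLF4J bindings" when it finds more than one copy.
 */
final class Slf4jBindings {
    static final String BINDER_RESOURCE = "org/slf4j/impl/StaticLoggerBinder.class";

    private Slf4jBindings() {}

    /** Returns one URL per jar or directory on the class path that contains a binder class. */
    static List<URL> find(ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(BINDER_RESOURCE));
    }
}
```

Printing Slf4jBindings.find(HiveJdbcDemo.class.getClassLoader()) at the start of main would be expected to show two jar URLs while both bindings are on the class path, and one once the extra binding has been excluded.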
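Analyzing these warnings: there are currently two SLF4J bindings (two log4j implementations) on the class path, and one of them has to go. In addition, the log4j2 configuration file is missing. Note that log4j2 reads its own configuration file (log4j2.xml here), not the log4j.properties file used by log4j 1.x.

1: Remove the redundant log4j binding. The log shows the paths of the two jars:

```
org/apache/logging/log4j/log4j-slf4j-impl/2.10.0/log4j-slf4j-impl-…
org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-…
```

Either one can be removed. Both are pulled in transitively by hive-jdbc, so the hive-jdbc dependency in the pom file has to be changed. Here slf4j-log4j12 (the log4j 1.x binding) is excluded, which keeps log4j2 as the logging backend:

```xml
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>…</version>
    <!-- remove the log4j binding -->
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

2: Add a log4j2.xml configuration file to the project's resources (src/main/resources in a standard Maven layout):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Appenders>
        <Console name="Console">
            <!-- yyyy is the calendar year (YYYY would be the week-based year); %m%n is an assumed ending -->
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %m%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <!-- assumed level: only errors reach the console -->
        <Root level="error">
            <AppenderRef ref="Console"/>
        </Root>
    </Loggers>
</Configuration>
```

Run the code again; now only the result is printed:

```
1 …
```

Parameters can also be changed with the set command in the Hive command line; this effectively overrides, temporarily, the values configured in hive-site.xml. However, a parameter set with set only applies to the current session: after you exit and reopen the CLI it is gone. To make it apply to the current user on this machine, put the command in the ~/.hiverc file.

To sum up: a parameter configured with set applies to the current session, one configured in ~/.hiverc applies to the current user on the current machine, and one configured in hive-site.xml applies permanently.

hive-site.xml has a parameter hive.cli.print.current.db that shows the name of the current database in the prompt. Here we set it to true:

```
hive> set hive.cli.print.current.db=true;
hive (default)>
```

Another parameter, hive.cli.print.header, controls whether column names are shown with query results, which makes the output easier to read:

```
hive (default)> select * from t1;
…
Time taken: 0.184 seconds, Fetched: 1 row(s)
hive (default)> set hive.cli.print.header=true;
hive (default)> select * from t1;
…
Time taken: 0.202 seconds, Fetched: 1 row(s)
```

These parameters are a matter of personal preference, so I want this configuration under my own user. Edit ~/.hiverc: every time we enter the hive command line, the contents of the current user's .hiverc file are loaded.

```
[root@bigdata04 apache-hive-3.1.2-bin]# vi ~/.hiverc
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
```

Now enter the CLI again to confirm the effect:

```
hive (default)>
```

How can we view Hive's command history? Linux has the history command for viewing previously run commands. Hive has a similar feature: it records its command history in a file under the current user's home directory.

```
[root@bigdata04 …
[root@bigdata04 apache-hive-3.1.2-bin]# more …
show …
```

Hive logs

```
[root@bigdata04 conf]# …
which: no hbase in (.:/data/soft/jdk1.8/bin:/data/soft/hadoop-…
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/soft/apache-hive-3.1.2-…
SLF4J: Found binding in [jar:file:/data/soft/hadoop-…
SLF4J: See … for an explanation.
SLF4J: Actual binding is of type …
Hive Session ID = 3b4db1d6-d283-48a7-b986-9…
Logging initialized using configuration in jar:file:/data/soft/apache-hive-…
Hive Session ID = 83532edf-47e9-47ef-87fc-…
Hive-on-MR is deprecated in Hive 2 and may not be availabl…
```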
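The three configuration scopes described earlier behave like layers in which the later one wins: hive-site.xml is loaded first, the set commands in ~/.hiverc run when the CLI starts, and a set typed during the session runs later still. The sketch below models only that precedence; it is not Hive's own configuration code, and HiveConfLayers with its methods is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical model of Hive's parameter scopes, lowest to highest precedence:
 * hive-site.xml (permanent), ~/.hiverc (current user on this machine),
 * set in the running session (current session only). Not Hive's own code.
 */
final class HiveConfLayers {
    // Matches lines such as "set hive.cli.print.header=true;" (spaces and the ';' optional).
    private static final Pattern SET_LINE = Pattern.compile(
            "^\\s*set\\s+([\\w.]+)\\s*=\\s*(.*?)\\s*;?\\s*$", Pattern.CASE_INSENSITIVE);

    private HiveConfLayers() {}

    /** Collects key/value pairs from set commands, e.g. the lines of a ~/.hiverc file. */
    static Map<String, String> parseSetLines(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = SET_LINE.matcher(line);
            if (m.matches()) {
                result.put(m.group(1), m.group(2)); // a later set of the same key wins
            }
        }
        return result;
    }

    /** Merges layers given from lowest to highest precedence; later layers override earlier ones. */
    @SafeVarargs
    static Map<String, String> effective(Map<String, String>... layersLowToHigh) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> layer : layersLowToHigh) {
            merged.putAll(layer);
        }
        return merged;
    }
}
```

For example, effective(siteValues, parseSetLines(Files.readAllLines(Paths.get(System.getProperty("user.home"), ".hiverc"))), sessionValues) yields the values the CLI would end up using under this model, where siteValues and sessionValues are maps you fill in yourself.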
