Hive Installation and Usage

Download hive-0.10.0-bin.tar.gz, extract it under the hadoop directory, and edit hive/bin/hive-config.sh with the following settings:

export HIVE_HOME=/home/ssy/hadoop/hive
export HADOOP_HOME=/home/ssy/hadoop
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-amd64
Run bin/hive and try a few commands to verify that Hive works:
hive> show tables;
hive> create table logs(day int, bytes int, tag string, user string);
hive> describe logs;
hive> drop table logs;

Create a sample data file test.log:

20121221 04567 user s00001
20121221 75531 user s00003
20121222 52369 user s00002
20121222 01297 user s00001
20121223 61223 user s00002
20121223 33121 user s00003
Load the data into Hive:
hive> create table logs(day int, bytes int, tag string, user string) row format delimited fields terminated by ' ';
hive> load data local inpath '../test.log' into table logs;
//to replace any existing data in the table, add OVERWRITE:
//load data local inpath '../test.log' overwrite into table logs;

//the delimiter ' ' must be specified in the table definition; otherwise every loaded row comes back as NULL:
//hive> select * from logs;
//OK
//NULL    NULL    NULL    NULL
//NULL    NULL    NULL    NULL
//NULL    NULL    NULL    NULL
//NULL    NULL    NULL    NULL
//NULL    NULL    NULL    NULL
//NULL    NULL    NULL    NULL
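
If the file already lives in HDFS, the LOCAL keyword can be dropped; LOAD DATA INPATH then moves the file from its HDFS location into the table's warehouse directory instead of copying it from the local filesystem. A quick sketch (the HDFS path /tmp/test.log is just an example):

bin/hadoop fs -put test.log /tmp/test.log
hive> load data inpath '/tmp/test.log' into table logs;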

hive> select * from logs;
OK
20121221        4567    user    s00001
20121221        75531   user    s00003
20121222        52369   user    s00002
20121222        1297    user    s00001
20121223        61223   user    s00002
20121223        33121   user    s00003
The data is now stored on the Hadoop cluster without any manual fs file operations on our part. Hive keeps its metadata in the local metastore_db directory, so the data can be queried much like an ordinary database.
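
To verify, the table's files can be listed in HDFS directly; a quick check, assuming the default warehouse location (hive.metastore.warehouse.dir defaults to /user/hive/warehouse):

bin/hadoop fs -ls /user/hive/warehouse/logs
bin/hadoop fs -cat /user/hive/warehouse/logs/test.log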

Finding the maximum

hive> select day, max(bytes) from logs group by day;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201305221738_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305221738_0002
Kill Command = /root/hadoop/libexec/../bin/hadoop job  -kill job_201305221738_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-22 18:45:04,552 Stage-1 map = 0%,  reduce = 0%
2013-05-22 18:45:07,586 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:08,596 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:09,608 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:10,620 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:11,628 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:12,640 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:13,648 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:14,665 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:15,673 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 0.96 sec
2013-05-22 18:45:16,686 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.3 sec
2013-05-22 18:45:17,698 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.3 sec
2013-05-22 18:45:18,713 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.3 sec
MapReduce Total cumulative CPU time: 3 seconds 300 msec
Ended Job = job_201305221738_0002
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.3 sec   HDFS Read: 371 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 300 msec
OK
20121221        75531
20121222        52369
20121223        61223
Time taken: 21.278 seconds
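
The same pattern extends to filtering on aggregates: a HAVING clause (supported since Hive 0.7) keeps only the groups that pass a condition. A sketch, with an arbitrary 50000-byte cutoff:

hive> select day, max(bytes) from logs group by day having max(bytes) > 50000;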

Computing the sum

hive> select day, sum(bytes) from logs group by day;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201305221738_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305221738_0003
Kill Command = /root/hadoop/libexec/../bin/hadoop job  -kill job_201305221738_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-22 18:46:03,892 Stage-1 map = 0%,  reduce = 0%
2013-05-22 18:46:06,911 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:07,919 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:08,928 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:09,935 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:10,943 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:11,952 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:12,960 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:13,967 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:14,974 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 0.93 sec
2013-05-22 18:46:15,983 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.38 sec
2013-05-22 18:46:16,990 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.38 sec
2013-05-22 18:46:18,004 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.38 sec
MapReduce Total cumulative CPU time: 3 seconds 380 msec
Ended Job = job_201305221738_0003
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.38 sec   HDFS Read: 371 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 380 msec
OK
20121221        80098
20121222        53666
20121223        94344
Time taken: 20.743 seconds

Simple statistics like these no longer require hand-written MapReduce jobs; Hive computes them directly, which is very convenient. To save a result to a local file, run:

bin/hive -e "select day, sum(bytes) from logs group by day" >> res.csv
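
Note that Hive separates output columns with tabs, so the file above is really tab-separated despite the .csv name. Alternatively, INSERT OVERWRITE LOCAL DIRECTORY exports a query result from inside the shell (the target path /tmp/logs_sum is just an example; in this version the result files use Ctrl-A as the field delimiter):

hive> insert overwrite local directory '/tmp/logs_sum' select day, sum(bytes) from logs group by day;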

Also, this version of Hive seems to support only one session at a time: if a second terminal enters the Hive shell while the first is still open, it fails with the error below. Hopefully newer releases resolve this.

FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
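
The error comes from the default embedded Derby metastore, which accepts only one connection at a time; pointing the metastore at an external database such as MySQL allows concurrent sessions. A minimal hive/conf/hive-site.xml sketch (the host, database name, user, and password are placeholders, and the MySQL JDBC connector jar must be copied into hive/lib):

<configuration>
  <!-- point the metastore at MySQL instead of the embedded Derby database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>
</configuration>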

See Also: Hadoop: The Definitive Guide, Chapter 12 (Hive)

Reposted from blog.csdn.net/ciaos/article/details/8957446