





create database myhive;
use myhive;
create table emp(id int,name string);

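After these statements run, Hive stores the metadata in a database. Hive metadata includes table names, each table's columns, partitions and their properties, table properties (such as whether the table is external), and the directory that holds the table's data.
For the database and table above, Hive creates a directory structure such as /user/hive/warehouse/myhive.db on HDFS. The table's data can be a file you upload yourself, for example the emp.data shown in the figure, placed in the table's directory under /user/hive/warehouse/myhive.db (for the emp table, /user/hive/warehouse/myhive.db/emp). You can then query it with SQL. (Note: the query names the table, not the file emp.data, for example select * from emp, yet the rows returned come from emp.data; taken together, the table definition and the file play the role of one table in a traditional database.) The metadata itself is stored in an external database such as MySQL. The embedded Derby database also works, but it is not recommended: because it is embedded, it cannot share the metadata between clients.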

select id,name from emp where id>2 order by id desc;

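So how is it executed? The query is handed to Hive, which uses its parser, optimizer and related components (shown as Compiler in the figure) together with MapReduce templates to build a query plan. The generated plan is stored in HDFS, then picked up by a MapReduce program and submitted as a job that runs on YARN.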
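To make the map and reduce stages more concrete, here is a small Python sketch of how the query above could be split into phases. It is only a conceptual illustration, not the plan Hive actually generates: the function names are made up, and the rows are the ones that appear in the emp table later in this post.

```python
# Rows of the emp table, as shown by "select * from emp" later in this post.
ROWS = [(1, "zhangsan"), (2, "lisi"), (3, "wangwu"), (4, "furong"), (5, "fengjie")]


def map_phase(rows):
    """Map side: apply the WHERE filter and emit (sort_key, projected_row)."""
    for emp_id, name in rows:
        if emp_id > 2:
            yield emp_id, (emp_id, name)


def shuffle(pairs):
    """Shuffle: hand all pairs to one reducer, sorted by key in descending order."""
    return sorted(pairs, key=lambda kv: kv[0], reverse=True)


def reduce_phase(sorted_pairs):
    """Reduce side: keys already arrive in order, so just emit the rows."""
    return [row for _key, row in sorted_pairs]


result = reduce_phase(shuffle(map_phase(ROWS)))
print(result)
```

The sketch uses a single reducer because a global ORDER BY needs all rows in one sorted stream; with several reducers each one would only sort its own share.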


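1. All Hive data is stored in HDFS; Hive has no dedicated storage format of its own (it supports Text, SequenceFile, ParquetFile, RCFILE and others).
2. You only need to tell Hive the column delimiter and the row delimiter when you create a table, and Hive can then parse the data.
3. Hive has the following data models: DB, Table, External Table, Partition, Bucket.
external table: similar to a table, except that its data can live at any path you specify
Ordinary (managed) table: dropping the table also deletes its files on HDFS
External table: dropping the table leaves the files on HDFS in place; only the metadata is deleted
bucket: appears on HDFS as multiple files under the same table directory, produced by hashing; each row is written to one of these files according to its hash value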
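The bucket idea can be sketched in a few lines of Python. This is a simplified model, not Hive's implementation: it assumes the bucketing column is an int whose hash is the value itself, and the helper names are hypothetical. The real hash function depends on the column type and the Hive version.

```python
def bucket_for(value, num_buckets):
    """Pick a bucket as hash(value) mod num_buckets.

    Assumption: for an int column the hash is simply the value itself.
    """
    return value % num_buckets


def write_buckets(rows, num_buckets, key_index=0):
    """Group rows into num_buckets lists, one list per bucket file."""
    buckets = {b: [] for b in range(num_buckets)}
    for row in rows:
        buckets[bucket_for(row[key_index], num_buckets)].append(row)
    return buckets


rows = [(1, "zhangsan"), (2, "lisi"), (3, "wangwu"), (4, "furong"), (5, "fengjie")]
buckets = write_buckets(rows, 2)
for bucket_id, content in buckets.items():
    print(f"bucket file {bucket_id}: {content}")
```

With two buckets, the even ids land in one file and the odd ids in the other, so a lookup or join on id only has to read the matching file.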



[root@mini1 ~]# cd apps/hive/bin
[root@mini1 bin]# ll
total 32
-rwxr-xr-x. 1 root root 1031 Apr 30  2015 beeline
drwxr-xr-x. 3 root root 4096 Oct 17 12:38 ext
-rwxr-xr-x. 1 root root 7844 May  8  2015 hive
-rwxr-xr-x. 1 root root 1900 Apr 30  2015 hive-config.sh
-rwxr-xr-x. 1 root root  885 Apr 30  2015 hiveserver2
-rwxr-xr-x. 1 root root  832 Apr 30  2015 metatool
-rwxr-xr-x. 1 root root  884 Apr 30  2015 schematool
[root@mini1 bin]# ./hive

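The interface is not very pleasant, though. Hive can also be run as a service (the Hive Thrift service), which you can then connect to with beeline, the client bundled with Hive, as follows.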


[root@mini1 bin]# ./hiveserver2


[root@mini1 bin]# ./beeline 
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: root
Enter password for jdbc:hive2://localhost:10000: ******
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
0: jdbc:hive2://localhost:10000>


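If the connection fails with an error like the following, the connecting user (root here) has no access to /tmp on HDFS:

Error: Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/tmp":hadoop3:supergroup:drwx------

On a test cluster, one fix is to open up the permissions on /tmp:

./hadoop dfs -chmod -R 777 /tmp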


0: jdbc:hive2://localhost:10000> show databases;
| database_name  |
| default        |
1 row selected (1.456 seconds)


0: jdbc:hive2://localhost:10000> create database myhive;
No rows affected (0.576 seconds)
0: jdbc:hive2://localhost:10000> show databases;
| database_name  |
| default        |
| myhive         |
0: jdbc:hive2://localhost:10000> use myhive;
No rows affected (0.265 seconds)
0: jdbc:hive2://localhost:10000> show tables;
| tab_name  |



0: jdbc:hive2://localhost:10000> drop table emp;
No rows affected (1.122 seconds)
0: jdbc:hive2://localhost:10000> show tables;
| tab_name  |
0: jdbc:hive2://localhost:10000> create table emp(id int,name string)
0: jdbc:hive2://localhost:10000> row format delimited
0: jdbc:hive2://localhost:10000> fields terminated by ',';
No rows affected (0.265 seconds)
0: jdbc:hive2://localhost:10000> 

[root@mini1 ~]# hadoop fs -put sz.data /user/hive/warehouse/myhive.db/emp
0: jdbc:hive2://localhost:10000> select * from emp;
| emp.id  | emp.name  |
| 1       | zhangsan  |
| 2       | lisi      |
| 3       | wangwu    |
| 4       | furong    |
| 5       | fengjie   |
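The emp table was declared with fields terminated by ',', and that declaration is all Hive needs to turn the uploaded text file into rows. A rough Python sketch of this schema-on-read step is below. The parse_line helper and the schema list are hypothetical, and the file contents are assumed from the query output above; turning a missing or malformed field into NULL (None here) mirrors what Hive's default text handling does, as far as I know.

```python
def parse_line(line, schema, field_delim=","):
    """Split one text line into typed columns, schema-on-read style."""
    parts = line.rstrip("\n").split(field_delim)
    row = []
    for i, (_name, cast) in enumerate(schema):
        try:
            row.append(cast(parts[i]))
        except (IndexError, ValueError):
            row.append(None)  # missing or malformed field becomes NULL
    return tuple(row)


# Column names and types from: create table emp(id int, name string)
EMP_SCHEMA = [("id", int), ("name", str)]

# Assumed contents of the first lines of sz.data, one row per line.
raw = "1,zhangsan\n2,lisi\n3,wangwu\n"
rows = [parse_line(line, EMP_SCHEMA) for line in raw.splitlines()]
print(rows)
print(parse_line("x,broken", EMP_SCHEMA))
```

The file is never converted or validated on upload; the delimiter and the column types are applied only when a query reads it.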


0: jdbc:hive2://localhost:10000> select id,name from emp where id>2 order by id desc;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1508216103995_0004
INFO  : The url to track the job: http://mini1:8088/proxy/application_1508216103995_0004/
INFO  : Starting Job = job_1508216103995_0004, Tracking URL = http://mini1:8088/proxy/application_1508216103995_0004/
INFO  : Kill Command = /root/apps/hadoop-2.6.4/bin/hadoop job  -kill job_1508216103995_0004
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2017-10-18 00:35:39,865 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-10-18 00:35:46,275 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.33 sec
INFO  : 2017-10-18 00:35:51,487 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.34 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 340 msec
INFO  : Ended Job = job_1508216103995_0004
| id  |   name   |
| 5   | fengjie  |
| 4   | furong   |
| 3   | wangwu   |
3 rows selected (18.96 seconds)



0: jdbc:hive2://localhost:10000> create external table emp2(id int,name string)
0: jdbc:hive2://localhost:10000> row format delimited fields terminated by ','  -- fields are comma-separated
0: jdbc:hive2://localhost:10000> stored as textfile  -- store as plain text
0: jdbc:hive2://localhost:10000> location '/company';  -- keep the data under /company
No rows affected (0.101 seconds)


0: jdbc:hive2://localhost:10000> load data local inpath '/root/sz.data' into table emp2; (the file can also be uploaded directly with hadoop)
INFO  : Loading data to table myhive.emp2 from file:/root/sz.data
INFO  : Table myhive.emp2 stats: [numFiles=0, totalSize=0]
No rows affected (0.414 seconds)
0: jdbc:hive2://localhost:10000> select * from emp2;
| emp2.id  | emp2.name  |
| 1        | zhangsan   |
| 2        | lisi       |
| 3        | wangwu     |
| 4        | furong     |
| 5        | fengjie    |
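The practical difference between emp and emp2 shows up when the tables are dropped. The following Python sketch imitates that behaviour on the local file system, with a temporary directory standing in for HDFS. ToyMetastore and its methods are invented for this illustration and are not a Hive API.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical stand-in for the metastore: name -> (location, external)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, location, external=False):
        os.makedirs(location, exist_ok=True)
        self.tables[name] = (location, external)

    def drop_table(self, name):
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            shutil.rmtree(location)  # managed table: the data goes too


root = tempfile.mkdtemp()  # plays the role of the HDFS root
ms = ToyMetastore()
managed_dir = os.path.join(root, "warehouse", "myhive.db", "emp")
external_dir = os.path.join(root, "company")
ms.create_table("emp", managed_dir)
ms.create_table("emp2", external_dir, external=True)
for directory in (managed_dir, external_dir):
    with open(os.path.join(directory, "sz.data"), "w") as f:
        f.write("1,zhangsan\n")

ms.drop_table("emp")
ms.drop_table("emp2")
managed_exists = os.path.exists(managed_dir)
external_exists = os.path.exists(os.path.join(external_dir, "sz.data"))
print(managed_exists, external_exists)
shutil.rmtree(root)  # clean up the temporary directory
```

After both drops the metadata is gone for both tables, but only the external table's file is still on disk, which is why external tables suit data that other tools also read.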


0: jdbc:hive2://localhost:10000> create table stu(id int,name string)
0: jdbc:hive2://localhost:10000> partitioned by(school string)
0: jdbc:hive2://localhost:10000> row format delimited fields terminated by ',';
No rows affected (0.319 seconds)
0: jdbc:hive2://localhost:10000> show tables;
| tab_name  |
| emp       |
| emp2      |
| stu       |
| t_sz_ext  |
0: jdbc:hive2://localhost:10000> load data local inpath '/root/sz.data' into table stu partition(school='scu');
INFO  : Loading data to table myhive.stu partition (school=scu) from file:/root/sz.data
INFO  : Partition myhive.stu{school=scu} stats: [numFiles=1, numRows=0, totalSize=46, rawDataSize=0]
No rows affected (0.607 seconds)
0: jdbc:hive2://localhost:10000> select * from stu;
| stu.id  | stu.name  | stu.school  |
| 1       | zhangsan  | scu         |
| 2       | lisi      | scu         |
| 3       | wangwu    | scu         |
| 4       | furong    | scu         |
| 5       | fengjie   | scu         |
5 rows selected (0.286 seconds)
0: jdbc:hive2://localhost:10000> load data local inpath '/root/sz2.data' into table stu partition(school='hfut');
INFO  : Loading data to table myhive.stu partition (school=hfut) from file:/root/sz2.data
INFO  : Partition myhive.stu{school=hfut} stats: [numFiles=1, numRows=0, totalSize=46, rawDataSize=0]
No rows affected (0.671 seconds)
0: jdbc:hive2://localhost:10000> select * from stu;
| stu.id  | stu.name  | stu.school  |
| 1       | Tom       | hfut        |
| 2       | Jack      | hfut        |
| 3       | Lucy      | hfut        |
| 4       | Kitty     | hfut        |
| 5       | Lucene    | hfut        |
| 6       | Sakura    | hfut        |
| 1       | zhangsan  | scu         |
| 2       | lisi      | scu         |
| 3       | wangwu    | scu         |
| 4       | furong    | scu         |
| 5       | fengjie   | scu         |
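Note that the school column is not in sz.data or sz2.data at all: each partition is a subdirectory named school=<value> under the table directory, and the value is read from the directory name. The Python sketch below models that layout. The helper names are hypothetical, the row contents are abbreviated from the output above, and the warehouse path is the default one used earlier in this post.

```python
import posixpath

WAREHOUSE = "/user/hive/warehouse"  # default warehouse path used in this post


def partition_dir(db, table, spec):
    """Directory for one partition: <warehouse>/<db>.db/<table>/<col>=<val>."""
    parts = [f"{col}={val}" for col, val in spec.items()]
    return posixpath.join(WAREHOUSE, f"{db}.db", table, *parts)


def scan(table_files, school=None):
    """Read rows; with a school filter only the matching directory is read."""
    rows = []
    for path, lines in table_files.items():
        value = posixpath.basename(path).split("=", 1)[1]
        if school is not None and value != school:
            continue  # partition pruning: skip the whole directory
        for line in lines:
            stu_id, name = line.split(",")
            rows.append((int(stu_id), name, value))
    return rows


# Abbreviated file contents, keyed by partition directory.
files = {
    partition_dir("myhive", "stu", {"school": "scu"}): ["1,zhangsan", "2,lisi"],
    partition_dir("myhive", "stu", {"school": "hfut"}): ["1,Tom", "2,Jack"],
}

print(partition_dir("myhive", "stu", {"school": "scu"}))
print(scan(files, school="hfut"))
```

A query that filters on school therefore never opens the files of the other partitions, which is the main reason to partition a large table.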



0: jdbc:hive2://localhost:10000> alter table stu add partition (school='Tokyo');


