[Hadoop] Basic Hive Database and Table Operations

Creating a database

hive> create database if not exists db1;
hive> create schema if not exists db2;

Dropping a database

hive> drop database db2;
hive> drop schema db1;
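
A plain DROP fails if the database does not exist or still contains tables. A minimal sketch using the IF EXISTS and CASCADE clauses, shown here against db2 only so that db1 stays available for the later steps:

```sql
-- IF EXISTS suppresses the error when the database is missing.
-- CASCADE also drops the tables inside it; the default, RESTRICT,
-- refuses to drop a database that is not empty.
DROP DATABASE IF EXISTS db2 CASCADE;
```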

Creating a table

CREATE TABLE IF NOT EXISTS employee (
  eid int,
  name string,
  salary string,
  job string,
  year int)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
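
To check that the table was created as intended, the definition can be inspected from the same session. A short sketch, assuming the employee table above lives in the current database:

```sql
-- List the tables in the current database.
SHOW TABLES;

-- Show the columns, storage format, HDFS location, and table comment.
DESCRIBE FORMATTED employee;
```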

Loading data into a table

Prepare the data file sample.txt:

[root@g12-1 ~]# cat /tmp/sample.txt 
1201	Gopal	45000	TechnicalManager	2013	
1202	Manisha	45000	ProofReader	2013
1203	Masthanvali	40000	TechnicalWriter	2014
1204	Kiran	40000	HrAdmin	2014
[root@g12-1 ~]#

Load the file into the table:

hive> LOAD DATA LOCAL INPATH '/tmp/sample.txt' OVERWRITE INTO TABLE employee;
Loading data to table db1.employee
Table db1.employee stats: [numFiles=1, numRows=0, totalSize=150, rawDataSize=0]
OK
Time taken: 0.354 seconds
hive> select * from employee;
OK
1201	Gopal	45000	TechnicalManager	2013
1202	Manisha	45000	ProofReader	2013
1203	Masthanvali	40000	TechnicalWriter	2014
1204	Kiran	40000	HrAdmin	2014
Time taken: 0.094 seconds, Fetched: 4 row(s)
hive>
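
The LOCAL and OVERWRITE keywords each change what LOAD DATA does. The variants below are a sketch; the HDFS path is a hypothetical example, not a path from this article:

```sql
-- Without OVERWRITE, the file is added to the data already in the table.
LOAD DATA LOCAL INPATH '/tmp/sample.txt' INTO TABLE employee;

-- Without LOCAL, the path refers to HDFS, and the file is moved
-- (not copied) into the table directory.
-- '/user/hive/staging/sample.txt' is a hypothetical location.
LOAD DATA INPATH '/user/hive/staging/sample.txt' INTO TABLE employee;
```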

HiveQL

SELECT...WHERE

hive> select * from employee where salary > 40000;
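
Because salary is declared as a string in the table definition, the comparison with 40000 relies on Hive's implicit type conversion. A sketch that makes the conversion explicit with CAST:

```sql
-- Compare salary as a number rather than relying on implicit conversion.
SELECT * FROM employee WHERE CAST(salary AS INT) > 40000;
```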

ORDER BY

hive> select * from employee order by eid;

GROUP BY

hive> select salary,count(salary) from employee group by salary;

SELECT...JOIN

hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
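
The CUSTOMERS and ORDERS tables are not defined in this article. The definitions below are hypothetical: the column types and the OID column are assumptions chosen only to make the join runnable.

```sql
-- Hypothetical table definitions; only the column names used in the
-- join above come from the article.
CREATE TABLE IF NOT EXISTS CUSTOMERS (
  ID int,
  NAME string,
  AGE int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE TABLE IF NOT EXISTS ORDERS (
  OID int,
  CUSTOMER_ID int,
  AMOUNT double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- A LEFT OUTER JOIN also keeps customers that have no orders;
-- AMOUNT is NULL for those rows.
SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c LEFT OUTER JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
```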

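Partitioned tables

In Hive, a database is a directory and a table is also a directory; each partition of a partitioned table is a subdirectory of the table directory.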

create table xxx (...) partitioned by (...);

alter table xxx add partition (...) ...;

load data local inpath ... into table xxx partition (...);
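
As a concrete sketch of the three statements, the example below partitions the employee data by year. The table name employee_part is an assumption made for illustration.

```sql
-- The partition column (year) is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE IF NOT EXISTS employee_part (
  eid int,
  name string,
  salary string,
  job string)
PARTITIONED BY (year int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Creates the subdirectory year=2013 under the table directory.
ALTER TABLE employee_part ADD IF NOT EXISTS PARTITION (year = 2013);

-- Fill the partition from the unpartitioned employee table.
INSERT OVERWRITE TABLE employee_part PARTITION (year = 2013)
SELECT eid, name, salary, job FROM employee WHERE year = 2013;

-- List the partitions that now exist.
SHOW PARTITIONS employee_part;
```

A query that filters on year, such as WHERE year = 2013, then only needs to read the matching subdirectory.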

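Bucketed tables

create table xxx(...) ... clustered by (fieldName) into n buckets;

In a bucketed table, rows are distributed across n data files according to the hash of the bucketing column.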
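
A minimal sketch of a bucketed table built from the employee data. The table name employee_bucketed and the bucket count of 4 are assumptions for illustration.

```sql
-- Rows are assigned to one of 4 files by the hash of eid.
CREATE TABLE IF NOT EXISTS employee_bucketed (
  eid int,
  name string,
  salary string,
  job string,
  year int)
CLUSTERED BY (eid) INTO 4 BUCKETS
STORED AS TEXTFILE;

-- On Hive versions before 2.0, bucketing is only applied on insert
-- after running: SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE employee_bucketed
SELECT * FROM employee;

-- Sample the first of the 4 buckets instead of the whole table.
SELECT * FROM employee_bucketed TABLESAMPLE (BUCKET 1 OUT OF 4 ON eid);
```

Bucketed tables should be filled with INSERT ... SELECT rather than LOAD DATA, since LOAD DATA only moves files and does not hash the rows.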

HiveQL tuning

1) Use EXPLAIN to view the execution plan

explain extended select count(*) from employee;

explain formatted select count(*) from employee;

2) Enable LIMIT optimization, which avoids a full table scan by sampling the data

select * from employee limit 1,2;

Set hive.limit.optimize.enable=true
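
The setting can be applied per session together with its companion properties. This is a sketch: the values shown are believed to be the defaults and should be checked against the Hive version in use.

```sql
-- Let simple LIMIT queries read a sample of the input instead of all of it.
SET hive.limit.optimize.enable = true;

-- Companion settings (values are assumed defaults; verify for your version).
SET hive.limit.row.max.size = 100000;     -- size guaranteed for each row when sampling
SET hive.limit.optimize.limit.file = 10;  -- maximum number of files to sample

SELECT * FROM employee LIMIT 2;
```

Because the input is sampled, some rows may never be considered, so this suits exploratory queries better than queries whose result must be exact.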

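3) JOIN

Use a map-side join (/*+ mapjoin(table) */) when one table is small enough to fit in memory, or mark the largest table to be streamed with /*+ streamtable(table) */.

Otherwise, order the tables in a join query so that their sizes increase from left to right.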
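
The two hints can be sketched on the CUSTOMERS and ORDERS join from earlier, under the assumption (not stated in the article) that CUSTOMERS is the small table and ORDERS the large one:

```sql
-- Map-side join: the small table (alias c) is loaded into memory on
-- each mapper, so no reduce phase is needed for the join.
-- Newer Hive versions may ignore this hint unless
-- hive.ignore.mapjoin.hint is set to false.
SELECT /*+ MAPJOIN(c) */ c.ID, c.NAME, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);

-- Reduce-side join: stream the large table (alias o) even though it
-- is not the last table in the FROM clause.
SELECT /*+ STREAMTABLE(o) */ c.ID, c.NAME, o.AMOUNT
FROM ORDERS o JOIN CUSTOMERS c ON (c.ID = o.CUSTOMER_ID);

-- Alternatively, let Hive convert eligible joins to map-side joins itself.
SET hive.auto.convert.join = true;
```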

4) Enable local mode, which runs all tasks of a job on a single machine

Use this when the input data is small.

hive.exec.mode.local.auto=true   // default: false

hive> set hive.exec.mode.local.auto=true;
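
With automatic local mode, Hive decides per query whether the input is small enough to run locally. A sketch of the size threshold; the 128 MB value is believed to be the default and should be verified for the Hive version in use:

```sql
SET hive.exec.mode.local.auto = true;

-- Only run locally when the total input is below this many bytes
-- (134217728 bytes = 128 MB, the assumed default).
SET hive.exec.mode.local.auto.inputbytes.max = 134217728;

-- A small aggregation like this one is a typical candidate for local mode.
SELECT salary, count(*) FROM employee GROUP BY salary;
```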

...
