Hive Basics: Creating Tables

1. Hive's Special Data Types

Hive's SQL dialect is broadly similar to MySQL's, but it adds a few extra data types, the collection types:

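ARRAY: an ordered list of elements that all have the same type

MAP: a set of key-value pairs in which all keys share one type and all values share one type

STRUCT: a group of named fields, each with its own type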

Type	Example value	Declaration
array	['aaa','bbb','bbb']	ARRAY<string>
map	{'A':'Apex','B':'Bee'}	MAP<string,string>
struct	{'aaa',666}	STRUCT<fruit:string,weight:int>
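
To make the three types concrete, here is a small sketch that builds the values from the table above with Hive's built-in array(), map() and named_struct() constructors and then reads one item back out of each. The aliases arr, m, s and t are made up for this illustration, and the query assumes a Hive version that accepts a SELECT without a FROM clause (0.13 or later).

```sql
-- Build one value of each collection type, then access its contents.
select
    t.arr[0]    as first_item,    -- array elements are indexed from 0: 'aaa'
    t.m['A']    as value_for_a,   -- map values are looked up by key: 'Apex'
    t.s.fruit   as fruit,         -- struct fields are read with dot notation: 'aaa'
    t.s.weight  as weight         -- 666
from (
    select
        array('aaa', 'bbb', 'bbb')                   as arr,  -- ARRAY<string>
        map('A', 'Apex', 'B', 'Bee')                 as m,    -- MAP<string,string>
        named_struct('fruit', 'aaa', 'weight', 666)  as s     -- STRUCT<fruit:string,weight:int>
) t;
```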

2. Creating a Static Table

The following statement creates a static (non-partitioned) table:

create table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

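row format delimited: begins the delimiter specification

fields terminated by '|': sets the separator between fields (columns) to "|"
collection items terminated by ',': sets the separator between the items of a collection-type field (the elements of an array, the members of a struct, or the key-value pairs of a map) to ","
map keys terminated by ':': sets the separator between the key and the value inside each map entry to ":"
lines terminated by '\n': sets the separator between rows to "\n" (newline); this is the only line terminator Hive currently accepts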
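
As a rough illustration of how these delimiters map onto the columns, the sketch below shows one hypothetical input line in a comment and a query that picks the pieces apart. The sample line is invented for this example and is not taken from the original data file; the query assumes the employee table above exists and that a file containing such a line has been loaded into it.

```sql
-- Hypothetical line in the input file (illustrative data only):
--   Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer
--
-- '|' splits the line into the five columns,
-- ',' splits items inside work_place, gender_age and the two maps,
-- ':' splits each map entry into key and value.
select
    name,                                       -- Michael
    work_place[0]            as first_place,    -- Montreal
    gender_age.age           as age,            -- 30
    skills_score['DB']       as db_score,       -- 80
    depart_title['Product']  as product_title   -- Developer
from employee;
```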

Load data from a file on the local file system into the table (the file is copied):

load data local inpath '/opt/employee.txt' into table employee;

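Load data from a file on HDFS into the table (without local, the path refers to HDFS, and the file is moved into the table's directory rather than copied):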

load data inpath '/employee.txt' into table employee;

Load data from a file and overwrite the table's existing contents:

load data inpath '/employee.txt' overwrite into table employee;
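
The difference between into and overwrite is easiest to see by counting rows. The following is a minimal sketch, assuming the employee table from this section and the local file /opt/employee.txt used above; no row counts are given here because they depend on the file.

```sql
-- "into" appends: loading the same file twice leaves two copies of its rows.
load data local inpath '/opt/employee.txt' into table employee;
load data local inpath '/opt/employee.txt' into table employee;
select count(*) from employee;   -- existing rows plus two copies of the file

-- "overwrite" first removes what is already in the table.
load data local inpath '/opt/employee.txt' overwrite into table employee;
select count(*) from employee;   -- only the rows of the file
```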

3. Creating a Partitioned Table

The following statement creates a partitioned table:

create table employee2(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
partitioned by (age int) -- use age as the partition key
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

Load data into the partitioned table, naming the target partition:

load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);
load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);

List the table's partitions:

show partitions employee2;
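
Each partition is stored as its own subdirectory of the table (age=20, age=30, and so on), and the partition key can be used in queries like an ordinary column. The sketch below assumes the employee2 table and the two loads shown above; the partition age=40 is a made-up value used only to show how partitions can be added and dropped by hand.

```sql
-- Filtering on the partition key lets Hive read only the matching
-- subdirectory instead of scanning the whole table.
select name, age
from employee2
where age = 20;

-- Partitions can also be managed explicitly (age=40 is illustrative).
alter table employee2 add if not exists partition (age=40);
show partitions employee2;       -- now also lists age=40
alter table employee2 drop if exists partition (age=40);
```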

4. Internal and External Tables

A table is either an internal table or an external table.

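Internal tables (managed tables)

Stored in HDFS as a subdirectory of the directory of the database they belong to
The data is managed entirely by Hive; dropping the table deletes both the metadata and the data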

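External tables

The data is stored at an HDFS path that you specify
Hive does not fully manage the data; dropping the table deletes only the metadata, and the data stays in place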

Both employee tables created above are internal tables.

The following statement creates an external table:


create external table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
location '/tmp/hivedata/employee';

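To create an external table, add the keyword external after create. Note that because of if not exists, this statement does nothing if the internal employee table from section 2 still exists; drop that table first or choose a different table name.

location '/tmp/hivedata/employee'; specifies the HDFS path where the table's data is stored.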
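
One way to check which kind of table you have, and to see the difference when dropping it, is sketched below. The table name employee_ext and its path /tmp/hivedata/employee_ext are hypothetical names chosen for this example; the sketch assumes the employee table already exists (its column layout is copied with like) and that the statements run in the Hive CLI, where dfs commands are available.

```sql
-- Create an external table with the same columns as employee
-- (employee_ext and its location are illustrative names).
create external table if not exists employee_ext like employee
location '/tmp/hivedata/employee_ext';

-- The "Table Type" row shows EXTERNAL_TABLE here
-- and MANAGED_TABLE for an internal table.
describe formatted employee_ext;

-- Dropping an external table removes only the metadata.
drop table employee_ext;

-- The directory and any files in it are still present on HDFS.
dfs -ls /tmp/hivedata/employee_ext;
```

Running the same drop on an internal table would also remove its directory under the database's location.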