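1. Hive's Special Data Types
Hive's type system is broadly similar to MySQL's, but it adds a set of collection data types:
ARRAY: an ordered list of elements that all share one type
MAP: key-value pairs in which all keys share one type and all values share one type
STRUCT: a group of named fields, each with its own type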
Type | Example value | Declaration |
array | ['aaa','bbb','bbb'] | ARRAY<string> |
map | {'A':'Apex','B':'Bee'} | MAP<string,string> |
struct | {'fruit':'aaa','weight':666} | STRUCT<fruit:string,weight:int> |
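As a quick way to see these three types in action, the values from the table can be built with Hive's constructor functions `array()`, `map()`, and `named_struct()`. This is only a minimal sketch; the column aliases are illustrative, and it assumes a Hive version that allows `select` without a `from` clause.

```sql
-- Build one value of each collection type from literals.
-- The aliases arr_col, map_col and struct_col are illustrative names.
select
  array('aaa', 'bbb', 'bbb')                  as arr_col,     -- ARRAY<string>
  map('A', 'Apex', 'B', 'Bee')                as map_col,     -- MAP<string,string>
  named_struct('fruit', 'aaa', 'weight', 666) as struct_col;  -- STRUCT<fruit:string,weight:int>
```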
2. Creating a Static Table
The statement to create a static table:
create table if not exists employee(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
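row format delimited : begins the delimiter specification
fields terminated by '|' : uses "|" as the separator between fields (columns)
collection items terminated by ',' : uses "," as the separator between the items of a complex-type field (array elements, struct members, and the entries of a map)
map keys terminated by ':' : uses ":" as the separator between the key and the value within each map entry
lines terminated by '\n'; : uses "\n" as the separator between rows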
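To make the delimiters concrete, the sketch below shows a hypothetical input line (the name, cities, and scores are invented for illustration, not taken from the original data file) and a query that reads each complex column of the employee table. It assumes the table has been populated with lines in this shape.

```sql
-- Hypothetical line in the data file, split according to the delimiters above:
--   Alice|Beijing,Shanghai|Female,28|DB:80,Python:90|Product:Developer
--   "|" separates the five columns, "," separates collection items,
--   ":" separates each map key from its value.

select
  name,
  work_place[0]          as first_city,   -- array element by zero-based index
  size(work_place)       as city_count,   -- number of elements in the array
  gender_age.gender      as gender,       -- struct field by name
  gender_age.age         as age,
  skills_score['DB']     as db_score,     -- map value by key (NULL if the key is absent)
  map_keys(depart_title) as departments   -- all keys of the map as an array
from employee;
```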
Load data from a local file into the table:
load data local inpath '/opt/employee.txt' into table employee;
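Load data from a file on HDFS into the table (without local, Hive reads the path from HDFS and moves the file into the table's directory):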
load data inpath '/employee.txt' into table employee;
Load data from a file and overwrite the table's existing contents:
load data inpath '/employee.txt' overwrite into table employee;
3. Creating a Partitioned Table
The statement to create a partitioned table:
create table employee2(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
partitioned by (age int) -- use age as the partition column
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
Load data into the partitioned table:
load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);
load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);
View the table's partitions:
show partitions employee2;
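Once loaded, the partition column can be queried like an ordinary column, and filtering on it lets Hive read only the matching partition. Note that each load statement above assigns the entire file to the named partition, regardless of the age stored inside gender_age. The following is a minimal sketch; the partition value 40 is a hypothetical example.

```sql
-- The partition column behaves like a regular column in queries.
-- Filtering on it restricts the scan to the age=20 partition.
select name, work_place, age
from employee2
where age = 20;

-- Partitions can also be added or dropped explicitly.
-- The value 40 is hypothetical; no such partition was loaded above.
alter table employee2 add if not exists partition (age = 40);
alter table employee2 drop if exists partition (age = 40);
```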
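4. Internal and External Tables
Hive tables are either internal or external.
Internal tables (managed tables)
- Stored in HDFS as a subdirectory of the owning database's directory
- Data is fully managed by Hive; dropping the table (its metadata) also deletes the data
External tables
- Data is stored at a specified HDFS path
- Hive does not fully manage the data; dropping the table (its metadata) does not delete the data
Both employee tables created above are internal tables.
The statement to create an external table: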
create external table if not exists employee(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
location '/tmp/hivedata/employee';
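To create an external table, add the keyword external after create.
location '/tmp/hivedata/employee'; specifies the HDFS path where the table's data is stored.
Note that an internal table named employee was already created above, so with if not exists this statement is skipped unless that table is dropped first or a different table name is used.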
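The difference between the two table kinds can be checked directly. The sketch below uses a hypothetical table employee_ext and a hypothetical location so that it does not touch the tables above; it assumes the employee table already exists, and that the statements are run from the Hive CLI or Beeline, where the dfs command is available.

```sql
-- employee_ext and its location are hypothetical names used only for this sketch.
-- "like" copies the column definitions and row format of the existing employee table.
create external table if not exists employee_ext like employee
location '/tmp/hivedata/employee_ext';

-- The "Table Type" row reports EXTERNAL_TABLE here,
-- and MANAGED_TABLE for an internal table.
describe formatted employee_ext;

-- Dropping an external table removes only its metadata.
drop table employee_ext;

-- The employee_ext directory should still be listed after the drop.
dfs -ls /tmp/hivedata;
```

Running the same sequence against an internal table would instead remove the table's directory together with its metadata.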