Hive data types fall into two categories: primitive data types and collection data types.

Primitive data types:
| Hive type | Java type | Length / Description | Example |
| --- | --- | --- | --- |
| TINYINT | byte | 1-byte signed integer | 20 |
| SMALLINT | short | 2-byte signed integer | 20 |
| INT | int | 4-byte signed integer | 20 |
| BIGINT | long | 8-byte signed integer | 20 |
| BOOLEAN | boolean | Boolean, true or false | TRUE, FALSE |
| FLOAT | float | single-precision floating point | 3.14159 |
| DOUBLE | double | double-precision floating point | 3.14159 |
| STRING | string | Character sequence. A character set may be specified. Single or double quotes may be used. | 'now is the time' "for all good men" |
| TIMESTAMP | | timestamp | |
| BINARY | | byte array | |
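As a quick sanity check on the primitive types, the following standalone queries can be run in a Hive CLI; the literal values are arbitrary examples added here for illustration, not part of the original material:

```sql
-- Single- and double-quoted STRING literals are interchangeable
SELECT 'now is the time', "for all good men";

-- Explicit casts between primitive types
SELECT CAST('20' AS INT), CAST(3.14159 AS FLOAT);

-- current_timestamp() produces a TIMESTAMP value
SELECT current_timestamp();
```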
Collection data types:

| Type | Description | Syntax example |
| --- | --- | --- |
| STRUCT | Similar to a struct in C; elements are accessed with dot notation. For example, if a column's type is STRUCT{first STRING, last STRING}, its first element can be referenced as column.first. | struct(), e.g. struct<street:string, city:string> |
| MAP | A set of key-value pairs, accessed with bracket (array) notation. For example, if a column's type is MAP and it holds the pairs 'first'->'John' and 'last'->'Doe', the last element can be fetched as column['last']. | map(), e.g. map<string, int> |
| ARRAY | An ordered collection of elements of the same type, numbered from zero. For example, for the array value ['John', 'Doe'], the second element is referenced as column[1]. | array(), e.g. array<string> |
Rules of thumb:
- For array and struct columns, only the values themselves appear in the data file; the field names come from the table schema.
- For map columns, both the key and the value appear in the data file; anything outside the key-value pairs belongs to other fields.
- Recommended delimiters: ',' between fields, '_' between items within a collection, and ':' between a key and its value.
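The constructors from the syntax-example column can be exercised directly in a query; a small sketch (the names and values are made up for illustration):

```sql
-- ARRAY: zero-based index access
SELECT array('John', 'Doe')[1];                             -- 'Doe'

-- MAP: bracket access by key
SELECT map('first', 'John', 'last', 'Doe')['last'];         -- 'Doe'

-- STRUCT: dot access on a named_struct
SELECT named_struct('first', 'John', 'last', 'Doe').first;  -- 'John'
```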
Hands-on example:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],  // Array (list)
    "children": {                     // Map (key-value)
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {                      // Struct
        "street": "hui long guan",
        "city": "beijing"
    }
}
Based on this structure, we create the corresponding table in Hive and load the data.
After flattening with the delimiters above, a row of data looks like:
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
hive (default)> create table person(
> name string,
> friends array<string>,
> children map<string,int>,
> address struct<street:string,city:string>
> )
> row format delimited
> fields terminated by ','
> collection items terminated by '_'
> map keys terminated by ':';
OK
Time taken: 1.182 seconds
hive (default)> load data local inpath "/opt/module/hive/pson.data" into table default.person;
Loading data to table default.person
Table default.person stats: [numFiles=1, totalSize=75]
OK
Time taken: 1.2 seconds
hive (default)> select * from person;
OK
person.name person.friends person.children person.address
songsong ["bingbing","lili"] {"xiao song":18,"xiaoxiao song":19} {"street":"hui long guan","city":"beijing"}
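With the row loaded, individual elements can be pulled out using the access syntax from the collection-type table; given the data shown, this should return songsong, lili, 18, and hui long guan:

```sql
SELECT name,
       friends[1]            AS second_friend,  -- array index, zero-based
       children['xiao song'] AS age,            -- map lookup by key
       address.street        AS street          -- struct dot notation
FROM person;
```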
Draft all the intermediate steps in Sublime Text first, then paste them into the CLI:
Create the table:
create table person(
name string,
friends array<string>,
children map<string,int>,
address struct<street:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':';
load data local inpath "/opt/module/datas/pson.data" into table default.person;
An alternative approach (address declared as a map rather than a struct):
hive (default)> create table human(
> name string,
> friends array<string>,
> children map<string,int>,
> address map<string,string>
> )
> row format delimited
> fields terminated by ','
> collection items terminated by '_'
> map keys terminated by ':';
OK
Time taken: 0.303 seconds
hive (default)> load data local inpath "/opt/module/hive/human.data" into table default.human;
Loading data to table default.human
Table default.human stats: [numFiles=1, totalSize=87]
OK
Time taken: 0.356 seconds
hive (default)> select * from human;
OK
human.name human.friends human.children human.address
songsong ["bingbing","lili"] {"xiao song":18,"xiaoxiao song":19} {"street":"hui long guan","city":"beijing"}
Time taken: 0.093 seconds, Fetched: 1 row(s)
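Because address is now a map, its fields are read with bracket notation rather than dot notation; for the row above this should return hui long guan and beijing:

```sql
SELECT name,
       address['street'] AS street,  -- map lookup replaces struct dot access
       address['city']   AS city
FROM human;
```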
My scratch notes are as follows:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],  // Array (list)
    "children": {                     // Map (key-value)
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {                      // also a Map in this scheme
        "street": "hui long guan",
        "city": "beijing"
    }
}
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,street:hui long guan_city:beijing
The second way to create the table:
create table human(
name string,
friends array<string>,
children map<string,int>,
address map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':';
load data local inpath "/opt/module/hive/human.data" into table default.human;