创建一个员工表:
create table if not exists employees( name string, salary float, subordinates array<string>, deductions map<string,float>, address struct<street:string, city:string, state:string, zip:int> ) row format delimited fields terminated by '\001' collection items terminated by '\002' map keys terminated by '\003' lines terminated by '\n' stored as textfile;
其中subordinates下属员工字段为array类型,deductions扣除字段为map类型,address地址字段为struct类型。
每个列之间通过'\001'分割,array、struct、map中每组元素通过'\002'分割,map的key、value之间通过'\003'分割。
下面来导入数据:
load data local inpath './data/employees.txt' overwrite into table employees;
employeets.txt文件的内容如下:
John Doe100000.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600 Mary Smith80000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601 Todd Jones70000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700 Bill King60000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100 Boss Man200000.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500 Fred Finance150000.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500 Stacy Accountant60000.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563
因为分隔符是不可见的,所以在这上面看不出来,可以直接复制。
array类型的数据可以通过'数组名[index]'的方式访问,index从0开始:
select subordinates[0],subordinates[1] from employees limit 1;
查询结果为:
c0 c1 Mary Smith Todd Jones
第一行是字段的默认列名。
struct类型的数据可以通过'列名.字段名'的方式访问,如下:
select address.street, address.city, address.state, address.zip from employees limit 1;
查询结果为:
street city state zip 1 Michigan Ave. Chicago IL 60600
map类型的数据可以通过'列名['key']'的方式访问,如下:
select deductions['Federal Taxes'],deductions['State Taxes'],deductions['Insurance'] from employees limit 1;
查询结果为:
c0 c1 c2 0.2 0.05 0.1
如果没有查到结果可以使用if判断:
select if(deductions['Federal Taxes'] is null, 0, deductions['Federal Taxes']) from employees limit 1;
我们也可以用UDTF把结果变成多行,比如:
select explode(deductions) from employees;
查询结果为:
key value Federal Taxes 0.2 State Taxes 0.05 Insurance 0.1 Federal Taxes 0.2 State Taxes 0.05 Insurance 0.1 Federal Taxes 0.15 State Taxes 0.03 Insurance 0.1 Federal Taxes 0.15 State Taxes 0.03 Insurance 0.1 Federal Taxes 0.3 State Taxes 0.07 Insurance 0.05 Federal Taxes 0.3 State Taxes 0.07 Insurance 0.05 Federal Taxes 0.15 State Taxes 0.03 Insurance 0.1
有时候我们需要把name也查询出来:
select name, explode(deductions) from employees;
发现执行语句会报错:
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
hive不支持这种语法。
可以用下面语句查询:
select name,dekey,devalue from employees LATERAL VIEW explode(deductions) dedView as dekey,devalue;
查询结果为:
name dekey devalue John Doe Federal Taxes 0.2 John Doe State Taxes 0.05 John Doe Insurance 0.1 Mary Smith Federal Taxes 0.2 Mary Smith State Taxes 0.05 Mary Smith Insurance 0.1 Todd Jones Federal Taxes 0.15 Todd Jones State Taxes 0.03 Todd Jones Insurance 0.1 Bill King Federal Taxes 0.15 Bill King State Taxes 0.03 Bill King Insurance 0.1 Boss Man Federal Taxes 0.3 Boss Man State Taxes 0.07 Boss Man Insurance 0.05 Fred Finance Federal Taxes 0.3 Fred Finance State Taxes 0.07 Fred Finance Insurance 0.05LATERAL VIEW关键字把explode产生的多行或多列变成一个视图,然后再与其他的数据聚合在一起。