Hive (5): Queries
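- SELECT … FROM statements: the basic form needs no further explanation. The examples below show how to read values out of complex data types.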
# A few rows of sample data and a CREATE TABLE statement, for hands-on practice
John Doe!100000.0!Mary Smith$Todd Jones!Federal Taxes,0.2$State Taxes,0.05$Insurance,0.1!1 Michigan Ave.$Chicago$IL$60600
Mary Smith!80000.0!Bill King!Federal Taxes,0.2$State Taxes,0.05$Insurance,0.1!100 Ontario St.$Chicago$IL$60601
Todd Jones!70000.0!lili!Federal Taxes,0.15$State Taxes,0.03$Insurance,0.1!200 Chicago Ave.$Oak Park$IL$60700
Bill King!60000.0!Huahua$Xixi!Federal Taxes,0.15$State Taxes,0.03$Insurance,0.1!300 Obscure Dr.$Obscuria$IL$60100

CREATE TABLE employees (
  name STRING,
  salary FLOAT,
  subordinates ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '!'
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
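To get the sample rows into the table, one option is to save them to a local file and load it. The following is only a sketch: the path `/tmp/employees.txt` is a hypothetical location, not something from the original post, and it assumes the four data lines were saved there unchanged.

```sql
-- Assumption: the four sample rows were saved to /tmp/employees.txt (hypothetical path)
-- on the machine where the Hive CLI is running.
LOAD DATA LOCAL INPATH '/tmp/employees.txt'
OVERWRITE INTO TABLE employees;

-- Quick sanity check that the delimiters were parsed as intended.
SELECT * FROM employees LIMIT 2;
```

`OVERWRITE` replaces any rows already in the table, so repeating the exercise does not duplicate the data.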
Next, query the values of the complex data types.
# First, look at the table description
hive (default)> desc employees;
name            string
salary          float
subordinates    array<string>
deductions      map<string,float>
address         struct<street:string,city:string,state:string,zip:int>

# Query ARRAY data
hive (default)> select subordinates from employees;
["Mary Smith","Todd Jones"]
["Bill King"]
["lili"]
["Huahua","Xixi"]
hive (default)> select subordinates[0] from employees;
Mary Smith
Bill King
lili
Huahua

# Query MAP data
hive (default)> select deductions from employees;
{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}
{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}
{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}
{"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}
hive (default)> select deductions['State Taxes'] from employees;
0.05
0.05
0.03
0.03

# Query STRUCT data
hive (default)> select address from employees;
{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}
{"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}
{"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}
{"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}
hive (default)> select address.city from employees;
Chicago
Chicago
Oak Park
Obscuria
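The lookups above all hit elements that exist. As a small sketch of what happens otherwise, the query below uses Hive's built-in collection functions (`size`, `array_contains`, `map_keys`) on the same `employees` table. The column aliases and the `'Pension'` key are made up for illustration; that key does not appear in the sample data.

```sql
-- Assumes the employees table and sample rows defined earlier.
SELECT name,
       size(subordinates)                   AS num_subordinates,   -- number of elements in the array
       subordinates[1]                      AS second_subordinate, -- NULL when the array has fewer than two elements
       array_contains(subordinates, 'Xixi') AS manages_xixi,       -- true/false membership test
       map_keys(deductions)                 AS deduction_names,    -- the map's keys as an array
       deductions['Pension']                AS pension             -- hypothetical key, absent from the data, so NULL
FROM employees;
```

An out-of-range array index or a missing map key yields NULL instead of an error, so downstream expressions may need a NULL check.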
- Specifying columns with regular expressions
# This property must be set before regular expressions can be used
set hive.support.quoted.identifiers=none;
# Select name plus every column whose name starts with `s`
hive (default)> select name,`s.*` from employees;
John Doe      100000.0    ["Mary Smith","Todd Jones"]
Mary Smith    80000.0     ["Bill King"]
Todd Jones    70000.0     ["lili"]
Bill King     60000.0     ["Huahua","Xixi"]
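The same mechanism is often used the other way round, to select every column except a few. The sketch below follows the exclusion pattern shown in the Hive language manual; picking `salary` and `deductions` as the excluded columns is just an illustration.

```sql
-- Regex column specification only works with quoted identifiers disabled.
SET hive.support.quoted.identifiers=none;

-- Select every column of employees except salary and deductions.
SELECT `(salary|deductions)?+.+` FROM employees;
```

With the sample table this should return only `name`, `subordinates` and `address`.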
- Computing with column values
hive (default)> select upper(name),salary,deductions['Federal Taxes'],round(salary*(1-deductions['Federal Taxes'])) from employees;
JOHN DOE      100000.0    0.2     80000.0
MARY SMITH    80000.0     0.2     64000.0
TODD JONES    70000.0     0.15    59500.0
BILL KING     60000.0     0.15    51000.0
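When doing arithmetic, watch out for overflow and underflow. If this is a concern, consider declaring wider data types in the table schema; the drawback is that each value then takes up more memory. Alternatively, use an expression to convert the value to a wider type at query time.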
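As a sketch of the second option, the FLOAT columns can be widened to DOUBLE inside the query with `CAST`, leaving the table schema untouched. The alias `net_salary` is made up for illustration.

```sql
-- Assumes the employees table defined earlier; salary and the map values are FLOAT.
SELECT name,
       CAST(salary AS DOUBLE)
         * (1 - CAST(deductions['Federal Taxes'] AS DOUBLE)) AS net_salary  -- arithmetic done in DOUBLE
FROM employees;
```

The cast only affects this expression. The stored data keeps its original type, so the extra memory cost is paid only while the query runs.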
- Using functions
- List the month-related functions
show functions like '*month*';
- Show the usage of the add_months function
desc function add_months;
- Show the detailed description of add_months, with examples
desc function extended add_months;
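After reading the description, a function can be tried directly on a literal. This is a minimal sketch that assumes a Hive version supporting `SELECT` without a `FROM` clause (0.13 or later); the date literal is the one used in the Hive documentation for `add_months`.

```sql
-- Add one month to a date string; the day is clamped to the end of the shorter month.
SELECT add_months('2009-08-31', 1);   -- 2009-09-30

-- Extract the month number from the same date.
SELECT month('2009-08-31');           -- 8
```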
-