Hive中对array类型字段的处理

版权声明:本文为博主原创文章,转载联系 [email protected] https://blog.csdn.net/qq_31573519/article/details/86612933

现在有一张表,2个字段,字段A:id, 字段B:array类型:

col_name data_type
gameid int
businesstype array
select gameid, businesstype from  kua.tablea where date=20190122 limit 5;
+-----------+---------------------------+--+
|  gameid   |       businesstype        |
+-----------+---------------------------+--+
| 62230904  | ["SDK","FUNGAME"]         |
| 62230006  | ["MIMO","break"]          |
| 62230003  | ["SDK","MIMO","FUNGAME"]  |
| 62230009  | []                        |
| 62230011  | ["MIMO","break"]          |
+-----------+---------------------------+--+

下面来看看array字段要怎么处理:

1.取B字段中的第一个元素值:

select gameid, businesstype[0] from kua.tablea where date=20190122 limit 5;
+-----------+-------+--+
|  gameid   |  _c1  |
+-----------+-------+--+
| 62230904  | SDK   |
| 62230006  | MIMO  |
| 62230003  | SDK   |
| 62230009  | NULL  |
| 62230011  | MIMO  |
+-----------+-------+--+
如果字段中含有数据,那么正常返回第一个值;如果array是空的,那么返回NULL

2.判断字段为Array类型的值,是否包含某值

select gameid, array_contains(businesstype, 'SDK') from  kua.tablea where date=20190122 limit 5;
+-----------+--------+--+
|  gameid   |  _c1   |
+-----------+--------+--+
| 62230904  | true   |
| 62230006  | false  |
| 62230003  | true   |
| 62230009  | false  |
| 62230011  | false  |
+-----------+--------+--+
array_contains方法返回一个Boolean对象,可以配合if函数使用,例如 if(array_contains(_cloumn,'A'),'包含A','不包含A')

3.一行转多行,把array中的每个值都拿出来作为一行显示

select t.gameid, singleBusinesstype 
from (
    select gameid, businesstype 
    from  kua.tablea where date=20190122 limit 5 
) t LATERAL VIEW explode(t.businesstype) v as singleBusinesstype
+-----------+---------------------+--+
| t.gameid  | singlebusinesstype  |
+-----------+---------------------+--+
| 62230904  | SDK                 |
| 62230904  | FUNGAME             |
| 62230010  | break               |
| 62230010  | MIMO                |
| 62230008  | MIMO                |
| 62230008  | break               |
| 62230006  | MIMO                |
| 62230006  | break               |
| 62230003  | SDK                 |
| 62230003  | MIMO                |
| 62230003  | FUNGAME             |
+-----------+---------------------+--+
配合 LATERAL VIEW、explode 2个函数,可以实现把一个array类型的值分开。
LATERAL VIEW通常用于一行转多行的处理逻辑,而对应的,在hive中实现多行转一行的处理逻辑可以使用<collect_set> <collect_list>函数来实现, 他们返回的对象都是一个array集合,那么又回到上面array对象的处理逻辑了,差别在在于前者会去重。

猜你喜欢

转载自blog.csdn.net/qq_31573519/article/details/86612933