Hive实现交叉二维分析的小语句

1. 梳理出你要的列和行维度

列维度: 每一周

行维度: 年级 + 学科 + 班型

2. 对数据按周增序进行聚合 (即根据列维度) ,生成list

concat_ws 和 collect_list (collect_set 会去重后再聚合)  顺序随机

sort_array 只能增序,要倒序排的话在子查询里新增一个辅助列来排序即可。

3. 依次取list的元素

即为 按周增序的指标结果

  select 
    term,
    kemu,
    course_applicable_user_type,
    split(hs,',')[0] lesson_order1,
    split(hs,',')[1] lesson_order2,
    split(hs,',')[2] lesson_order3,
    split(hs,',')[3] lesson_order4,
    split(hs,',')[4] lesson_order5
  from 
  (
    select 
           term,
           kemu,
           course_applicable_user_type,
           -- concat_ws(',', collect_list(cast(lesson_order as string))) as lesson_order_set,
           -- concat_ws(',', collect_list(cast(lesson_valid_rate as string))) as index_amount_set,    
           regexp_replace(
                  concat_ws(',',
                           sort_array (
                                   collect_list(
                                                concat_ws(':',
                                                          case when length(cast(lesson_order as string))=1 then concat('0',cast(lesson_order as string)) else cast(lesson_order as string) end,
                                                          cast(lesson_valid_rate as string)
                                                         )
                                                )
                                      )
                            ),'\\d\\d\:',''
                       )hs
    from
    (          
    select
      term,
      kemu,
      course_applicable_user_type,
      lesson_order,
      lesson_valid_rate 
    from tmp                
    )t
    group by term,kemu,course_applicable_user_type
   )t1

  

猜你喜欢

转载自www.cnblogs.com/skyEva/p/10138186.html