hivesql练习(1)分组,聚合,累加,join,打标记,多行转多列

给定订单表 sale_order

id sale_dt user_id sku_id sale_count price amount
1 2019-01-01 1 1001 2 100 200
2 2019-01-02 2 1001 1 100 100
3 2019-02-10 3 1001 2 80 160
4 2019-02-11 2 1002 2 100 200
5 2019-03-01 3 1002 1 100 100
6 2019-03-01 3 1001 1 50 50
7 2019-03-02 3 1003 4 100 400

问题1,根据订单表 sale_order生成如下数据,请写出sql, 最好是hive支持的

月份 销量 销售额
2019-01 3 300
2019-02 4 360
2019-03 6 550

计算:

select
substr(sale_dt,1,7) as `月份`,
sum(sale_count) as `销量`,
sum(amount) as `销售额`
from sale_order
group by substr(sale_dt,1,7)
;

问题2, 根据订单表 sale_order生成如下数据,请写出sql, 最好是hive支持的

user_id 2019_01 2019_02 2019_03
1 200 0 0
2 100 200 0
3 0 160 550

计算:

with tmp as(
select
user_id,
substr(sale_dt,1,7) as dt,
sum(amount) as am
from sale_order
group by user_id,substr(sale_dt,1,7)
)

select
user_id,
max(`2019_01`) as `2019_01`,
max(`2019_02`) as `2019_02`,
max(`2019_03`) as `2019_03`
from
(
select
user_id,
case when dt='2019-01' then am else 0 end as `2019_01`,
case when dt='2019-02' then am else 0 end as `2019_02`,
case when dt='2019-03' then am else 0 end as `2019_03`
from tmp
) t
group by user_id;

问题3,根据订单表 sale_order生成如下数据,请写出sql, 最好是hive支持的

月份 累计销量 累计销售额
2019-01 3 300
2019-02 7 660
2019-03 13 1210

计算:

with tmp as(
select
substr(sale_dt,1,7) as dt,
sum(sale_count) as sum_cnts,
sum(amount) as sum_amt
from sale_order
group by substr(sale_dt,1,7)
)

select
dt as `月份`,
sum(sum_cnts) over(order by dt rows between unbounded preceding and current row) as `累计销量`,
sum(sum_amt) over(order by dt rows between unbounded preceding and current row) as `累计销售额`
from tmp
;

问题4, 另有商品表products如下,请问题,请给出累计销售量和累计销售额排名,并写出sql。

sku_id sku_name
1001 商品1
1002 商品2
1003 商品3

计算:(感觉需求不完整,我个人理解是求每个月每个商品的销售量排名,销售额排名,并且每个月每个商品的销售量和销售额是累加的)

with tmp as(
select
substr(sale_dt,1,7) as dt,
sku_name,
sum(sale_count) as sum_cnts,
sum(amount) as sum_amt
from sale_order
join products on sale_order.sku_id=products.sku_id
group by substr(sale_dt,1,7),sku_name
)

select
dt,
sku_name,
row_number() over(partition by dt order by `累计销售量` desc) as `累计销售量排名`,
row_number() over(partition by dt order by `累计销售额` desc) as `累计销售额排名`
from
(
select
dt,
sku_name,
sum(sum_cnts) over(partition by sku_name order by dt rows between unbounded preceding and current row) as `累计销售量`,
sum(sum_amt) over(partition by sku_name order by dt rows between unbounded preceding and current row) as `累计销售额`
from tmp
) t
;

问题5,根据如上数据,请写出一些结论性的观点。

猜你喜欢

转载自blog.csdn.net/weixin_47699191/article/details/114679607