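In Hive, window functions (also called windowing functions) are powerful, and mastering them makes many problems much easier to solve. First, what is a window function? Simply put, it is a function in Hive that computes over a specified window of rows. Examples include the aggregate functions sum(), avg(), min(), and max(), as well as rank() and row_number(), which are used for ranking. They are introduced one by one below.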
First, here are the functions and keywords that are frequently used with window functions to control the size of the window.
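over(): specifies the window that the window function operates on; the window can change as the current row changes.
current row: the current row
n preceding: the n rows before the current row
n following: the n rows after the current row
unbounded: the boundary of the partition; unbounded preceding means starting from the first row, and unbounded following means extending to the last row.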
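If no frame is specified, the default depends on order by: with order by, the window runs from the first row of the partition to the current row (range between unbounded preceding and current row); without order by, it covers every row in the partition.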
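The frame keywords above can be read as index arithmetic over the ordered rows of one partition. The following Python sketch is not Hive code; frame_sum and the sample values are hypothetical names chosen for illustration. It models only rows frames, with None standing in for unbounded and 0 standing in for current row.

```python
from typing import List, Optional


def frame_sum(values: List[int], i: int,
              preceding: Optional[int], following: Optional[int]) -> int:
    """Sum the values inside a ROWS frame around row i (hypothetical helper).

    preceding / following are row counts: None models UNBOUNDED,
    0 models CURRENT ROW. The frame is clipped at the partition edges.
    """
    last = len(values) - 1
    start = 0 if preceding is None else max(0, i - preceding)
    end = last if following is None else min(last, i + following)
    return sum(values[start:end + 1])


# Sample values for one partition, already sorted by the ORDER BY key.
nums = [5, 7, 3, 2, 4, 4]

# rows between unbounded preceding and current row
running = [frame_sum(nums, i, None, 0) for i in range(len(nums))]

# rows between 1 preceding and 1 following
centered = [frame_sum(nums, i, 1, 1) for i in range(len(nums))]

print(running)
print(centered)
```

Note that the frame is simply cut off at the partition boundary: for the first row, "1 preceding" has nothing to include, so the frame holds only the current row and the next one.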
1. sum(), avg(), min(), max()
Data preparation
create table if not exists buy_info ( name string, buy_date string, buy_num int ) row format delimited fields terminated by '|';
Running select * from buy_info; returns the following rows:
liulei 2015-04-11 5
liulei 2015-04-12 7
liulei 2015-04-13 3
liulei 2015-04-14 2
liulei 2015-04-15 4
liulei 2015-04-16 4
select name, buy_date, buy_num,
sum(buy_num) over(partition by name order by buy_date asc) as info1, -- partition by name, order by purchase date ascending, then sum; the default frame runs from the first row to the current row
sum(buy_num) over(partition by name order by buy_date rows between unbounded preceding and current row) as info2, -- from the first row to the current row; same result as info1
sum(buy_num) over(partition by name order by buy_date rows between 3 preceding and current row) as info3, -- from 3 rows before the current row through the current row
sum(buy_num) over(partition by name) as info4, -- all rows in the partition
sum(buy_num) over(partition by name order by buy_date rows between 1 preceding and 1 following) as info5, -- previous row + current row + next row
sum(buy_num) over(partition by name order by buy_date rows between 1 preceding and unbounded following) as info6 -- from one row before the current row through the last row
from buy_info;
The query returns the following:
name | buy_date | buy_num | info1 | info2 | info3 | info4 | info5 | info6 |
liulei | 2015-04-11 | 5 | 5 | 5 | 5 | 25 | 12 | 25 |
liulei | 2015-04-12 | 7 | 12 | 12 | 12 | 25 | 15 | 25 |
liulei | 2015-04-13 | 3 | 15 | 15 | 15 | 25 | 12 | 20 |
liulei | 2015-04-14 | 2 | 17 | 17 | 17 | 25 | 9 | 13 |
liulei | 2015-04-15 | 4 | 21 | 21 | 16 | 25 | 10 | 10 |
liulei | 2015-04-16 | 4 | 25 | 25 | 13 | 25 | 8 | 8 |
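As a cross-check, info5 and info6 can be recomputed outside Hive. The sketch below uses Python's built-in sqlite3 module as a stand-in engine. This is an assumption, not the author's setup: SQLite is not Hive, window functions need SQLite 3.25 or later, and the only claim is that the rows frame clauses used here follow the same standard SQL semantics. Table and column names follow buy_info above.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table buy_info (name text, buy_date text, buy_num integer)"
)
rows = [
    ("liulei", "2015-04-11", 5),
    ("liulei", "2015-04-12", 7),
    ("liulei", "2015-04-13", 3),
    ("liulei", "2015-04-14", 2),
    ("liulei", "2015-04-15", 4),
    ("liulei", "2015-04-16", 4),
]
conn.executemany("insert into buy_info values (?, ?, ?)", rows)

query = """
select buy_date,
       sum(buy_num) over (partition by name order by buy_date
                          rows between 1 preceding and 1 following) as info5,
       sum(buy_num) over (partition by name order by buy_date
                          rows between 1 preceding and unbounded following) as info6
from buy_info
order by buy_date
"""
result = conn.execute(query).fetchall()

# Print one line per row: buy_date, info5, info6.
for buy_date, info5, info6 in result:
    print(buy_date, info5, info6)
```

Each printed line can be compared against the info5 and info6 columns of the result table.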