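Deduplication methods in Hive
1. distinct <columns to deduplicate on>
Characteristics: removes duplicates across the column list that follows distinct. There is no reference column: rows are not ranked, so there is no way to pick a particular row (for example the latest) from each group of duplicates.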
select distinct case_id, role, judgename
from judgeInfo;
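To see what "no reference column" means in practice, the sketch below runs the same distinct query on a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for Hive, on the assumption that distinct behaves the same way in both engines; the sample rows are hypothetical and not taken from a real judgeInfo table.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: DISTINCT
# has the same semantics in both engines).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table judgeInfo (case_id text, role text, judgename text, dt text)"
)
# Hypothetical sample rows: the first two differ only in dt.
conn.executemany(
    "insert into judgeInfo values (?, ?, ?, ?)",
    [
        ("c1", "presiding", "Zhang", "2020-01-01"),
        ("c1", "presiding", "Zhang", "2020-02-01"),
        ("c2", "member", "Li", "2020-01-15"),
    ],
)

# dt is not in the column list, so the two c1 rows collapse into one.
distinct_rows = conn.execute(
    "select distinct case_id, role, judgename from judgeInfo order by case_id"
).fetchall()

# With dt in the column list, the two c1 rows count as different rows again.
distinct_with_dt = conn.execute(
    "select distinct case_id, role, judgename, dt from judgeInfo"
).fetchall()

print(distinct_rows)
print(len(distinct_with_dt))
```

In this sample, distinct can only return dt by making it part of the deduplication key, which brings the duplicate back. That is the limitation the row_number() approach addresses.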
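2. row_number() over(partition by <columns to deduplicate on> order by <reference column>)
Characteristics: uses a reference column. Rows within each partition are sorted by that column, and the row with a specific sequence number is kept.
For example, to deduplicate and keep the latest record: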
select case_id, role, judgename,dt from
(select case_id, role, judgename,dt,
row_number() over(partition by case_id, role, judgename order by dt desc) rnk
from judgeInfo) a
where rnk=1;
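The query above can be tried on a few made-up rows. This is a minimal sketch using Python's built-in sqlite3 module in place of Hive, assuming row_number() over(partition by ... order by ...) behaves the same in both engines and that the bundled SQLite is version 3.25 or later (the first release with window functions). The sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: row_number()
# window semantics match; requires SQLite 3.25 or later).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table judgeInfo (case_id text, role text, judgename text, dt text)"
)
# Hypothetical sample rows: the first two are duplicates with different dt.
conn.executemany(
    "insert into judgeInfo values (?, ?, ?, ?)",
    [
        ("c1", "presiding", "Zhang", "2020-01-01"),
        ("c1", "presiding", "Zhang", "2020-02-01"),
        ("c2", "member", "Li", "2020-01-15"),
    ],
)

# Number the rows in each (case_id, role, judgename) group from newest to
# oldest dt, then keep only the row numbered 1.
latest_rows = conn.execute(
    """
    select case_id, role, judgename, dt from
    (select case_id, role, judgename, dt,
            row_number() over(partition by case_id, role, judgename
                              order by dt desc) rnk
     from judgeInfo) a
    where rnk = 1
    order by case_id
    """
).fetchall()

print(latest_rows)
```

Only the 2020-02-01 row survives for c1. If two rows in a group share the same dt, which one gets rnk = 1 is not determined by this ordering alone, so a tie-breaking column in the order by clause may be worth adding.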