Hive deduplication methods

1. distinct <columns to deduplicate on>

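Characteristics: deduplicates on the list of columns that follows distinct. There is no reference column: rows are compared only on the listed columns, with no ordering that could single out a particular record.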

select distinct case_id, role, judgename 
from judgeInfo;
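
To see what "no reference column" means in practice, here is a minimal sketch that runs the same distinct query against an in-memory SQLite database. SQLite is only a stand-in for Hive here, and the sample rows and dt values are invented for illustration. It also shows that once dt is added to the select list, rows that differ only in dt are no longer treated as duplicates.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table judgeInfo (case_id text, role text, judgename text, dt text)"
)
conn.executemany(
    "insert into judgeInfo values (?, ?, ?, ?)",
    [
        ("c1", "presiding", "Zhang", "2020-04-01"),
        ("c1", "presiding", "Zhang", "2020-04-10"),  # same key, later dt
        ("c2", "assistant", "Li", "2020-04-05"),
    ],
)

# distinct over the three key columns collapses the two c1 rows into one.
dedup_rows = conn.execute(
    "select distinct case_id, role, judgename from judgeInfo order by case_id"
).fetchall()

# Adding dt to the select list makes every row distinct again.
with_dt_rows = conn.execute(
    "select distinct case_id, role, judgename, dt "
    "from judgeInfo order by case_id, dt"
).fetchall()

print(dedup_rows)
print(with_dt_rows)
```

So distinct is enough when only the key columns are needed, but it cannot return "the latest dt per key".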

2. row_number() over(partition by <columns to deduplicate on> order by <reference column>)

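Characteristics: there is a reference column. Rows in each partition are sorted by that column, and the record whose row number equals a specific value is kept.
For example, to deduplicate and keep the latest record: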

select case_id, role, judgename, dt
from (
    select case_id, role, judgename, dt,
           row_number() over(partition by case_id, role, judgename order by dt desc) rnk
    from judgeInfo
) a
where rnk = 1;
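
The row_number() query can be tried on a small in-memory table. This is a minimal sketch, not Hive itself: it assumes SQLite (3.25 or later, for window function support) as a stand-in, and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Hive; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table judgeInfo (case_id text, role text, judgename text, dt text)"
)
conn.executemany(
    "insert into judgeInfo values (?, ?, ?, ?)",
    [
        ("c1", "presiding", "Zhang", "2020-04-01"),
        ("c1", "presiding", "Zhang", "2020-04-10"),  # latest record for this key
        ("c2", "assistant", "Li", "2020-04-05"),
    ],
)

# Number the rows of each (case_id, role, judgename) group from newest to
# oldest dt, then keep only the first row of every group.
latest_rows = conn.execute(
    """
    select case_id, role, judgename, dt
    from (
        select case_id, role, judgename, dt,
               row_number() over (
                   partition by case_id, role, judgename
                   order by dt desc
               ) as rnk
        from judgeInfo
    ) a
    where rnk = 1
    order by case_id
    """
).fetchall()

print(latest_rows)
```

If two rows in the same partition share the same dt, row_number() still assigns them different numbers, so which one gets rnk = 1 is not determined by the query; adding a further column to the order by makes the choice explicit.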

Reposted from blog.csdn.net/AnlaGodness/article/details/105449250