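The data set is 140 GB. The task: sort by the field time in descending order and take the top 50 rows.
The usual approach, select * from table order by time desc limit 50, took 1 hour 6 minutes to finish.
It ran as 1 job: map 1783, reduce 1. With a single reducer, one task has to sort the entire data set.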
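By contrast, select * from (select * from table distribute by time sort by time desc limit 50) t order by time desc limit 50;
finished in 5 minutes, with the same result.
It ran as 2 jobs:
map 1783, reduce 245
map 245, reduce 1
The sort is now spread across 245 reducers in the first job, and the single reducer in the second job only has to process the first job's already-trimmed output.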
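To see why the two-stage query can return the same rows as the plain order by, here is a minimal Python sketch of the idea, not of Hive's actual execution plan. It assumes that each first-stage reducer keeps only its local top 50 and that a single final reducer merges those candidates. The function names, the random test data, and the hash-based bucketing are all hypothetical stand-ins for illustration.

```python
import heapq
import random


def top_n_single_reducer(rows, n):
    """Baseline: one reducer sorts every row, like ORDER BY ... DESC LIMIT n."""
    return sorted(rows, reverse=True)[:n]


def top_n_two_stage(rows, n, num_reducers):
    """Two-stage top-n, loosely mirroring DISTRIBUTE BY + SORT BY + LIMIT.

    Stage 1: rows are hashed to reducers; each reducer keeps its local top n.
    Stage 2: a single reducer picks the global top n from the candidates.
    Returns the result and the number of rows stage 2 had to look at.
    """
    buckets = [[] for _ in range(num_reducers)]
    for value in rows:
        buckets[hash(value) % num_reducers].append(value)

    # Each reducer only emits at most n rows.
    local_tops = [heapq.nlargest(n, bucket) for bucket in buckets]

    # The final reducer sees at most num_reducers * n rows.
    candidates = [value for top in local_tops for value in top]
    return heapq.nlargest(n, candidates), len(candidates)


random.seed(0)
times = [random.randrange(10**9) for _ in range(100_000)]  # made-up "time" values

baseline = top_n_single_reducer(times, 50)
result, stage2_input = top_n_two_stage(times, 50, 245)

print(result == baseline)          # both approaches pick the same rows
print(stage2_input <= 245 * 50)    # the final reducer's input is bounded
```

Any row in the global top 50 is necessarily in the top 50 of whichever reducer it was sent to, so trimming each reducer's output to 50 rows loses nothing. The final single-reducer sort then handles at most 245 × 50 rows instead of the full data set.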
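-- Keep only the most recent row (by ivc_tm) for each sale_ord_id.
-- Uses the built-in window function, which requires Hive 0.11 or later;
-- partition by / order by take the role of distribute by / sort by.
select sale_ord_id, ivc_title, rn
from
(select sale_ord_id, ivc_tm, ivc_title,
        row_number() over (partition by sale_ord_id order by ivc_tm desc) as rn
from gdm_mXX_inv_actual_det_sum_da
where dt='2014-12-09'
and valid_flag=1) a
where rn=1
limit 50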