-- Table TB1
  START_ID     END_ID
---------- ----------
         1          3
         4          6
         7          9
        10         12
        13         15
        16         18
        19         21
        22         24
        25         27
        28         30

-- Table TB2
       TID
----------
         1
         2
         3
        31

-- Goal: return the TB2 rows whose TID falls inside one of the TB1 ranges
-- Expected result:
       TID
----------
         1
         2
         3
The straightforward way to write it:
SELECT t2.tid
  FROM tb1 t1, tb2 t2
 WHERE t2.tid BETWEEN t1.start_id AND t1.end_id;
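When both tables are small, this query is perfectly fine. Let's see what happens with somewhat more data: populate tb1 with 10,000 rows and tb2 with 100,000 rows.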
Insert statements:
INSERT INTO tb1
SELECT s, e
  FROM (SELECT LEVEL s, LEVEL + 2 e
          FROM DUAL
        CONNECT BY LEVEL <= 30000) m
 WHERE MOD(m.s - 1, 3) = 0;

INSERT INTO tb2
SELECT LEVEL
  FROM DUAL
CONNECT BY LEVEL <= 100000;
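To make the shape of the generated data concrete, here is a minimal Python sketch, run outside the database, that mirrors the two INSERT statements. The function names `build_tb1` and `build_tb2` are illustrative and not part of the original; the sketch also ignores the sample rows that were already in the tables.

```python
def build_tb1(limit=30000):
    """Mirror of the tb1 INSERT: keep LEVEL values s where MOD(s - 1, 3) = 0."""
    return [(s, s + 2) for s in range(1, limit + 1) if (s - 1) % 3 == 0]


def build_tb2(limit=100000):
    """Mirror of the tb2 INSERT: one row per LEVEL value from 1 to limit."""
    return list(range(1, limit + 1))


tb1 = build_tb1()
tb2 = build_tb2()
print(len(tb1), tb1[0], tb1[-1])  # number of ranges, first range, last range
print(len(tb2))                   # number of tid values
```

The tb1 statement therefore produces 10,000 non-overlapping ranges of three consecutive values each, covering 1 through 30000, while tb2 receives the values 1 through 100000.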
Run the query against this data with autotrace enabled:
SELECT t2.tid
  FROM tb1 t1, tb2 t2
 WHERE t2.tid BETWEEN t1.start_id AND t1.end_id;

30074 rows selected.

Elapsed: 00:02:18.07

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1538 Card=2640000 Bytes=102960000)
   1    0   MERGE JOIN (Cost=1538 Card=2640000 Bytes=102960000)
   2    1     SORT (JOIN) (Cost=90 Card=10000 Bytes=260000)
   3    2       TABLE ACCESS (FULL) OF 'TB1' (TABLE) (Cost=13 Card=10000 Bytes=260000)
   4    1     FILTER
   5    4       SORT (JOIN) (Cost=571 Card=105600 Bytes=1372800)
   6    5         TABLE ACCESS (FULL) OF 'TB2' (TABLE) (Cost=54 Card=105600 Bytes=1372800)

Statistics
----------------------------------------------------------
          9  recursive calls
          1  db block gets
        352  consistent gets
          0  physical reads
        176  redo size
     481806  bytes sent via SQL*Net to client
      22547  bytes received via SQL*Net from client
       2006  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
      30074  rows processed
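The query took 2 minutes 18 seconds, which is very poor. Looking at the execution plan, tb1 and tb2 are combined by a MERGE JOIN whose second input passes through a FILTER step. (When FILTER drives a subquery it behaves much like NESTED LOOPS, except that it maintains an internal hash table: once a value has been evaluated, the result is stored in the hash table, and the next time the same value appears it is fetched from there instead of being evaluated again, which makes it more efficient than a plain NESTED LOOPS.) In this plan each table is full-scanned only once, but because the join condition is a range rather than an equality, tb1's 10,000 rows and tb2's 100,000 rows can require the condition to be checked for up to 10,000 × 100,000 row pairs in the worst case. That is why the query is slow.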
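The worst-case cost described above can be illustrated with a minimal Python sketch. It is only a model of a pairwise range comparison, not a reproduction of how Oracle executes the MERGE JOIN, and the function name `range_join_naive` is an assumption made for illustration.

```python
def range_join_naive(ranges, tids):
    """Return (matches, comparisons) for a pairwise BETWEEN-style join."""
    matches = []
    comparisons = 0
    for start_id, end_id in ranges:
        for tid in tids:
            comparisons += 1
            if start_id <= tid <= end_id:
                matches.append(tid)
    return matches, comparisons


# Sample data from the start of the article.
ranges = [(s, s + 2) for s in range(1, 31, 3)]  # (1, 3), (4, 6), ..., (28, 30)
tids = [1, 2, 3, 31]

matches, comparisons = range_join_naive(ranges, tids)
print(sorted(matches))  # [1, 2, 3]
print(comparisons)      # 40, i.e. len(ranges) * len(tids)
```

The comparison count is always the product of the two input sizes, so the same loop over 10,000 ranges and 100,000 values would perform 1,000,000,000 checks.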
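Idea: to avoid this nested-loop-style comparison, get Oracle to use a hash join and so reduce the amount of work. Because a hash join can only be used for equi-joins, first expand each tb1 range into the individual values it covers (that is, construct the values missing between start_id and end_id). The join condition then becomes an equality, and Oracle can choose a hash join.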
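The idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not Oracle's implementation: the function name `range_join_hashed` is hypothetical, and a `Counter` stands in for the hash table that a hash join would build. Each range is expanded into its member values using offsets 0 to (widest range - 1), and the other input is then probed by equality.

```python
from collections import Counter


def range_join_hashed(ranges, tids):
    """Expand each range into its member values, then probe by equality."""
    # Width of the widest range, like MAX(end_id - start_id) + 1 in SQL.
    max_width = max((end_id - start_id for start_id, end_id in ranges), default=-1) + 1

    # Build side: count how many ranges cover each individual value.
    build = Counter()
    for start_id, end_id in ranges:
        for lv in range(max_width):          # offsets 0 .. max_width - 1
            if start_id + lv <= end_id:      # stay inside this range
                build[start_id + lv] += 1

    # Probe side: one hash lookup per tid; emit one row per covering range.
    result = []
    for tid in tids:
        result.extend([tid] * build[tid])    # Counter yields 0 for missing keys
    return result


ranges = [(s, s + 2) for s in range(1, 31, 3)]  # sample tb1 data
tids = [1, 2, 3, 31]                            # sample tb2 data
print(range_join_hashed(ranges, tids))          # [1, 2, 3]
```

The work is now roughly proportional to the number of expanded values plus the number of probed values, instead of their product. The trade-off is that the expansion only pays off when the ranges are narrow, as they are here (three values each); very wide ranges would make the expanded set large.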
Optimized SQL:
SELECT m2.tid
  FROM (SELECT t1.start_id + t2.lv tid
          FROM tb1 t1,
               (SELECT LEVEL - 1 lv
                  FROM (SELECT MAX(end_id - start_id) + 1 g FROM tb1)
                CONNECT BY LEVEL <= g) t2
         WHERE t1.end_id >= t1.start_id + t2.lv) m1,
       tb2 m2
 WHERE m1.tid = m2.tid;

30074 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=83 Card=960 Bytes=49920)
   1    0   HASH JOIN (Cost=83 Card=960 Bytes=49920)
   2    1     NESTED LOOPS (Cost=27 Card=500 Bytes=19500)
   3    2       VIEW (Cost=13 Card=1 Bytes=13)
   4    3         CONNECT BY (WITHOUT FILTERING)
   5    4           COUNT
   6    5             VIEW (Cost=13 Card=1 Bytes=13)
   7    6               SORT (AGGREGATE)
   8    7                 TABLE ACCESS (FULL) OF 'TB1' (TABLE) (Cost=13 Card=10000 Bytes=260000)
   9    2       TABLE ACCESS (FULL) OF 'TB1' (TABLE) (Cost=13 Card=500 Bytes=13000)
  10    1     TABLE ACCESS (FULL) OF 'TB2' (TABLE) (Cost=54 Card=105600 Bytes=1372800)

Statistics
----------------------------------------------------------
         14  recursive calls
          0  db block gets
       2583  consistent gets
          0  physical reads
          0  redo size
     419088  bytes sent via SQL*Net to client
      22547  bytes received via SQL*Net from client
       2006  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
      30074  rows processed
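This query ran in 0.02 seconds, which is excellent. m1 and m2 are now combined with a HASH JOIN: tb2 is full-scanned once, and tb1 is full-scanned once to compute the maximum range width and once more to expand the ranges.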