PG in & not in
- in VS join VS any VS exists
- Model A
postgres=# create table tbl_a (a integer primary key, b char(128));
CREATE TABLE
Time: 67.026 ms
postgres=# create table tbl_b (a integer primary key, b char(128));
CREATE TABLE
Time: 60.716 ms
postgres=# insert into tbl_a values (generate_series(0,2000000),'a'||generate_series(0,2000000));
INSERT 0 2000001
Time: 4218.271 ms (00:04.218)
postgres=# insert into tbl_b values (generate_series(100000,1100000),'a'||generate_series(100000,1100000));
INSERT 0 1000001
Time: 2135.322 ms (00:02.135)
postgres=#
postgres=# select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);
count
---------
1000001
(1 row)
Time: 629.656 ms
postgres=# select count(*) from tbl_a where a in (select a from tbl_b);
count
---------
1000001
(1 row)
Time: 613.041 ms
postgres=# select count(*) from tbl_a where a = any (array (select a from tbl_b));
count
---------
1000001
(1 row)
Time: 1391.568 ms (00:01.392)
postgres=#
postgres=# select count(*) from tbl_a where exists (select a from tbl_b where tbl_b.a=tbl_a.a);
count
---------
1000001
(1 row)
Time: 556.529 ms
postgres=#
Under this data model the ordering, fastest first, is exists > in > join > any.
The corresponding execution plans follow.
- in
postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a where a in (select a from tbl_b);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=99469.19..99469.20 rows=1 width=8) (actual time=881.258..881.258 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=48602
-> Merge Join (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=29.684..811.034 rows=1000001 loops=1)
Inner Unique: true
Merge Cond: (tbl_a.a = tbl_b.a)
Buffers: shared hit=48602
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.011..260.072 rows=1100002 loops=1)
Output: tbl_a.a
Heap Fetches: 1100002
Buffers: shared hit=25458
-> Index Only Scan using tbl_b_pkey on public.tbl_b (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.005..234.057 rows=1000001 loops=1)
Output: tbl_b.a
Heap Fetches: 1000001
Buffers: shared hit=23144
Planning Time: 0.181 ms
Execution Time: 881.288 ms
(17 rows)
Time: 881.762 ms
postgres=#
- join
postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=99469.19..99469.20 rows=1 width=8) (actual time=882.490..882.490 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=48602
-> Merge Join (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=29.831..812.149 rows=1000001 loops=1)
Inner Unique: true
Merge Cond: (tbl_a.a = tbl_b.a)
Buffers: shared hit=48602
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.017..260.735 rows=1100002 loops=1)
Output: tbl_a.a
Heap Fetches: 1100002
Buffers: shared hit=25458
-> Index Only Scan using tbl_b_pkey on public.tbl_b (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.009..234.505 rows=1000001 loops=1)
Output: tbl_b.a
Heap Fetches: 1000001
Buffers: shared hit=23144
Planning Time: 0.170 ms
Execution Time: 882.524 ms
(17 rows)
Time: 883.040 ms
postgres=#
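The join is executed as a merge join. The expectation was that a hash join plan would be faster than in.
Turning merge join off to force a hash join turns out to be slower. (With this setting, in also switches to a hash join and slows down.)
The merge join plan used Index Only Scans; the forced hash join uses Seq Scans instead.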
postgres=# set enable_mergejoin = off;
SET
Time: 0.153 ms
postgres=# show enable_mergejoin;
enable_mergejoin
------------------
off
(1 row)
Time: 0.123 ms
postgres=#
postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a inner join tbl_b on (tbl_a.a=tbl_b.a);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=134915.90..134915.91 rows=1 width=8) (actual time=1339.586..1339.586 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=61226, temp read=8254 written=8254
-> Hash Join (cost=46816.02..132415.89 rows=1000001 width=0) (actual time=351.648..1269.892 rows=1000001 loops=1)
Inner Unique: true
Hash Cond: (tbl_a.a = tbl_b.a)
Buffers: shared hit=61226, temp read=8254 written=8254
-> Seq Scan on public.tbl_a (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.010..313.553 rows=2000001 loops=1)
Output: tbl_a.a
Buffers: shared hit=40817
-> Hash (cost=30409.01..30409.01 rows=1000001 width=4) (actual time=319.297..319.297 rows=1000001 loops=1)
Output: tbl_b.a
Buckets: 131072 Batches: 16 Memory Usage: 3225kB
Buffers: shared hit=20409, temp written=2738
-> Seq Scan on public.tbl_b (cost=0.00..30409.01 rows=1000001 width=4) (actual time=0.005..157.206 rows=1000001 loops=1)
Output: tbl_b.a
Buffers: shared hit=20409
Planning Time: 0.118 ms
Execution Time: 1339.627 ms
(19 rows)
Time: 1340.074 ms (00:01.340)
postgres=#
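The forced hash join above reports Batches: 16 and temp read/written buffers, so part of its extra time goes to spilling the hash table to disk. A minimal sketch for repeating the experiment without leaving the session changed: SET LOCAL confines both settings to one transaction. The work_mem value of 64MB is an assumption chosen for illustration, not a value from the test above.

```sql
BEGIN;
-- SET LOCAL only lasts until the end of this transaction
SET LOCAL enable_mergejoin = off;
-- Illustrative value (an assumption): more memory for the hash table,
-- to see whether the number of batches drops
SET LOCAL work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM tbl_a INNER JOIN tbl_b ON (tbl_a.a = tbl_b.a);

ROLLBACK;

-- Both settings are back to their previous values here
SHOW enable_mergejoin;
SHOW work_mem;
```

Whether the hash join then beats the merge join depends on the machine and the data, so compare the Batches line and the Execution Time rather than assuming a result.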
- any
postgres=# explain (analyze,timing,buffers,costs,verbose) select count(*) from tbl_a where a = any (array(select a from tbl_b));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=30427.79..30427.80 rows=1 width=8) (actual time=1559.216..1559.216 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=3046285
InitPlan 1 (returns $0)
-> Seq Scan on public.tbl_b (cost=0.00..30409.01 rows=1000001 width=4) (actual time=0.010..146.601 rows=1000001 loops=1)
Output: tbl_b.a
Buffers: shared hit=20409
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..18.75 rows=10 width=0) (actual time=225.485..1492.757 rows=1000001 loops=1)
Output: tbl_a.a
Index Cond: (tbl_a.a = ANY ($0))
Heap Fetches: 1000001
Buffers: shared hit=3046285
Planning Time: 0.097 ms
Execution Time: 1559.249 ms
(14 rows)
Time: 1559.665 ms (00:01.560)
postgres=#
- exists
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where exists (select a from tbl_b where tbl_b.a=tbl_a.a);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=99469.19..99469.20 rows=1 width=8) (actual time=816.748..816.749 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=48602
-> Merge Join (cost=4175.86..96969.19 rows=1000001 width=0) (actual time=26.624..749.396 rows=1000001 loops=1)
Inner Unique: true
Merge Cond: (tbl_a.a = tbl_b.a)
Buffers: shared hit=48602
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.010..224.811 rows=1100002 loops=1)
Output: tbl_a.a
Heap Fetches: 1100002
Buffers: shared hit=25458
-> Index Only Scan using tbl_b_pkey on public.tbl_b (cost=0.42..38978.24 rows=1000001 width=4) (actual time=0.006..205.844 rows=1000001 loops=1)
Output: tbl_b.a
Heap Fetches: 1000001
Buffers: shared hit=23144
Planning Time: 0.153 ms
Execution Time: 816.782 ms
(17 rows)
Time: 817.252 ms
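In all the Model A plans above, each Index Only Scan reports Heap Fetches equal to the number of rows it returned, so every index entry still required a visit to the heap. That is what one would expect from freshly loaded tables whose visibility map has not been set yet. A sketch, assuming it is acceptable to vacuum the test tables, for checking this and re-running one of the queries:

```sql
-- relallvisible counts the pages marked all-visible in the visibility map
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname IN ('tbl_a', 'tbl_b');

-- VACUUM sets the visibility map; ANALYZE refreshes the planner statistics
VACUUM (ANALYZE) tbl_a;
VACUUM (ANALYZE) tbl_b;

-- Re-run and look at the "Heap Fetches" lines again
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM tbl_a WHERE a IN (SELECT a FROM tbl_b);
```

The timings recorded above were taken before any such step, so they should be compared only with each other.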
- Model B
postgres=# create table tbl_c (a integer primary key , c char(128));
CREATE TABLE
postgres=# insert into tbl_c values (generate_series(10000,10010),'');
INSERT 0 11
postgres=# select count(*) from tbl_a where a in (select a from tbl_c);
count
-------
11
(1 row)
Time: 0.189 ms
postgres=# select count(*) from tbl_a where a =any(array (select a from tbl_c));
count
-------
11
(1 row)
Time: 0.160 ms
postgres=# select count(*) from tbl_a inner join tbl_c using (a);
count
-------
11
(1 row)
Time: 0.173 ms
postgres=# select count(*) from tbl_a where exists (select a from tbl_c where tbl_c.a=tbl_a.a);
count
-------
11
(1 row)
Time: 0.181 ms
postgres=#
The differences are small: any > join > exists > in
- in
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a in (select a from tbl_c);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.032..0.032 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.011..0.029 rows=11 loops=1)
Inner Unique: true
Buffers: shared hit=45
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.004..0.005 rows=11 loops=1)
Output: tbl_c.a, tbl_c.c
Buffers: shared hit=1
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)
Output: tbl_a.a
Index Cond: (tbl_a.a = tbl_c.a)
Heap Fetches: 11
Buffers: shared hit=44
Planning Time: 0.078 ms
Execution Time: 0.050 ms
(16 rows)
Time: 0.265 ms
postgres=#
- join
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a inner join tbl_c using (a);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.028 rows=11 loops=1)
Inner Unique: true
Buffers: shared hit=45
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)
Output: tbl_c.a, tbl_c.c
Buffers: shared hit=1
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)
Output: tbl_a.a
Index Cond: (tbl_a.a = tbl_c.a)
Heap Fetches: 11
Buffers: shared hit=44
Planning Time: 0.066 ms
Execution Time: 0.049 ms
(16 rows)
Time: 0.248 ms
postgres=#
- any
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a =any(array (select a from tbl_c));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=30.18..30.19 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=35
InitPlan 1 (returns $0)
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.005 rows=11 loops=1)
Output: tbl_c.a
Buffers: shared hit=1
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..18.75 rows=10 width=0) (actual time=0.014..0.027 rows=11 loops=1)
Output: tbl_a.a
Index Cond: (tbl_a.a = ANY ($0))
Heap Fetches: 11
Buffers: shared hit=35
Planning Time: 0.042 ms
Execution Time: 0.046 ms
(14 rows)
Time: 0.211 ms
postgres=#
- exists
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where exists (select a from tbl_c where tbl_c.a=tbl_a.a);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.028 rows=11 loops=1)
Inner Unique: true
Buffers: shared hit=45
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)
Output: tbl_c.a, tbl_c.c
Buffers: shared hit=1
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)
Output: tbl_a.a
Index Cond: (tbl_a.a = tbl_c.a)
Heap Fetches: 11
Buffers: shared hit=44
Planning Time: 0.069 ms
Execution Time: 0.047 ms
(16 rows)
Time: 0.248 ms
postgres=#
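Every Model B plan above estimates rows=140 for the scan of tbl_c, although the table holds 11 rows. That gap suggests the planner had no statistics for tbl_c when these queries ran. A sketch for refreshing them and checking the estimate; it does not imply any particular change of plan:

```sql
-- Collect statistics for the small table
ANALYZE tbl_c;

-- reltuples is the row count the planner works from
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'tbl_c';

-- Plain EXPLAIN is enough to see the new row estimate
EXPLAIN
SELECT count(*) FROM tbl_a WHERE a IN (SELECT a FROM tbl_c);
```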
- Model C
postgres=# select count(*) from tbl_c where a in (select a from tbl_a);
count
-------
11
(1 row)
Time: 0.209 ms
postgres=#
postgres=# select count(*) from tbl_c inner join tbl_a using (a);
count
-------
11
(1 row)
Time: 0.173 ms
postgres=#
postgres=# select count(*) from tbl_c where a = any (array (select a from tbl_a));
count
-------
11
(1 row)
Time: 871.603 ms
postgres=#
postgres=# select count(*) from tbl_c where exists (select null from tbl_a where tbl_a.a=tbl_c.a);
count
-------
11
(1 row)
Time: 0.182 ms
postgres=#
Under this model: join > exists > in > any
- in
postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where a in (select a from tbl_a);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.055..0.055 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.030..0.050 rows=11 loops=1)
Inner Unique: true
Buffers: shared hit=45
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.009..0.010 rows=11 loops=1)
Output: tbl_c.a, tbl_c.c
Buffers: shared hit=1
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=11)
Output: tbl_a.a
Index Cond: (tbl_a.a = tbl_c.a)
Heap Fetches: 11
Buffers: shared hit=44
Planning Time: 0.136 ms
Execution Time: 0.089 ms
(16 rows)
Time: 0.525 ms
- join
postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c inner join tbl_a using (a);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.032..0.032 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.011..0.029 rows=11 loops=1)
Inner Unique: true
Buffers: shared hit=45
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)
Output: tbl_c.a, tbl_c.c
Buffers: shared hit=1
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)
Output: tbl_a.a
Index Cond: (tbl_a.a = tbl_c.a)
Heap Fetches: 11
Buffers: shared hit=44
Planning Time: 0.067 ms
Execution Time: 0.049 ms
(16 rows)
Time: 0.255 ms
postgres=#
- any
postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where a = any (array (select a from tbl_a));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=60828.61..60828.62 rows=1 width=8) (actual time=1029.756..1029.756 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=2040819
InitPlan 1 (returns $0)
-> Seq Scan on public.tbl_a (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.006..259.035 rows=2000001 loops=1)
Output: tbl_a.a
Buffers: shared hit=40817
-> Bitmap Heap Scan on public.tbl_c (cost=4.13..11.70 rows=10 width=0) (actual time=1029.747..1029.749 rows=11 loops=1)
Recheck Cond: (tbl_c.a = ANY ($0))
Heap Blocks: exact=1
Buffers: shared hit=2040819
-> Bitmap Index Scan on tbl_c_pkey (cost=0.00..4.12 rows=10 width=0) (actual time=1029.739..1029.739 rows=11 loops=1)
Index Cond: (tbl_c.a = ANY ($0))
Buffers: shared hit=2040818
Planning Time: 0.082 ms
Execution Time: 1030.852 ms
(16 rows)
Time: 1031.234 ms (00:01.031)
postgres=#
- exists
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_c where exists (select null from tbl_a where tbl_a.a=tbl_c.a);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=436.75..436.76 rows=1 width=8) (actual time=0.030..0.030 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=45
-> Nested Loop (cost=0.43..436.40 rows=140 width=0) (actual time=0.010..0.027 rows=11 loops=1)
Inner Unique: true
Buffers: shared hit=45
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)
Output: tbl_c.a, tbl_c.c
Buffers: shared hit=1
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..3.04 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11)
Output: tbl_a.a
Index Cond: (tbl_a.a = tbl_c.a)
Heap Fetches: 11
Buffers: shared hit=44
Planning Time: 0.067 ms
Execution Time: 0.047 ms
(16 rows)
Time: 0.246 ms
postgres=#
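The any plan above accounts for the one-second outlier in Model C: ARRAY(SELECT a FROM tbl_a) runs once as an InitPlan and collects all 2,000,001 rows into a single array, and the index on tbl_c is then probed with that array (Buffers: shared hit=2040818 on the Bitmap Index Scan). Where the array wrapper is not required, = ANY can take the subquery directly, which has the same meaning as IN. A sketch for comparing the two forms on the tables above:

```sql
-- Array form: the subquery result is materialized as one array first
EXPLAIN
SELECT count(*) FROM tbl_c WHERE a = ANY (ARRAY(SELECT a FROM tbl_a));

-- Subquery form: equivalent to "a IN (SELECT a FROM tbl_a)"
EXPLAIN
SELECT count(*) FROM tbl_c WHERE a = ANY (SELECT a FROM tbl_a);
```

The array form is mainly of interest when the list of values is small, as in Model B.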
- not in VS except VS join VS not exists
- Model A
postgres=# create table tbl_a (a integer primary key, b char(128));
CREATE TABLE
Time: 67.026 ms
postgres=# create table tbl_b (a integer primary key, b char(128));
CREATE TABLE
Time: 60.716 ms
postgres=# insert into tbl_a values (generate_series(0,2000000),'a'||generate_series(0,2000000));
INSERT 0 2000001
Time: 4218.271 ms (00:04.218)
postgres=# insert into tbl_b values (generate_series(100000,1100000),'a'||generate_series(100000,1100000));
INSERT 0 1000001
Time: 2135.322 ms (00:02.135)
postgres=#
postgres=# select count(*) from (select a from tbl_a except select a from tbl_b) as t;
count
---------
1000000
(1 row)
Time: 1727.102 ms (00:01.727)
postgres=#
postgres=# select count(*) from tbl_a left join tbl_b using (a) where tbl_b.b is null;
count
---------
1000000
(1 row)
Time: 737.321 ms
postgres=#
postgres=# select count(*) from tbl_a where not exists (select null from tbl_b where tbl_a.a=tbl_b.a);
count
---------
1000000
(1 row)
Time: 701.591 ms
postgres=#
Under this model: not exists > join > except > not in
- not in
- except
- join
- not exists
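No timing or plan for not in was recorded for Model A. A sketch for inspecting its plan without executing the query, since plain EXPLAIN only plans. The work_mem value is an illustrative assumption.

```sql
-- Plain EXPLAIN plans the query but does not run it
EXPLAIN
SELECT count(*) FROM tbl_a WHERE a NOT IN (SELECT a FROM tbl_b);

-- Check whether the Filter line reads "hashed SubPlan" or just "SubPlan",
-- then see if more memory changes that (128MB is an assumed value)
BEGIN;
SET LOCAL work_mem = '128MB';
EXPLAIN
SELECT count(*) FROM tbl_a WHERE a NOT IN (SELECT a FROM tbl_b);
ROLLBACK;
```

A hashed SubPlan probes an in-memory hash table. A plain SubPlan is evaluated again for each outer row and is typically much slower with tables of this size.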
- Model B: the right-hand table is small
postgres=# create table tbl_c (a integer primary key , c char(128));
CREATE TABLE
postgres=# insert into tbl_c values (generate_series(10000,10010),'');
INSERT 0 11
postgres=# select count(*) from tbl_a where a not in (select a from tbl_c);
count
---------
1999990
(1 row)
Time: 341.151 ms
postgres=#
postgres=# select count(*) from tbl_a left join tbl_c using (a) where tbl_c.a is null;
count
---------
1999990
(1 row)
Time: 401.454 ms
postgres=#
postgres=# select count(*) from (select a from tbl_a except select a from tbl_c) as t;
count
---------
1999990
(1 row)
Time: 1146.650 ms (00:01.147)
postgres=#
postgres=# select count(*) from tbl_a where not exists (select null from tbl_c where tbl_c.a=tbl_a.a);
count
---------
1999990
(1 row)
Time: 402.370 ms
postgres=#
Under this model: not in > join > not exists > except
- not in
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where a not in (select a from tbl_c);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=68328.60..68328.61 rows=1 width=8) (actual time=512.604..512.605 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=40818
-> Seq Scan on public.tbl_a (cost=11.75..65828.61 rows=999994 width=0) (actual time=0.023..379.856 rows=1999990 loops=1)
Output: tbl_a.a, tbl_a.b
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 11
Buffers: shared hit=40818
SubPlan 1
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.003..0.004 rows=11 loops=1)
Output: tbl_c.a
Buffers: shared hit=1
Planning Time: 0.091 ms
Execution Time: 512.643 ms
(14 rows)
Time: 513.050 ms
postgres=#
- except
postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from (select a from tbl_a except select a from tbl_c) as t;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=368761.05..368761.06 rows=1 width=8) (actual time=2139.793..2139.794 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=40818, temp read=7229 written=7264
-> Subquery Scan on t (cost=333760.54..363761.08 rows=1999989 width=0) (actual time=1205.992..2006.728 rows=1999990 loops=1)
Output: t.a
Buffers: shared hit=40818, temp read=7229 written=7264
-> SetOp Except (cost=333760.54..343761.19 rows=1999989 width=8) (actual time=1205.991..1817.584 rows=1999990 loops=1)
Output: "*SELECT* 1".a, (0)
Buffers: shared hit=40818, temp read=7229 written=7264
-> Sort (cost=333760.54..338760.86 rows=2000129 width=8) (actual time=1205.984..1447.060 rows=2000012 loops=1)
Output: "*SELECT* 1".a, (0)
Sort Key: "*SELECT* 1".a
Sort Method: external merge Disk: 35280kB
Buffers: shared hit=40818, temp read=7229 written=7264
-> Append (cost=0.00..90830.23 rows=2000129 width=8) (actual time=0.012..666.137 rows=2000012 loops=1)
Buffers: shared hit=40818
-> Subquery Scan on "*SELECT* 1" (cost=0.00..80816.78 rows=1999989 width=8) (actual time=0.011..500.831 rows=2000001 loops=1)
Output: "*SELECT* 1".a, 0
Buffers: shared hit=40817
-> Seq Scan on public.tbl_a (cost=0.00..60816.89 rows=1999989 width=4) (actual time=0.009..290.951 rows=2000001 loops=1)
Output: tbl_a.a
Buffers: shared hit=40817
-> Subquery Scan on "*SELECT* 2" (cost=0.00..12.80 rows=140 width=8) (actual time=0.010..0.012 rows=11 loops=1)
Output: "*SELECT* 2".a, 1
Buffers: shared hit=1
-> Seq Scan on public.tbl_c (cost=0.00..11.40 rows=140 width=4) (actual time=0.007..0.009 rows=11 loops=1)
Output: tbl_c.a
Buffers: shared hit=1
Planning Time: 0.108 ms
Execution Time: 2147.694 ms
(30 rows)
Time: 2148.186 ms (00:02.148)
postgres=#
- join
postgres=# explain (analyze,buffers,costs,timing,verbose) select count(*) from tbl_a left join tbl_c using (a) where tbl_c.a is null;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=87968.55..87968.56 rows=1 width=8) (actual time=751.825..751.826 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=46286
-> Merge Anti Join (cost=0.57..82968.93 rows=1999849 width=0) (actual time=0.024..616.804 rows=1999990 loops=1)
Merge Cond: (tbl_a.a = tbl_c.a)
Buffers: shared hit=46286
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.018..409.008 rows=2000001 loops=1)
Output: tbl_a.a
Heap Fetches: 2000001
Buffers: shared hit=46284
-> Index Only Scan using tbl_c_pkey on public.tbl_c (cost=0.14..17.84 rows=140 width=4) (actual time=0.003..0.006 rows=11 loops=1)
Output: tbl_c.a
Heap Fetches: 11
Buffers: shared hit=2
Planning Time: 0.107 ms
Execution Time: 751.875 ms
(16 rows)
Time: 752.358 ms
postgres=#
- not exists
postgres=# explain(analyze,buffers,costs,timing,verbose) select count(*) from tbl_a where not exists (select null from tbl_c where tbl_c.a=tbl_a.a);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=87968.55..87968.56 rows=1 width=8) (actual time=740.657..740.657 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=46286
-> Merge Anti Join (cost=0.57..82968.93 rows=1999849 width=0) (actual time=0.019..607.766 rows=1999990 loops=1)
Merge Cond: (tbl_a.a = tbl_c.a)
Buffers: shared hit=46286
-> Index Only Scan using tbl_a_pkey on public.tbl_a (cost=0.43..77949.36 rows=1999989 width=4) (actual time=0.014..403.054 rows=2000001 loops=1)
Output: tbl_a.a
Heap Fetches: 2000001
Buffers: shared hit=46284
-> Index Only Scan using tbl_c_pkey on public.tbl_c (cost=0.14..17.84 rows=140 width=4) (actual time=0.003..0.006 rows=11 loops=1)
Output: tbl_c.a
Heap Fetches: 11
Buffers: shared hit=2
Planning Time: 0.104 ms
Execution Time: 740.690 ms
(16 rows)
Time: 741.111 ms
postgres=#
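In the plans above, not exists and the left join ... is null form both become a Merge Anti Join, while not in stays a Seq Scan filtered by a hashed SubPlan. One reason the forms are not interchangeable in general is NULL handling: not in returns no rows once the subquery yields a NULL. A small sketch with hypothetical temp tables (demo_left and demo_right are not part of the test above):

```sql
-- Hypothetical demo tables; the column is nullable on purpose
CREATE TEMP TABLE demo_left (a integer);
CREATE TEMP TABLE demo_right (a integer);
INSERT INTO demo_left VALUES (1), (2);
INSERT INTO demo_right VALUES (1), (NULL);

-- 2 NOT IN (1, NULL) evaluates to NULL, so the row is filtered out: count = 0
SELECT count(*) FROM demo_left
WHERE a NOT IN (SELECT a FROM demo_right);

-- No row in demo_right matches a = 2, so that row is kept: count = 1
SELECT count(*) FROM demo_left l
WHERE NOT EXISTS (SELECT 1 FROM demo_right r WHERE r.a = l.a);
```

With primary-key columns such as tbl_a.a and tbl_c.a no NULL can occur, which is why all four forms return the same count in these tests.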
- Model C: the left-hand table is small
postgres=# select count(*) from tbl_c where a not in (select a from tbl_a);
count
-------
0
(1 row)
Time: 7.407 ms
postgres=#
postgres=# select count(*) from (select a from tbl_c except select a from tbl_a) as t;
count
-------
0
(1 row)
Time: 339.787 ms
postgres=#
postgres=# select count(*) from tbl_c left join tbl_a using (a) where tbl_a.a is null;
count
-------
0
(1 row)
Time: 0.169 ms
postgres=#
postgres=# select count(*) from tbl_c where not exists (select null from tbl_a where tbl_a.a=tbl_c.a);
count
-------
0
(1 row)
Time: 0.184 ms
postgres=#
Under this model: join > not exists > not in > except
- not in
select count(*) from tbl_c where a not in (select a from tbl_a);
- except
- join
- not exists
- Summary
| | in VS join VS any VS exists | not in VS except VS join VS not exists |
|---|---|---|
| Model A: left table 2M rows, right table 1M rows | exists > in > join > any | not exists > join > except > not in |
| Model B: left table 2M rows, right table 11 rows | any > join > exists > in | not in > join > not exists > except |
| Model C: left table 11 rows, right table 2M rows | join > exists > in > any | join > not exists > not in > except |
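The above is only a simple check and should not be taken as proof of a performance difference. Any real difference has to be analyzed against the actual execution plan.
Note that in and join are not fully equivalent; it depends on the semantics. in implicitly removes duplicates, which a join does not.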
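The difference between in and join shows up as soon as the joined column is not unique. A small sketch with hypothetical temp tables (demo_customers and demo_orders are illustration only, not part of the tests above):

```sql
-- Hypothetical demo tables: one customer has three orders
CREATE TEMP TABLE demo_customers (id integer PRIMARY KEY);
CREATE TEMP TABLE demo_orders (customer_id integer);
INSERT INTO demo_customers VALUES (1), (2);
INSERT INTO demo_orders VALUES (1), (1), (1);

-- in: each customer is counted at most once: count = 1
SELECT count(*) FROM demo_customers
WHERE id IN (SELECT customer_id FROM demo_orders);

-- join: one output row per matching order: count = 3
SELECT count(*) FROM demo_customers c
JOIN demo_orders o ON o.customer_id = c.id;
```

In the tests above both sides are joined on primary keys, so the two forms happen to return the same counts.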