A Reasonably In-Depth Analysis of Filter

1. The Problem

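I call a query in which FILTER has two row sources beneath it a "Filter join". The term is not rigorous, but it makes the explanation easier, so I will use it throughout the rest of this post. A Filter join works much like the single-table Filter. The difference is that the values of each driving-table row are used to filter the driven table, and the driving row is then kept or discarded according to that result.

A common explanation online is that Filter's algorithm resembles nested loops, except that Filter maintains an in-memory table. As the driving table is scanned, Filter first checks the in-memory table for the same value. If it is there, the cached result is taken directly from the in-memory table; only if it is not does Filter go to the driven table to filter the data.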
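To make that description concrete, here is a minimal Python simulation (not Oracle's implementation) that counts how often the driven side is probed. The function names, the `in_driven` probe and the sample data are hypothetical, and the cache is an unbounded dict, which is a simplifying assumption: the description above does not say how large the in-memory table can grow.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# hypothetical probe: "does the driven table contain a match for this value?"
Probe = Callable[[Hashable], bool]


def nested_loop_semi(driving: List[Hashable], probe: Probe) -> Tuple[List[Hashable], int]:
    """Nested loop: probe the driven side once for every driving row."""
    kept: List[Hashable] = []
    probes = 0
    for key in driving:
        probes += 1
        if probe(key):
            kept.append(key)
    return kept, probes


def filter_with_cache(driving: List[Hashable], probe: Probe) -> Tuple[List[Hashable], int]:
    """Filter: check the in-memory cache first, probe the driven side only on a miss."""
    cache: Dict[Hashable, bool] = {}  # assumption: unbounded cache
    kept: List[Hashable] = []
    probes = 0
    for key in driving:
        if key not in cache:
            probes += 1
            cache[key] = probe(key)  # remember the outcome for this driving value
        if cache[key]:
            kept.append(key)
    return kept, probes


driven_ids = {1, 2, 3, 5, 8}            # hypothetical driven-table keys
driving_ids = [1, 1, 2, 2, 4, 4, 5, 5]  # hypothetical driving rows with duplicates


def in_driven(key: Hashable) -> bool:
    return key in driven_ids


print(nested_loop_semi(driving_ids, in_driven))   # ([1, 1, 2, 2, 5, 5], 8)
print(filter_with_cache(driving_ids, in_driven))  # ([1, 1, 2, 2, 5, 5], 4)
```

Both functions return the same rows; the cached version saves probes only when driving values repeat.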

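Seen this way, Filter's algorithm looks superior to nested loops. So why is Filter usually eliminated? And why does Filter cache data while nested loops do not? Let's work through these questions one at a time.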

2. Environment Setup

drop table test1;
create table test1 as select * from dba_objects where owner='SCOTT';
-- insert the same rows again so that every object_id in test1 appears twice
insert into test1 select * from test1;

drop table test2;
create table test2 as select rownum as rn, o.* from dba_objects o;
create index ix_test2_id on test2(object_id);
create index ix_test2_rn on test2(rn);

drop table test3;
create table test3 as select * from dba_objects;
create index ix_test3_id on test3(object_id);

commit;

3. Query Forms That Produce a Filter Join, and Why

3.1 Multiple driven tables

select distinct t1.*
  from test1 t1
 where exists (select 1
                 from test2 t2
                where t1.object_id = t2.rn)
    or exists (select 1
                 from test3 t3
                where t1.object_id = t3.object_id);

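This query can only use Filter: test1 is accessed only once, and the two driven tables are combined with OR in the predicate, so Filter is the only way to check the driving rows against both driven tables.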

3.2 A UNION in the driven part

select *
  from test1 t1
 where exists (select 1
                 from test2 t2
                where t1.object_id = t2.rn
               union
               select 1
                 from test3 t3
                where t1.object_id = t3.object_id);

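This form is similar to 3.1 above: the driving table is accessed only once, so only Filter can be used.

To sum up the two cases: Filter arises here because the driving table is scanned once while there are several driven tables. Nested loops, hash join and merge join each accept only one driven table, so Filter is the only option.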
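Because `EXISTS (A UNION B)` is true exactly when `EXISTS A` or `EXISTS B` is true, both cases come down to checking one driving row against several driven inputs in a single pass. A minimal Python sketch of that shape follows; it assumes the child probes are tried in order and that evaluation stops at the first match. The function names and the sample values standing in for `test2.rn` and `test3.object_id` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Probe = Callable[[int], bool]  # hypothetical: one correlated EXISTS subquery


def filter_or_exists(driving: List[int], children: Sequence[Probe]) -> Tuple[List[int], List[int]]:
    """Single pass over the driving rows; each row may consult several driven inputs.

    Returns the kept rows and how many times each child was probed.
    """
    kept: List[int] = []
    calls = [0] * len(children)
    for key in driving:
        for i, child in enumerate(children):
            calls[i] += 1
            if child(key):       # the first EXISTS that succeeds keeps the row
                kept.append(key)
                break            # assumed short-circuit: later children are skipped
    return kept, calls


test2_rn = {1, 2, 3}    # hypothetical values of test2.rn
test3_id = {3, 4, 10}   # hypothetical values of test3.object_id


def in_test2(key: int) -> bool:
    return key in test2_rn


def in_test3(key: int) -> bool:
    return key in test3_id


print(filter_or_exists([1, 4, 7], [in_test2, in_test3]))  # ([1, 4], [3, 2])
```

A nested-loop, hash or merge join node, by contrast, has exactly one inner input, so there is no place to attach the second child.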

3.3 ROWNUM in the predicate

select *
  from test1 t1
 where exists (select 1
                 from test2 t2
                where t1.object_id = t2.object_id
                  and rownum > 0);

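Here, because of ROWNUM, every time the driving table passes a value to the driven part, the driven part has to be evaluated again and a new result set generated. Only Filter can do this (the other join methods require the driven part to be fixed).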

For why ROWNUM forces re-evaluation, and how ROWNUM is computed, see my other post:

https://blog.csdn.net/songjian1104/article/details/103438802
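A minimal Python sketch of that per-execution behaviour, using a common model of ROWNUM: numbering starts from 1 again for each execution of the subquery, a row that passes the other predicates is offered the next number, and the counter advances only if the ROWNUM condition accepts that number. This is a simplification for illustration, not Oracle's code, and the function name and data are hypothetical.

```python
from typing import Callable, List

RownumPredicate = Callable[[int], bool]  # hypothetical stand-in for e.g. "rownum > 0"


def run_subquery(driven_ids: List[int], outer_id: int, rownum_ok: RownumPredicate) -> List[int]:
    """One execution of the correlated subquery for a single driving value."""
    rownum = 0  # restarts on every execution, so each outer value gets a fresh result set
    result: List[int] = []
    for object_id in driven_ids:
        if object_id != outer_id:    # t1.object_id = t2.object_id
            continue
        if rownum_ok(rownum + 1):    # candidate number for this row
            rownum += 1              # only accepted rows consume a number
            result.append(object_id)
    return result


driven = [10, 20, 20, 30]  # hypothetical test2.object_id values
for outer in (20, 30, 40):
    rows = run_subquery(driven, outer, lambda rn: rn > 0)
    print(outer, rows, "EXISTS" if rows else "NOT EXISTS")
# 20 [20, 20] EXISTS
# 30 [30] EXISTS
# 40 [] NOT EXISTS
```

Because the counter lives inside each execution, the result for a driving value cannot be computed once up front and joined against; the subquery has to be rerun for every value passed in. Swapping in `lambda rn: rn > 1` returns an empty list for every value, since no row can ever be offered a first number that passes.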

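3.4 Summary: why Filter arises

Filter arises for the following reasons:

① there is one driving table but several driven tables;

② the driven part depends on the driving part and produces a different result set for each driving value.

Whether Filter performs well or badly depends on the actual situation, so why do people still want to eliminate Filter joins? Let's discuss that next.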

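4. Why Eliminate Filter

4.1 The execution plan of a Filter join is fixed

As the summary above shows, once a Filter appears the join method generally cannot be adjusted, which locks in the execution plan. The plan cannot adapt to changes in the data: it may perform well at first, but once the data volume changes, performance may become very poor.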

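4.2 Filter joins leave little room for performance gains

Filter suits the case where the driving table returns few rows, those rows contain many duplicate values, and the driven table is accessed through an index. This is very similar to the nested-loop scenario, but because the driving data set is small to begin with, the gain over nested loops is likely to be limited.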
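The size of that gain can be estimated by counting probes of the driven table: nested loops probe once per driving row, while a Filter whose cache can hold every value probes once per distinct value. A small Python sketch of that arithmetic follows; the helper name and the sample sizes are hypothetical, and the "cache holds everything" premise is an assumption. In the environment from section 2, test1 is filled by inserting its own rows a second time, so every object_id appears exactly twice and, under that assumption, the cache can at best halve the number of probes.

```python
from typing import Hashable, List, Tuple


def probe_counts(driving: List[Hashable]) -> Tuple[int, int]:
    """Return (nested-loop probes, cached-Filter probes) for the given driving rows."""
    return len(driving), len(set(driving))


# test1-like input: every key appears exactly twice (14 keys is a hypothetical size)
doubled = [key for key in range(14) for _ in range(2)]
print(probe_counts(doubled))  # (28, 14)

# large, mostly unique driving input: caching saves almost nothing
unique = list(range(100_000)) + [0]
print(probe_counts(unique))   # (100001, 100000)
```

With a small driving set the absolute saving is a handful of index probes; with a large, mostly unique one the cache barely helps and the per-row probing remains.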

Conversely, when many driving-table rows take part in the query, the fixed Filter plan performs much worse, as in the following example:

select distinct t1.object_name
  from test1 t1
 where exists (select 1
                 from test2 t2
                where t1.object_name = t2.object_name
               union
               select 1
                 from test3 t3
                where t1.object_type = t3.object_type);

In this case a hash join would be a better fit, but the plan is locked into Filter and performs poorly, so the only remedy is to rewrite the SQL:

select t1.object_name
  from test1 t1
 where exists (select 1
                 from test2 t2
                where t1.object_name = t2.object_name)
union
select t1.object_name
  from test1 t1
 where exists (select 1
                 from test3 t3
                where t1.object_type = t3.object_type);
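The rewrite helps because each UNION branch now has exactly one driven table, so the optimizer is free to pick a hash (semi) join for it. A rough Python sketch of what a hash semi-join does for the two branches: build a hash set from the driven side in one pass, then probe it once per driving row. The rows and values are hypothetical sample data, and the sketch only illustrates the build/probe idea, not Oracle's implementation.

```python
from typing import Callable, Iterable, List, Set, Tuple

Row = Tuple[str, str]  # hypothetical (object_name, object_type) row of test1


def hash_semi_join(driving: Iterable[Row], key_of: Callable[[Row], str],
                   driven_keys: Iterable[str]) -> List[Row]:
    """Build a hash set from the driven side once, then probe it once per driving row."""
    build: Set[str] = set(driven_keys)                        # one pass over the driven input
    return [row for row in driving if key_of(row) in build]  # one pass over the driving input


test1 = [("EMP", "TABLE"), ("PK_EMP", "INDEX"), ("DEPT", "TABLE"), ("BONUS", "TABLE")]
test2_names = ["EMP", "DEPT", "SALGRADE"]  # hypothetical test2.object_name values
test3_types = ["INDEX", "VIEW"]            # hypothetical test3.object_type values

branch1 = hash_semi_join(test1, lambda r: r[0], test2_names)  # EXISTS on object_name
branch2 = hash_semi_join(test1, lambda r: r[1], test3_types)  # EXISTS on object_type
# UNION removes duplicate object_name values across the two branches
result = sorted({name for name, _ in branch1} | {name for name, _ in branch2})
print(result)  # ['DEPT', 'EMP', 'PK_EMP']
```

Each input is read once per branch regardless of how many duplicates the driving side contains, which is why no cache is needed.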

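4.3 Why don't nested loops cache query results?

First, caching query results is a last resort for Filter. Because its execution plan is generally fixed, if the driven table cannot use an index, caching is the only way to do better than repeated full table scans.

Nested loops, on the other hand, can use an index to locate rows when few rows are returned, which may well be faster than looking up a cached in-memory table.

When many rows are returned, neither nested loops nor Filter is a good fit. With a hash join, the driving and driven tables are each scanned only once, so there is no need for a cache.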

※宋健※
