随机函数rand作子查询的bug ???

环境信息：

系统版本: CentOS release 6.10

配置：虚机、64G内存、16C逻辑CPU

版本：5.7.12-log MySQL Community Server (GPL)

pt-query-digest mysql-slow.log --since '2020-11-01 00:01:00' --until '2020-11-09 10:00:00' >slow_report1.log

# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          5     302
# Exec time     11  37083s      5s    673s    123s    246s     88s    107s
# Lock time      1    78ms   153us   586us   257us   366us    62us   247us
# Rows sent      0     302       1       1       1       1       0       1
# Rows examine  91 162.88G  23.16M   2.96G 552.29M   1.08G 395.64M 483.92M
# Query size    17  84.94k     288     288     288     288       0     288
# Query_time distribution
SELECT
            *
            FROM
            t_test_info
            WHERE id >=
            (SELECT
            FLOOR(MAX(id) * RAND())
            FROM
            t_test_info)
            AND area_code = '5400'
            ORDER BY id
            LIMIT 1\G

以上的慢SQL，平均执行时间为123s，看着是很简单的SQL，基本逻辑是：查询t_test_info表id大于等于它本身最大的id乘以个随机数取整，同时区域编号为5400的数据，根据id排序后，取第一行数据。t_test_info的数据量为6w+,并不是非常大。

表结构如下：

CREATE TABLE `t_test_info` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
  `con_id` bigint(20) DEFAULT NULL COMMENT '连接id',
  ........,
  `area_code` varchar(8) DEFAULT NULL COMMENT '地区编码',
  `ctime` datetime DEFAULT NULL COMMENT '创建日期',
  ........,
  PRIMARY KEY (`id`) USING BTREE,
  ........,
  KEY `idx_con_id` (`con_id`) USING BTREE,
  KEY `idx_ctime` (`ctime`)
) ENGINE=InnoDB AUTO_INCREMENT=94681 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;

对应的执行计划：

id select_type      table     partitions  type   partition_key  key    key_len  rows    filtered   Extra
1 PRIMARY         t_test_info          index            PRIMARY    8      1      10      Using where
2 UNCACHEABLE SUBQUERY t_test_info         index            idx_ctime  6      63713    100      Using index

UNCACHEABLE SUBQUERY：对于外层的主表，一个子查询的结果不能被缓存，每次都需要计算（耗时操作）

发现这执行计划存在问题，idx_ctime这个索引是ctime列的，整个SQL都未出现ctime这一列，难道是优化器选错索引？考虑ingore index(idx_ctime) 去干预，发现它就选择了idx_con_id这个索引，忽略考虑ingore index(idx_ctime) 去干预，发现它就选择了idx_con_id索引又会选择下一个。所以从执行计划来看，虽然是使用了索引，但filtered过滤的是100%的数据，跟全表扫描没啥区别。

这里也考虑说select * 全字段数据，换成select id减少SQL执行阶段Sending data的时间，执行时间也是一样的，没有区别。

FORMAT=JSON格式的执行计划，查看资源消耗情况，根据SQL的逻辑，看资源消耗是正常的（先进行子查询外部的查询,得到一个结果集63713行数据,然后这个结果的每一行在跟select子查询的结果集进行匹配）

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "13801.60"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "t_test_info",
        "access_type": "index",
        "key": "PRIMARY",
        "used_key_parts": [
          "id"
        ],
        "key_length": "8",
        "rows_examined_per_scan": 1,
        "rows_produced_per_join": 6371,
        "filtered": "10.00",
        "cost_info": {
          "read_cost": "12527.34",
          "eval_cost": "1274.26",
          "prefix_cost": "13801.60",
          "data_read_per_join": "12M"
        },
        "used_columns": [
          "id",
          "con_id",
          .......
          "area_code",
          "ctime",
          .......
        ],
        "attached_condition": "((`zw`.`t_test_info`.`id` >= (/* select#2 */ select floor((max(`zw`.`t_test_info`.`id`) * rand())) from `zw`.`t_test_info`)) and (`zw`.`t_test_info`.`area_code` = '5400'))",
        "attached_subqueries": [
          {
            "dependent": false,
            "cacheable": false,
            "query_block": {
              "select_id": 2,
              "cost_info": {
                "query_cost": "13801.60"
              },
              "table": {
                "table_name": "t_test_info",
                "access_type": "index",
                "key": "idx_ctime",
                "used_key_parts": [
                  "gmt_created"
                ],
                "key_length": "6",
                "rows_examined_per_scan": 63713,
                "rows_produced_per_join": 63713,
                "filtered": "100.00",
                "using_index": true,
                "cost_info": {
                  "read_cost": "1059.00",
                  "eval_cost": "12742.60",
                  "prefix_cost": "13801.60",
                  "data_read_per_join": "126M"
                },
                "used_columns": [
                  "id"
                ]

开启optimizer_trace，对该sql执行的过程进行跟踪分析:

set optimizer_trace='enabled=on';
set optimizer_trace_max_mem_size=10000;
SELECT *  FROM
            t_test_info
            WHERE id >=
            (SELECT
            FLOOR(MAX(id) * RAND())
            FROM
            t_test_info)
            AND area_code = '5400'
            ORDER BY id
            LIMIT 1;
select * from information_schema.optimizer_trace;
部分结果：  
    "join_execution": {
        "select#": 1,
        "steps": [
          {
            "subselect_execution": {
              "select#": 2,
              "steps": [
                {
                  "join_execution": {
                    "select#": 2,
                    "steps": [
          },
          
        -- 出现3039次相同的内容，这里感觉是异常的,一直都在执行子查询部分，陷入某个循环里面在空转？
        
          {
            "subselect_execution": {
              "select#": 2,
              "steps": [
                {
                  "join_execution": {
                    "select#": 2,
                    "steps": [
          }

同时使用pstask进行记录堆栈信息

for((a=0;a<1;a--));
do
pstack 19175 >> /tmp/pstack_1110.log
done

这里的19175 是根据performance_schema.threads查询该SQL对应的THREAD_OS_ID。

通过pt-pmp对trace信息汇总 pt-pmp /tmp/pstack_1110.log：

/tmp/pstack_1110.log 部分信息

Thread 1 (process 19175):
#0  0x00000000010afbf7 in row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long) ()
#1  0x0000000000fc357e in ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int) ()
#2  0x0000000000815551 in handler::ha_index_next(unsigned char*) ()
#3  0x0000000000cd811c in join_read_next(READ_RECORD*) ()
#4  0x0000000000cdcc06 in sub_select(JOIN*, QEP_TAB*, bool) ()
#5  0x0000000000cdbb7a in JOIN::exec() ()
#6  0x0000000000c0e6b0 in subselect_single_select_engine::exec() ()
#7  0x0000000000c10696 in Item_subselect::exec() ()
#8  0x0000000000c0cd69 in Item_singlerow_subselect::val_real() ()
#9  0x0000000000841080 in Arg_comparator::compare_real_fixed() ()
#10 0x0000000000849592 in Item_func_ge::val_int() ()
#11 0x000000000082592c in Item::val_bool() ()
#12 0x000000000084364a in Item_cond_and::val_int() ()
#13 0x0000000000cdb274 in evaluate_join_record(JOIN*, QEP_TAB*) ()
#14 0x0000000000cdcc53 in sub_select(JOIN*, QEP_TAB*, bool) ()
#15 0x0000000000cdbb7a in JOIN::exec() ()
#16 0x0000000000d45db0 in handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long) ()
#17 0x0000000000d06e93 in execute_sqlcom_select(THD*, TABLE_LIST*) ()
#18 0x0000000000d0a82a in mysql_execute_command(THD*, bool) ()
#19 0x0000000000d0c52d in mysql_parse(THD*, Parser_state*) ()
#20 0x0000000000d0d6be in dispatch_command(THD*, COM_DATA const*, enum_server_command) ()
#21 0x0000000000d0e324 in do_command(THD*) ()
#22 0x0000000000dded7c in handle_connection ()
#23 0x00000000012436d4 in pfs_spawn_thread ()
#24 0x0000003f65407aa1 in start_thread () from /lib64/libpthread.so.0
#25 0x0000003f650e8c4d in clone () from /lib64/libc.so.6

根据pt-pmp的统计，以上的次数多达237次。这里感觉是陷入某内部函数循环里面，需要多次判断取值才能返回。要想找到根因，只能看源码了，个人能力有限，先记录下来，有大神感兴趣可以研究一下。

到这里，我考虑将rand函数换成一个常量，查看执行计划并执行，发现是不一样的，执行时间也是毫秒级。

EXPLAIN SELECT
*  FROM  t_test_info   WHERE  id >= (SELECT  FLOOR(MAX(id) * 0.02)  FROM  t_test_info)
AND area_code = '5400'  ORDER BY  id  LIMIT 1;

id select_type  table  partitions type partition_key key    key_len  rows   filtered   Extra
1 PRIMARY     t_test_info       range PRIMARY    PRIMARY  8     31856 10            Using where
2 SUBQUERY                                                         Select tables optimized away

为进一步猜想是随机函数作为子查询的bug（也可能是UNCACHEABLE SUBQUERY问题，我更倾向这个原因，能力有限，还无法定论，大神可以指正一下），那么在当前版本5.7.12上可能存在，在别的版本已经修复，于是我在测试环境找了个8.0.20版本的库，构建了相同表结构，并造了相同数量的数据。

先对比了执行计划，发现执行计划是一样的， 8.0.20版本的执行计划：

id select_type      table     partitions  type   partition_key  key    key_len  rows    filtered   Extra
1 PRIMARY         t_test_info          index            PRIMARY    8      1      10      Using where
2 UNCACHEABLE SUBQUERY t_test_info         index            idx_ctime  6      63713    100      Using index

不同的是，在8.0.20的环境上，执行时间大概是0.03秒，差了几万倍，差别太大了。

回到5.7.12环境上，我根据SQL实现的逻辑，进行了改写，避免rand函数作为子查询，按照子查询改写为join的思路如下：

SELECT 
a.*
FROM
t_test_info a join
(
SELECT
FLOOR(MAX(id) * RAND()) xx
FROM
t_test_info
) b
where  a.id>=b.xx
AND a.area_code = '5400'
ORDER BY
a.id
LIMIT 1;

以上SQL在5.7.12环境上毫秒级就可以出结果，但该SQL会产生笛卡尔积，如果数据量很大，性能也会下降。并非合适的解决方案。

于是再按照避免rand函数作为子查询思路，拆分成两个SQL，先获取一个随机数，以传参的方式，进行查询，这种在代码层也相对较好实现。

SELECT
FLOOR(MAX(id) * RAND()) into @str
FROM
t_test_info;

SELECT
*
FROM
t_test_info
WHERE
id >= @str
AND area_code = '5400'
ORDER BY
id
LIMIT 1;

在8.0以上版本，可以考虑使用with as的方式构建个临时表的方式进行改写，5.7不支持该语法。

WITH cc AS (
SELECT
floor(max(id) * rand())
FROM
t_test_info
) SELECT
*
FROM
t_test_info
WHERE
area_code = '5400'
AND id >= (SELECT * FROM cc)
ORDER BY
id
LIMIT 1;

后记，后续参考了官方资料，更倾向是执行计划走UNCACHEABLE SUBQUERY这个原因导致的

https://dev.mysql.com/doc/refman/5.7/en/explain-output.html

https://dev.mysql.com/doc/refman/5.7/en/query-cache-operation.html

同时参考《MySQL：查询字段数量多少对查询效率的影响》

随机函数rand作子查询的bug ???

猜你喜欢