环境信息:
系统版本: CentOS release 6.10
配置:虚机、64G内存、16C逻辑CPU
版本:5.7.12-log MySQL Community Server (GPL)
pt-query-digest mysql-slow.log --since '2020-11-01 00:01:00' --until '2020-11-09 10:00:00' >slow_report1.log
# Attribute pct total min max avg 95% stddev median # ============ === ======= ======= ======= ======= ======= ======= ======= # Count 5 302 # Exec time 11 37083s 5s 673s 123s 246s 88s 107s # Lock time 1 78ms 153us 586us 257us 366us 62us 247us # Rows sent 0 302 1 1 1 1 0 1 # Rows examine 91 162.88G 23.16M 2.96G 552.29M 1.08G 395.64M 483.92M # Query size 17 84.94k 288 288 288 288 0 288 # Query_time distribution SELECT * FROM t_test_info WHERE id >= (SELECT FLOOR(MAX(id) * RAND()) FROM t_test_info) AND area_code = '5400' ORDER BY id LIMIT 1\G
以上的慢SQL,平均执行时间为123s,看着是很简单的SQL,基本逻辑是:查询t_test_info表id大于等于它本身最大的id乘以个随机数取整,同时区域编号为5400的数据,根据id排序后,取第一行数据。t_test_info的数据量为6w+,并不是非常大。
表结构如下:
CREATE TABLE `t_test_info` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键', `con_id` bigint(20) DEFAULT NULL COMMENT '连接id', ........, `area_code` varchar(8) DEFAULT NULL COMMENT '地区编码', `ctime` datetime DEFAULT NULL COMMENT '创建日期', ........, PRIMARY KEY (`id`) USING BTREE, ........, KEY `idx_con_id` (`con_id`) USING BTREE, KEY `idx_ctime` (`ctime`) ) ENGINE=InnoDB AUTO_INCREMENT=94681 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;
对应的执行计划:
id select_type table partitions type partition_key key key_len rows filtered Extra 1 PRIMARY t_test_info index PRIMARY 8 1 10 Using where 2 UNCACHEABLE SUBQUERY t_test_info index idx_ctime 6 63713 100 Using index
UNCACHEABLE SUBQUERY:对于外层的主表,一个子查询的结果不能被缓存,每次都需要计算(耗时操作)
发现这执行计划存在问题,idx_ctime这个索引是ctime列的,整个SQL都未出现ctime这一列,难道是优化器选错索引?考虑ingore index(idx_ctime) 去干预,发现它就选择了idx_con_id这个索引,忽略考虑ingore index(idx_ctime) 去干预,发现它就选择了idx_con_id索引又会选择下一个。所以从执行计划来看,虽然是使用了索引,但filtered过滤的是100%的数据,跟全表扫描没啥区别。
这里也考虑说select * 全字段数据,换成select id减少SQL执行阶段Sending data的时间,执行时间也是一样的,没有区别。
FORMAT=JSON格式的执行计划,查看资源消耗情况,根据SQL的逻辑,看资源消耗是正常的(先进行子查询外部的查询,得到一个结果集63713行数据,然后这个结果的每一行在跟select子查询的结果集进行匹配)
{ "query_block": { "select_id": 1, "cost_info": { "query_cost": "13801.60" }, "ordering_operation": { "using_filesort": false, "table": { "table_name": "t_test_info", "access_type": "index", "key": "PRIMARY", "used_key_parts": [ "id" ], "key_length": "8", "rows_examined_per_scan": 1, "rows_produced_per_join": 6371, "filtered": "10.00", "cost_info": { "read_cost": "12527.34", "eval_cost": "1274.26", "prefix_cost": "13801.60", "data_read_per_join": "12M" }, "used_columns": [ "id", "con_id", ....... "area_code", "ctime", ....... ], "attached_condition": "((`zw`.`t_test_info`.`id` >= (/* select#2 */ select floor((max(`zw`.`t_test_info`.`id`) * rand())) from `zw`.`t_test_info`)) and (`zw`.`t_test_info`.`area_code` = '5400'))", "attached_subqueries": [ { "dependent": false, "cacheable": false, "query_block": { "select_id": 2, "cost_info": { "query_cost": "13801.60" }, "table": { "table_name": "t_test_info", "access_type": "index", "key": "idx_ctime", "used_key_parts": [ "gmt_created" ], "key_length": "6", "rows_examined_per_scan": 63713, "rows_produced_per_join": 63713, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "1059.00", "eval_cost": "12742.60", "prefix_cost": "13801.60", "data_read_per_join": "126M" }, "used_columns": [ "id" ]
开启optimizer_trace,对该sql执行的过程进行跟踪分析:
set optimizer_trace='enabled=on'; set optimizer_trace_max_mem_size=10000; SELECT * FROM t_test_info WHERE id >= (SELECT FLOOR(MAX(id) * RAND()) FROM t_test_info) AND area_code = '5400' ORDER BY id LIMIT 1; select * from information_schema.optimizer_trace; 部分结果: "join_execution": { "select#": 1, "steps": [ { "subselect_execution": { "select#": 2, "steps": [ { "join_execution": { "select#": 2, "steps": [ }, -- 出现3039次相同的内容,这里感觉是异常的,一直都在执行子查询部分,陷入某个循环里面在空转? { "subselect_execution": { "select#": 2, "steps": [ { "join_execution": { "select#": 2, "steps": [ }
同时使用pstask进行记录堆栈信息
for((a=0;a<1;a--)); do pstack 19175 >> /tmp/pstack_1110.log done
这里的19175 是根据performance_schema.threads查询该SQL对应的THREAD_OS_ID。
通过pt-pmp对trace信息汇总 pt-pmp /tmp/pstack_1110.log:
/tmp/pstack_1110.log 部分信息
Thread 1 (process 19175): #0 0x00000000010afbf7 in row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long) () #1 0x0000000000fc357e in ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int) () #2 0x0000000000815551 in handler::ha_index_next(unsigned char*) () #3 0x0000000000cd811c in join_read_next(READ_RECORD*) () #4 0x0000000000cdcc06 in sub_select(JOIN*, QEP_TAB*, bool) () #5 0x0000000000cdbb7a in JOIN::exec() () #6 0x0000000000c0e6b0 in subselect_single_select_engine::exec() () #7 0x0000000000c10696 in Item_subselect::exec() () #8 0x0000000000c0cd69 in Item_singlerow_subselect::val_real() () #9 0x0000000000841080 in Arg_comparator::compare_real_fixed() () #10 0x0000000000849592 in Item_func_ge::val_int() () #11 0x000000000082592c in Item::val_bool() () #12 0x000000000084364a in Item_cond_and::val_int() () #13 0x0000000000cdb274 in evaluate_join_record(JOIN*, QEP_TAB*) () #14 0x0000000000cdcc53 in sub_select(JOIN*, QEP_TAB*, bool) () #15 0x0000000000cdbb7a in JOIN::exec() () #16 0x0000000000d45db0 in handle_query(THD*, LEX*, Query_result*, unsigned long long, unsigned long long) () #17 0x0000000000d06e93 in execute_sqlcom_select(THD*, TABLE_LIST*) () #18 0x0000000000d0a82a in mysql_execute_command(THD*, bool) () #19 0x0000000000d0c52d in mysql_parse(THD*, Parser_state*) () #20 0x0000000000d0d6be in dispatch_command(THD*, COM_DATA const*, enum_server_command) () #21 0x0000000000d0e324 in do_command(THD*) () #22 0x0000000000dded7c in handle_connection () #23 0x00000000012436d4 in pfs_spawn_thread () #24 0x0000003f65407aa1 in start_thread () from /lib64/libpthread.so.0 #25 0x0000003f650e8c4d in clone () from /lib64/libc.so.6
根据pt-pmp的统计,以上的次数多达237次。这里感觉是陷入某内部函数循环里面,需要多次判断取值才能返回。要想找到根因,只能看源码了,个人能力有限,先记录下来,有大神感兴趣可以研究一下。
到这里,我考虑将rand函数换成一个常量,查看执行计划并执行,发现是不一样的,执行时间也是毫秒级。
EXPLAIN SELECT * FROM t_test_info WHERE id >= (SELECT FLOOR(MAX(id) * 0.02) FROM t_test_info) AND area_code = '5400' ORDER BY id LIMIT 1; id select_type table partitions type partition_key key key_len rows filtered Extra 1 PRIMARY t_test_info range PRIMARY PRIMARY 8 31856 10 Using where 2 SUBQUERY Select tables optimized away
为进一步猜想是随机函数作为子查询的bug(也可能是UNCACHEABLE SUBQUERY问题,我更倾向这个原因,能力有限,还无法定论,大神可以指正一下),那么在当前版本5.7.12上可能存在,在别的版本已经修复,于是我在测试环境找了个8.0.20版本的库,构建了相同表结构,并造了相同数量的数据。
先对比了执行计划,发现执行计划是一样的, 8.0.20版本的执行计划:
id select_type table partitions type partition_key key key_len rows filtered Extra 1 PRIMARY t_test_info index PRIMARY 8 1 10 Using where 2 UNCACHEABLE SUBQUERY t_test_info index idx_ctime 6 63713 100 Using index
不同的是,在8.0.20的环境上,执行时间大概是0.03秒,差了几万倍,差别太大了。
回到5.7.12环境上,我根据SQL实现的逻辑,进行了改写,避免rand函数作为子查询,按照子查询改写为join的思路如下:
SELECT a.* FROM t_test_info a join ( SELECT FLOOR(MAX(id) * RAND()) xx FROM t_test_info ) b where a.id>=b.xx AND a.area_code = '5400' ORDER BY a.id LIMIT 1;
以上SQL在5.7.12环境上毫秒级就可以出结果,但该SQL会产生笛卡尔积,如果数据量很大,性能也会下降。并非合适的解决方案。
于是再按照避免rand函数作为子查询思路,拆分成两个SQL,先获取一个随机数,以传参的方式,进行查询,这种在代码层也相对较好实现。
SELECT FLOOR(MAX(id) * RAND()) into @str FROM t_test_info; SELECT * FROM t_test_info WHERE id >= @str AND area_code = '5400' ORDER BY id LIMIT 1;
在8.0以上版本,可以考虑使用with as的方式构建个临时表的方式进行改写,5.7不支持该语法。
WITH cc AS ( SELECT floor(max(id) * rand()) FROM t_test_info ) SELECT * FROM t_test_info WHERE area_code = '5400' AND id >= (SELECT * FROM cc) ORDER BY id LIMIT 1;
后记,后续参考了官方资料,更倾向是执行计划走UNCACHEABLE SUBQUERY这个原因导致的
https://dev.mysql.com/doc/refman/5.7/en/explain-output.html
https://dev.mysql.com/doc/refman/5.7/en/query-cache-operation.html
同时参考《MySQL:查询字段数量多少对查询效率的影响》