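Background: When product requirements change frequently, the table structures of the production database have to change just as often. Running ALTER TABLE directly (including CREATE INDEX and similar statements) locks the table, and subsequent reads and writes against it queue up in the "Waiting for table metadata lock" state, which severely affects a heavily loaded production system.
Note: With the metadata lock introduced in 5.5, even a SELECT acquires a metadata lock (so that the table structure cannot be changed while the query is running).
The sections below describe how to perform OSC (Online Schema Change) correctly on 5.6 and on 5.5 and earlier.
I. MySQL 5.6 official Online DDL
We start with the Online DDL feature officially introduced in 5.6.
The ALGORITHM and LOCK clauses let you choose the trade-off between performance and concurrency during an online DDL operation.
(1) Execution algorithm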
ALGORITHM [=] {DEFAULT|INPLACE|COPY}
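DEFAULT: specifying DEFAULT explicitly has the same effect as omitting the clause. With old_alter_table = OFF (the default), INPLACE is tried first and the server falls back to COPY if the operation does not support it.
INPLACE: modifies the table in place and avoids copying it into a new table, which uses less I/O and CPU than COPY. The fast index creation available in 5.5 worked this way only for adding and dropping secondary indexes and held a shared lock (writes were blocked); 5.6 extends in-place processing to many more operations, some of which still rebuild the table.
COPY: copies the original table into a temporary table. Copying a large table consumes a large share of the buffer pool and generates heavy disk I/O, so performance suffers.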
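Taking the addition of an index as an example, here is a brief look at what INPLACE and COPY do internally.
(1) COPY method
Create a temporary table with the same structure as the original table plus the new element
Take an S lock on the original table, which blocks DML but still allows SELECT
Copy the data from the original table into the temporary table
Upgrade the S lock to an X lock and rename the temporary table to the original table name (the rename only modifies the data dictionary, so it is fast)
(2) INPLACE method
Updates the table in place. The steps below cover the secondary-index case. In 5.6 a primary key can also be added with INPLACE, but the whole table is rebuilt, so the cost is close to that of COPY.
Create the data dictionary entry for the secondary index
Take an S lock on the original table
...
(2) Locking options for Online DDL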
LOCK [=] {DEFAULT|NONE|SHARED|EXCLUSIVE}
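DEFAULT: provides as much concurrency as the given ALGORITHM supports, trying the levels in this order: NONE > SHARED > EXCLUSIVE
NONE: no lock; concurrent reads and writes from other transactions are allowed
SHARED: shared lock; concurrent reads from other transactions are allowed, but writes are blocked
EXCLUSIVE: exclusive lock; reads and writes from other transactions are blocked (even if the ALGORITHM would support concurrent operations)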
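As a minimal sketch of how the two clauses are combined in practice, the statements below use the sbtest1 table from the worked example later in this article; the index name idx_pad is a made-up name for illustration. When an explicitly requested ALGORITHM or LOCK level cannot be honored for the operation, MySQL 5.6 rejects the statement with an error instead of silently choosing a more restrictive mode, which makes the explicit form a useful safety check.

```sql
-- Add a secondary index in place, requiring that concurrent reads and
-- writes stay allowed; the statement fails up front if that is not possible.
ALTER TABLE sbtest1
  ADD INDEX idx_pad (pad),      -- idx_pad: hypothetical index name
  ALGORITHM=INPLACE, LOCK=NONE;

-- Force the table-copy algorithm; concurrent reads are allowed, writes block.
-- (LOCK=NONE cannot be combined with ALGORITHM=COPY.)
ALTER TABLE sbtest1
  DROP INDEX idx_pad,
  ALGORITHM=COPY, LOCK=SHARED;
```

Leaving both clauses out is equivalent to ALGORITHM=DEFAULT, LOCK=DEFAULT, in which case the server picks the least restrictive combination the operation supports.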
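5.6 Online DDL execution process:
1. Prepare phase
0) Syntax, validity and conflict checks
1) Create a temporary frm file for the original table
2) Take a table-level exclusive metadata lock (Exclusive-MDL) on the original table, blocking reads and writes. (This is why you should check for long-running queries before running an online DDL.)
3) Choose the execution method based on the type of ALTER TABLE: inplace (Online-rebuild or Online-norebuild) or copy
4) Update the in-memory data dictionary objects and create the index in the system tables
5) Allocate a row_log object for the incremental log, which records the changes that DML makes to the data while the DDL is running. Its size is capped by innodb_online_alter_log_max_size; if the log grows past that value during the operation, the DDL fails with an error.
6) If the execution method is rebuild, create a temporary ibd file, commit the data dictionary transaction, and release the data dictionary lock
2. DDL execution phase
1) Downgrade the Exclusive-MDL lock, allowing reads and writes again
2) Scan every record in the clustered index of the original table
3) Iterate over the clustered index and the secondary indexes of the new table, processing them one by one
4) Build the corresponding index entry from each record
5) Insert the index entries into sort_buffer blocks. Note that the sort may need space in tmpdir and fails with an error if tmpdir is too small.
6) Build the new index from the sort_buffer
7) If the execution method is rebuild, also handle the changes made while the DDL was running: apply the row_log so that the new data is added to the ibd file
3. Commit phase
1) Upgrade to the Exclusive-MDL lock, blocking reads and writes
2) New log entries may have been written to the row_log between the previous log application and the lock upgrade in this phase; apply them as well.
3) Update the InnoDB data dictionary tables
4) Commit the transaction (flush the transaction's redo log)
5) Update statistics (data dictionary, index information, and so on)
6) Rename the temporary ibd and frm files
7) The change is complete.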
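Because the Prepare and Commit phases both need the exclusive metadata lock, a long-running query or an open transaction on the table can stall the DDL, and every later statement on that table then queues behind it. The following is one possible pre-flight check, not a fixed recipe; the 60-second threshold and the 10-second timeout are arbitrary values chosen for illustration.

```sql
-- Statements that have been running for more than 60 seconds.
SELECT id, user, host, db, time, state, info
FROM information_schema.processlist
WHERE command <> 'Sleep' AND time > 60
ORDER BY time DESC;

-- Transactions that are still open, oldest first; an idle but uncommitted
-- transaction also keeps its metadata locks.
SELECT trx_mysql_thread_id, trx_started, trx_state, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started;

-- Make the ALTER in this session give up after 10 seconds of waiting for
-- the metadata lock instead of blocking other sessions for a long time.
SET SESSION lock_wait_timeout = 10;
```

If the ALTER times out, it can simply be retried once the blocking session has finished.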
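Key parameters:
innodb_online_alter_log_max_size: caps the size of the log of DML changes produced while the DDL is running (the log is kept in temporary log files). The default is 128 MB. It is a dynamic variable with global scope, so it can be changed at runtime with SET GLOBAL.
If the log grows larger than this value, the following error is raised:
Error: 1799 SQLSTATE: HY000 (ER_INNODB_ONLINE_LOG_TOO_BIG)
Message: Creating index 'idx_aaa' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again.
tmpdir: the temporary space used during the DDL execution phase when the sort performed while building an index does not fit in memory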
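A short sketch of how these two parameters might be inspected and adjusted before a large online DDL. The 256 MB value is only an example, not a recommendation; size it according to the write volume expected while the DDL runs.

```sql
-- Current limit for the online DDL modification log, in bytes.
SHOW GLOBAL VARIABLES LIKE 'innodb_online_alter_log_max_size';

-- Raise the limit to 256 MB (example value). The variable is dynamic and
-- has global scope, so no restart is needed.
SET GLOBAL innodb_online_alter_log_max_size = 268435456;

-- Directory used for the temporary sort files. tmpdir cannot be changed at
-- runtime, so check beforehand that its file system has enough free space.
SHOW VARIABLES LIKE 'tmpdir';
```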
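II. OSC methods for 5.5 and earlier
Before 5.6, online schema changes were generally done with third-party tools such as OAK's oak-online-alter-table or pt-online-schema-change.
oak-online-alter-table
oak-online-alter-table runs the DDL with a copy approach; data changes made by DML during the run are synchronized to the temporary table by triggers.
Points to note when using oak-online-alter-table:
(1) The primary key must be a single-column index (a composite primary key is not allowed; it triggers a MySQL bug)
(2) The table must have no foreign keys
(3) The table must have no triggers (back up and drop any existing triggers before running the OAK DDL), because OAK creates its own triggers to pass new DML on the original table to the temporary table during the DDL.
Note: SQL for checking whether foreign keys and triggers exist
Select * from information_schema.key_column_usage where Table_schema=@dbname and table_name=@tablename and Referenced_table_name is not null;
Select * from information_schema.key_column_usage where Referenced_table_schema=@dbname and Referenced_table_name=@tablename;
Select TRIGGER_SCHEMA,TRIGGER_NAME,EVENT_OBJECT_SCHEMA,EVENT_OBJECT_TABLE from information_schema.TRIGGERS where EVENT_OBJECT_SCHEMA=@dbname and EVENT_OBJECT_TABLE=@tablename;
(4) Before running, check for long-running queries, which can cause the online DDL to fail
(5) Estimate the execution time beforehand and run during an off-peak period
(6) After the run, validate the data by checking that the original table and the copied temporary table are consistent (a DDL that changes a column type may alter the data)
Case study: running an online DDL with OAK
1) Create a test table with sysbench; its structure is as follows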
mysql> use sysbench;
mysql> show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| sbtest1 |
+--------------------+
1 row in set (0.00 sec)
mysql> show create table sbtest1 \G
*************************** 1. row ***************************
Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`k` int(10) unsigned NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=20001 DEFAULT CHARSET=utf8mb4 MAX_ROWS=1000000
The table holds 20,000 rows:
mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 20000 |
+----------+
1 row in set (0.01 sec)
2) Check for foreign keys and triggers. There are none.
mysql> select TRIGGER_SCHEMA,TRIGGER_NAME,EVENT_OBJECT_SCHEMA,EVENT_OBJECT_TABLE from information_schema.TRIGGERS where EVENT_OBJECT_SCHEMA='sysbench';
Empty set (0.00 sec)
mysql>
mysql> Select * from information_schema.key_column_usage where Table_schema="sysbench" and table_name="sbtest1" and Referenced_table_name is not null;
Empty set (0.01 sec)
mysql>
mysql> Select * from information_schema.key_column_usage where Referenced_table_schema="sysbench" and Referenced_table_name="sbtest1";
Empty set (0.05 sec)
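3) Use oak-online-alter-table from the OAK toolkit to run the online DDL (in this example, adding the column last_update_time and the index lut to table sbtest1)
Number of rows taken from the original table per chunk: -c CHUNK_SIZE, --chunk-size=CHUNK_SIZE  Number of rows to act on in chunks. Default: 1000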
[root@237_12 ~]# oak-online-alter-table -uroot --ask-pass -S /tmp/mysqld.sock -d sysbench -t sbtest1 -g new_sbtest1 -a "add last_update_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,add key lut(last_update_time)" --sleep=300 --skip-delete-pass
-- Connecting to MySQL
Password:
-- Table sysbench.sbtest1 is of engine innodb
-- Checking for UNIQUE columns on sysbench.sbtest1, by which to chunk
-- Possible UNIQUE KEY column names in sysbench.sbtest1:
-- - id
-- Table sysbench.new_sbtest1 has been created
-- Table sysbench.new_sbtest1 has been altered
-- Checking for UNIQUE columns on sysbench.new_sbtest1, by which to chunk
-- Possible UNIQUE KEY column names in sysbench.new_sbtest1:
-- - id
-- Checking for UNIQUE columns on sysbench.sbtest1, by which to chunk
-- - Found following possible unique keys:
-- - id (int)
-- Chosen unique key is 'id'
-- Shared columns: c, pad, k, id
-- Created AD trigger
-- Created AU trigger
-- Created AI trigger
-- Attempting to lock tables
-- Tables locked WRITE
-- id (min, max) values: ([1L], [20000L])
-- Tables unlocked
-- - Reminder: altering sysbench.sbtest1: add last_update_time timestamp...
-- Copying range (1), (1000), progress: 0%
-- + Will sleep for 0.3 seconds
-- Copying range (1000), (2000), progress: 5%
-- + Will sleep for 0.3 seconds
-- Copying range (2000), (3000), progress: 10%
-- + Will sleep for 0.3 seconds
-- Copying range (3000), (4000), progress: 15%
-- + Will sleep for 0.3 seconds
-- Copying range (4000), (5000), progress: 20%
-- + Will sleep for 0.3 seconds
-- Copying range (5000), (6000), progress: 25%
-- + Will sleep for 0.3 seconds
-- Copying range (6000), (7000), progress: 30%
-- + Will sleep for 0.3 seconds
-- Copying range (7000), (8000), progress: 35%
-- + Will sleep for 0.3 seconds
-- Copying range (8000), (9000), progress: 40%
-- + Will sleep for 0.3 seconds
-- Copying range (9000), (10000), progress: 45%
-- + Will sleep for 0.3 seconds
-- Copying range (10000), (11000), progress: 50%
-- + Will sleep for 0.3 seconds
-- Copying range (11000), (12000), progress: 55%
-- + Will sleep for 0.3 seconds
-- Copying range (12000), (13000), progress: 60%
-- + Will sleep for 0.3 seconds
-- Copying range (13000), (14000), progress: 65%
-- + Will sleep for 0.3 seconds
-- Copying range (14000), (15000), progress: 70%
-- + Will sleep for 0.3 seconds
-- Copying range (15000), (16000), progress: 75%
-- + Will sleep for 0.3 seconds
-- Copying range (16000), (17000), progress: 80%
-- + Will sleep for 0.3 seconds
-- Copying range (17000), (18000), progress: 85%
-- + Will sleep for 0.3 seconds
-- Copying range (18000), (19000), progress: 90%
-- + Will sleep for 0.3 seconds
-- Copying range (19000), (20000), progress: 95%
-- + Will sleep for 0.3 seconds
-- Copying range 100% complete. Number of rows: 20000
-- Ghost table creation completed. Note that triggers on sysbench.sbtest1 were not removed
[root@237_12 ~]#
Now simulate new data being inserted into the original table during the DDL operation:
mysql> insert into sbtest1 values(99999,99999,"c99999","pad99999");
Query OK, 1 row affected (0.00 sec)
mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 20001 |
+----------+
1 row in set (0.01 sec)
Once the online DDL operation has finished, check the row count of new_sbtest1:
mysql> select count(*) from new_sbtest1;
+----------+
| count(*) |
+----------+
| 20001 |
+----------+
1 row in set (0.01 sec)
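Note this line in oak's output log: Copying range 100% complete. Number of rows: 20000
It shows that the rows present in the original table before the DDL started were copied to the new table by the COPY step,
while the data changes made by new DML between the start of the DDL and the rename are synchronized to the new table by triggers.
The OAK triggers are as follows: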
mysql> select TRIGGER_SCHEMA,TRIGGER_NAME,EVENT_OBJECT_SCHEMA,EVENT_OBJECT_TABLE from information_schema.TRIGGERS where EVENT_OBJECT_SCHEMA='sysbench';
+----------------+----------------+---------------------+--------------------+
| TRIGGER_SCHEMA | TRIGGER_NAME | EVENT_OBJECT_SCHEMA | EVENT_OBJECT_TABLE |
+----------------+----------------+---------------------+--------------------+
| sysbench | sbtest1_AI_oak | sysbench | sbtest1 |
| sysbench | sbtest1_AU_oak | sysbench | sbtest1 |
| sysbench | sbtest1_AD_oak | sysbench | sbtest1 |
+----------------+----------------+---------------------+--------------------+
3 rows in set (0.00 sec)
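To make the trigger-based synchronization concrete, here is a minimal sketch of the technique on two scratch tables. It is an illustration only, not the trigger code that OAK generates: the table names t_src and t_ghost and the _demo trigger names are hypothetical, and the sketch assumes the primary key value of a row is never updated.

```sql
-- Scratch tables standing in for the original table and the ghost table.
CREATE TABLE t_src (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  k  INT UNSIGNED NOT NULL
) ENGINE=InnoDB;
CREATE TABLE t_ghost LIKE t_src;
ALTER TABLE t_ghost ADD COLUMN last_update_time TIMESTAMP NOT NULL
  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- Mirror inserts and updates on the shared columns into the ghost table.
CREATE TRIGGER t_src_AI_demo AFTER INSERT ON t_src FOR EACH ROW
  REPLACE INTO t_ghost (id, k) VALUES (NEW.id, NEW.k);
CREATE TRIGGER t_src_AU_demo AFTER UPDATE ON t_src FOR EACH ROW
  REPLACE INTO t_ghost (id, k) VALUES (NEW.id, NEW.k);
-- Mirror deletes.
CREATE TRIGGER t_src_AD_demo AFTER DELETE ON t_src FOR EACH ROW
  DELETE FROM t_ghost WHERE id = OLD.id;

-- Every change on t_src now shows up in t_ghost.
INSERT INTO t_src VALUES (1, 10);
UPDATE t_src SET k = 11 WHERE id = 1;
SELECT id, k FROM t_ghost;   -- one row: id = 1, k = 11
```

MySQL 5.5 and 5.6 allow only one trigger per table for each combination of timing and event, which is why a table that already has its own triggers cannot be handled by this approach. The definitions the tool actually created can be inspected with SHOW CREATE TRIGGER sbtest1_AI_oak;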
4) Data consistency check
1. Check the table structure and index information
mysql> desc sbtest1;
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| k | int(10) unsigned | NO | MUL | 0 | |
| c | char(120) | NO | | | |
| pad | char(60) | NO | | | |
+-------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> desc new_sbtest1;
+------------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| k | int(10) unsigned | NO | MUL | 0 | |
| c | char(120) | NO | | | |
| pad | char(60) | NO | | | |
| last_update_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+------------------+------+-----+-------------------+-----------------------------+
5 rows in set (0.00 sec)
2. Row-count check for the new and old tables: already shown in step 3) above
3. Compare the checksums of the two int columns id and k
mysql> select sum(crc32(concat(ifnull(id,'NULL'),ifnull(k,'NULL')))) as sum_old from sbtest1;
+----------------+
| sum_old |
+----------------+
| 42815177029049 |
+----------------+
1 row in set (0.02 sec)
mysql> select sum(crc32(concat(ifnull(id,'NULL'),ifnull(k,'NULL')))) as sum_new from new_sbtest1;
+----------------+
| sum_new |
+----------------+
| 42815177029049 |
+----------------+
1 row in set (0.02 sec)
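The checksum above concatenates id and k without a separator, so the pairs (1, 23) and (12, 3) produce the same input string, and it leaves the c and pad columns unchecked. A slightly stricter variant, offered as a sketch rather than a complete verification method, puts a separator between the values and covers every column the two tables share:

```sql
-- Run once against each table and compare both result columns.
-- CONCAT_WS skips NULL arguments; that is safe here only because all four
-- columns are declared NOT NULL.
SELECT COUNT(*) AS row_cnt,
       SUM(CRC32(CONCAT_WS('#', id, k, c, pad))) AS checksum_all
FROM sbtest1;

SELECT COUNT(*) AS row_cnt,
       SUM(CRC32(CONCAT_WS('#', id, k, c, pad))) AS checksum_all
FROM new_sbtest1;
```

The new column last_update_time is deliberately left out because it exists only in new_sbtest1.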
5) rename (the table is locked during this step, but only the data dictionary is modified, so it is very fast)
mysql> use sysbench;
Database changed
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> rename table sbtest1 to old_sbtest1,new_sbtest1 to sbtest1;
Query OK, 0 rows affected (0.02 sec)
mysql> show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| old_sbtest1 |
| sbtest1 |
+--------------------+
2 rows in set (0.00 sec)
Drop the three OAK triggers and the original table old_sbtest1
mysql> drop trigger sbtest1_AI_oak;
Query OK, 0 rows affected (0.00 sec)
mysql> drop trigger sbtest1_AU_oak;
Query OK, 0 rows affected (0.01 sec)
mysql> drop trigger sbtest1_AD_oak;
Query OK, 0 rows affected (0.00 sec)
mysql> drop table old_sbtest1;
Query OK, 0 rows affected (0.01 sec)
mysql> show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| sbtest1 |
+--------------------+
1 row in set (0.00 sec)
This completes the Online DDL operation using the OAK tool.