os: centos 7.4
postgresql: 9.6.9
pg_rman: REL9_6_STABLE
Backups exist to be restored; a backup that cannot be restored is worthless.
$ pg_rman --help
pg_rman manage backup/recovery of PostgreSQL database.
Usage:
pg_rman OPTION init
pg_rman OPTION backup
pg_rman OPTION restore
pg_rman OPTION show [DATE]
pg_rman OPTION show detail [DATE]
pg_rman OPTION validate [DATE]
pg_rman OPTION delete DATE
pg_rman OPTION purge
pg_rman restore: full recovery
Delete the files under $PGDATA
# systemctl stop postgresql-9.6.service
# ps -ef|grep -i post |grep -v grep
# su - postgres
$ cd $PGDATA/..
$ rm -rf ./data
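Deleting the directory outright is fine for a test. Where the old files might still matter, a more cautious variation is to rename the directory instead. The following is only a minimal sketch; the helper name set_aside_datadir is hypothetical and is not part of pg_rman or PostgreSQL.

```shell
# Hypothetical helper: rename a data directory instead of deleting it,
# so the old files stay available until the restore has been verified.
set_aside_datadir() {
    datadir=${1%/}   # drop a trailing slash, if any
    # Refuse to act on anything that is not an existing directory.
    [ -d "$datadir" ] || { echo "not a directory: $datadir" >&2; return 1; }
    target="${datadir}.old.$(date +%Y%m%d%H%M%S)"
    # Print the new location on success so the caller can record it.
    mv -- "$datadir" "$target" && echo "$target"
}
```

Called as set_aside_datadir "$PGDATA" (as the postgres user, with the server stopped), it leaves the same empty starting point for pg_rman restore as the rm -rf above.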
Run pg_rman restore, which starts from the most recent full backup
$ pg_rman show detail
======================================================================================================================
StartTime EndTime Mode Data ArcLog SrvLog Total Compressed CurTLI ParentTLI Status
======================================================================================================================
2018-12-03 13:07:24 2018-12-03 13:07:29 INCR 3186kB 67MB ---- 8138kB true 1 0 OK
2018-12-03 13:04:57 2018-12-03 13:05:18 FULL 406MB 67MB 28kB 443MB false 1 0 OK
2018-12-03 13:01:14 2018-12-03 13:01:20 INCR 2932kB 100MB ---- 11MB true 1 0 OK
2018-12-03 12:16:05 2018-12-03 12:16:10 INCR 3203kB 67MB ---- 8083kB true 1 0 OK
2018-12-03 11:53:39 2018-12-03 11:54:00 FULL 405MB 83MB 2231kB 461MB false 1 0 OK
$ pg_rman restore
WARNING: pg_controldata file "/var/lib/pgsql/9.6/data/global/pg_control" does not exist
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of latest full backup as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2018-12-03 13:04:57"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2018-12-03 13:04:57" backup, archive log files and server log files by SIZE
INFO: backup "2018-12-03 13:04:57" is valid
INFO: restoring database files from the full mode backup "2018-12-03 13:04:57"
INFO: searching incremental backup to be restored
INFO: validate: "2018-12-03 13:07:24" backup and archive log files by SIZE
INFO: backup "2018-12-03 13:07:24" is valid
INFO: restoring database files from the incremental mode backup "2018-12-03 13:07:24"
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2018-12-03 13:07:24" is valid
INFO: restoring WAL files from backup "2018-12-03 13:07:24"
INFO: restoring online WAL files and server log files
INFO: generating recovery.conf
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started.
Note the following log lines: the restore applies the full backup plus the incremental backup taken after it.
INFO: found the full backup can be used as base in recovery: "2018-12-03 13:04:57"
INFO: searching incremental backup to be restored
INFO: validate: "2018-12-03 13:07:24" backup and archive log files by SIZE
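To see at a glance which backups a restore actually applied, the INFO lines can be filtered. This is a rough sketch that relies only on the message wording shown in the output above; the function name restore_chain is made up for illustration.

```shell
# Hypothetical helper: read "pg_rman restore" output on stdin and print
# one line per backup that was applied, as "<mode> <backup start time>".
restore_chain() {
    sed -n -E 's/.*restoring database files from the (full|incremental) mode backup "([^"]+)".*/\1 \2/p'
}
```

Since a restore is not something to run twice just to read its messages, it is probably best to save the output to a file first and then feed that file to restore_chain.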
Inspect $PGDATA
$ cd $PGDATA/
$ ls -l
total 60
-rw-r--r-- 1 postgres postgres 215 Dec 3 14:26 backup_label
drwx------ 11 postgres postgres 123 Dec 3 14:26 base
drwx------ 2 postgres postgres 4096 Dec 3 14:26 global
drwx------ 2 postgres postgres 18 Dec 3 14:26 pg_clog
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_commit_ts
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_dynshmem
-rw------- 1 postgres postgres 4260 Dec 3 14:26 pg_hba.conf
-rw------- 1 postgres postgres 1636 Dec 3 14:26 pg_ident.conf
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_log
drwx------ 4 postgres postgres 68 Dec 3 14:26 pg_logical
drwx------ 4 postgres postgres 36 Dec 3 14:26 pg_multixact
drwx------ 2 postgres postgres 18 Dec 3 14:26 pg_notify
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_replslot
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_serial
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_snapshots
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_stat
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_stat_tmp
drwx------ 2 postgres postgres 18 Dec 3 14:26 pg_subtrans
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_tblspc
drwx------ 2 postgres postgres 6 Dec 3 14:26 pg_twophase
-rw------- 1 postgres postgres 4 Dec 3 14:26 PG_VERSION
drwx------ 3 postgres postgres 28 Dec 3 14:26 pg_xlog
-rw------- 1 postgres postgres 88 Dec 3 14:26 postgresql.auto.conf
-rw------- 1 postgres postgres 22454 Dec 3 14:26 postgresql.conf
-rw------- 1 postgres postgres 60 Dec 3 14:26 postmaster.opts
-rw-r--r-- 1 postgres postgres 118 Dec 3 14:26 recovery.conf
$ cat recovery.conf
# recovery.conf generated by pg_rman 1.3.7
restore_command = 'cp /mnt/walbackup/%f %p'
recovery_target_timeline = '1'
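Look at recovery.conf: as generated, the server becomes a master as soon as recovery finishes. It is advisable to change it as shown below (recovery_target_action has no effect unless a recovery target is set, so here it is standby_mode = on that keeps the server from promoting):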
restore_command = 'cp /mnt/walbackup/%f %p'
recovery_target_timeline = '1'
recovery_target_action = 'pause'
standby_mode = on
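Because pg_rman regenerates recovery.conf on every restore, the manual edit has to be repeated each time. One way to script it is sketched below; set_conf_param is a hypothetical helper, not a pg_rman or PostgreSQL tool, and it assumes one "key = value" setting per line.

```shell
# Hypothetical helper: set "key = value" in a recovery.conf-style file,
# replacing any existing line for the same key instead of duplicating it.
set_conf_param() {
    file=$1
    key=$2
    value=$3
    tmpfile="${file}.tmp.$$"
    [ -f "$file" ] || : > "$file"
    # Keep every line that does not assign this key, then append the new one.
    grep -v -E "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmpfile" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmpfile"
    mv "$tmpfile" "$file"
}

# Example calls (run as the postgres user so file ownership stays correct):
#   set_conf_param "$PGDATA/recovery.conf" recovery_target_action "'pause'"
#   set_conf_param "$PGDATA/recovery.conf" standby_mode on
```

The file is rewritten through a temporary copy, so running the same call twice leaves a single line for that key.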
Start PostgreSQL and watch the log
# systemctl start postgresql-9.6.service
# tail -f /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-12-03.csv
2018-12-03 14:47:28.672 CST,,,27907,,5c04d180.6d03,1,,2018-12-03 14:47:28 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2018-12-03 14:47:28.673 CST,,,27910,,5c04d180.6d06,1,,2018-12-03 14:47:28 CST,,0,LOG,00000,"database system was interrupted; last known up at 2018-12-03 13:07:24 CST",,,,,,,,,""
2018-12-03 14:47:29.454 CST,,,27910,,5c04d180.6d06,2,,2018-12-03 14:47:28 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2018-12-03 14:47:29.468 CST,,,27910,,5c04d180.6d06,3,,2018-12-03 14:47:28 CST,,0,LOG,00000,"restored log file ""0000000100000003000000D9"" from archive",,,,,,,,,""
2018-12-03 14:47:29.699 CST,,,27910,,5c04d180.6d06,4,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"redo starts at 3/D9000028",,,,,,,,,""
2018-12-03 14:47:29.701 CST,,,27910,,5c04d180.6d06,5,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"consistent recovery state reached at 3/D9000130",,,,,,,,,""
2018-12-03 14:47:29.701 CST,,,27907,,5c04d180.6d03,2,,2018-12-03 14:47:28 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2018-12-03 14:47:29.718 CST,,,27910,,5c04d180.6d06,6,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000DA"" from archive",,,,,,,,,""
2018-12-03 14:47:29.899 CST,,,27910,,5c04d180.6d06,7,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000DB"" from archive",,,,,,,,,""
2018-12-03 14:47:30.138 CST,,,27910,,5c04d180.6d06,8,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000DC"" from archive",,,,,,,,,""
2018-12-03 14:57:29.795 CST,,,27913,,5c04d181.6d09,1,,2018-12-03 14:47:29 CST,,0,LOG,00000,"restartpoint starting: time",,,,,,,,,""
2018-12-03 14:57:30.088 CST,,,27913,,5c04d181.6d09,2,,2018-12-03 14:47:29 CST,,0,LOG,00000,"restartpoint complete: wrote 0 buffers (0.0%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.293 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB",,,,,,,,,""
2018-12-03 14:57:30.088 CST,,,27913,,5c04d181.6d09,3,,2018-12-03 14:47:29 CST,,0,LOG,00000,"recovery restart point at 3/DB000220","last completed transaction was at log time 2018-12-03 13:07:27.977677+08",,,,,,,,""
LSN information of the base (full) backup "2018-12-03 13:04:57"
# result
TIMELINEID=1
START_LSN=3/d7000028
STOP_LSN=3/d7000130
START_TIME='2018-12-03 13:04:57'
END_TIME='2018-12-03 13:05:18'
RECOVERY_XID=11034
RECOVERY_TIME='2018-12-03 13:05:17'
LSN information of the incremental backup "2018-12-03 13:07:24"
# result
TIMELINEID=1
START_LSN=3/d9000028
STOP_LSN=3/d9000130
START_TIME='2018-12-03 13:07:24'
END_TIME='2018-12-03 13:07:29'
RECOVERY_XID=11038
RECOVERY_TIME='2018-12-03 13:07:27'
The log line "consistent recovery state reached at 3/D9000130" corresponds to STOP_LSN=3/d9000130 of the incremental backup "2018-12-03 13:07:24"; after that point the server simply keeps applying WAL.
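The server log prints LSNs in upper case while the pg_rman backup metadata uses lower case, so a plain string comparison does not work. A small sketch of a numeric comparison follows; lsn_to_int is a hypothetical helper name, and the only assumption is the usual "high/low" hexadecimal LSN notation with 32 bits in the low part.

```shell
# Hypothetical helper: convert an LSN such as 3/D9000130 into a single
# integer (high part * 2^32 + low part) so that two LSNs can be compared
# with ordinary numeric tests, regardless of letter case.
lsn_to_int() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Consistency point from the server log versus STOP_LSN of the
# incremental backup (both values copied from the output above).
if [ "$(lsn_to_int 3/D9000130)" -eq "$(lsn_to_int 3/d9000130)" ]; then
    echo "consistency point matches STOP_LSN of the incremental backup"
fi
```

The same helper works with -lt and -gt, for example to check that the STOP_LSN of the full backup (3/d7000130) lies before that of the incremental backup.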
Sometimes starting the server after a restore fails with the following errors:
invalid primary checkpoint record
invalid secondary checkpoint record
could not locate a valid checkpoint record
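In that case the only way forward is to reset the xlog and take the server out of recovery mode. Treat this as a last resort: pg_resetxlog discards WAL, so the database may be left with inconsistent data.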
$ pg_resetxlog -f $PGDATA
$ mv $PGDATA/recovery.conf $PGDATA/recovery.done
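pg_rman restore: recovery to a specified point in time
This kind of recovery is typically used after a table, function or similar object has been dropped by mistake. The backup is restored on a separate machine to a point in time just before the drop.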
$ pg_rman --help
Restore options:
--recovery-target-time time stamp up to which recovery will proceed
--recovery-target-xid transaction ID up to which recovery will proceed
--recovery-target-inclusive whether we stop just after the recovery target
--recovery-target-timeline recovering into a particular timeline
--hard-copy copying archivelog not symbolic link
These options have the same meaning as the corresponding parameters in recovery.conf; for details see $PGHOME/share/recovery.conf.sample
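--recovery-target-timeline TIMELINE
The timeline to recover along. If omitted, the timeline recorded in $PGDATA/global/pg_control is used (in the run above, where pg_control no longer existed, pg_rman fell back to the timeline ID of the latest full backup).
--recovery-target-time TIMESTAMP
The point in time to recover to. If omitted, recovery continues to the end of the available WAL.
--recovery-target-xid XID
The transaction ID (XID) to recover to. If omitted, recovery continues to the last available XID.
--recovery-target-inclusive
Whether recovery stops just after the specified target (recovery-target-time or recovery-target-xid) or just before it. The default is true, which means the target itself is included.
This is the same idea as a closed versus an open endpoint of an interval in mathematics.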
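--hard-copy
Without this option, pg_rman creates symbolic links in the archive directory that point to the WAL files needed for recovery. With this option, it copies the files instead.
Using --hard-copy is strongly recommended.
My own view: with or without --hard-copy, pg_rman modifies the WAL archive directory, which is undesirable. It would be better to be able to name a separate directory and have the WAL archive files needed for recovery copied there.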
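The separate-directory idea can be approximated by hand. What follows is only a sketch of that idea, not a pg_rman feature: stage_wal is a hypothetical helper, it assumes uncompressed WAL segments with the standard 24-character hexadecimal names, and any staging path used with it is an assumption as well.

```shell
# Hypothetical helper: copy WAL segment files and timeline history files
# from an archive directory into a separate staging directory, leaving
# the archive itself untouched. Symbolic links are followed, so the
# staging directory always receives real file contents.
stage_wal() {
    archive=$1
    staging=$2
    mkdir -p "$staging" || return 1
    for f in "$archive"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        case $name in
            *.history)
                cp -p "$f" "$staging/" || return 1 ;;
            *[!0-9A-F]*)
                ;;  # not a plain WAL segment name (e.g. *.backup), skip
            *)
                # WAL segment file names are exactly 24 hex characters.
                if [ "${#name}" -eq 24 ]; then
                    cp -p "$f" "$staging/" || return 1
                fi ;;
        esac
    done
    return 0
}
```

With the files staged, restore_command in recovery.conf would point at the staging directory instead of /mnt/walbackup, so recovery never reads from or writes to the real archive.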
$ pg_rman restore --recovery-target-timeline='1' --recovery-target-time='2018-12-03 13:02:20' --hard-copy
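Because --hard-copy was used, some of the WAL files in the archive directory /mnt/walbackup were overwritten with copies, rather than being replaced by ln -s symbolic links pointing into the pg_rman full and incremental backups.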
$ vi $PGDATA/recovery.conf
restore_command = 'cp /mnt/walbackup/%f %p'
recovery_target_time = '2018-12-03 13:02:20'
recovery_target_timeline = '1'
recovery_target_action = 'pause'
standby_mode = on
# systemctl start postgresql-9.6.service
# tail -f /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-12-03.csv
2018-12-03 15:31:07.051 CST,,,31159,,5c04dbba.79b7,1,,2018-12-03 15:31:06 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2018-12-03 15:31:07.074 CST,,,31162,,5c04dbbb.79ba,1,,2018-12-03 15:31:07 CST,,0,LOG,00000,"database system was interrupted; last known up at 2018-12-03 13:01:15 CST",,,,,,,,,""
2018-12-03 15:31:07.955 CST,,,31162,,5c04dbbb.79ba,2,,2018-12-03 15:31:07 CST,,0,LOG,00000,"starting point-in-time recovery to 2018-12-03 13:02:20+08",,,,,,,,,""
2018-12-03 15:31:07.992 CST,,,31162,,5c04dbbb.79ba,3,,2018-12-03 15:31:07 CST,,0,LOG,00000,"restored log file ""0000000100000003000000D5"" from archive",,,,,,,,,""
2018-12-03 15:31:08.166 CST,,,31162,,5c04dbbb.79ba,4,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"redo starts at 3/D5000028",,,,,,,,,""
2018-12-03 15:31:08.168 CST,,,31162,,5c04dbbb.79ba,5,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"consistent recovery state reached at 3/D50000F8",,,,,,,,,""
2018-12-03 15:31:08.169 CST,,,31159,,5c04dbba.79b7,2,,2018-12-03 15:31:06 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2018-12-03 15:31:08.183 CST,,,31162,,5c04dbbb.79ba,6,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000D6"" from archive",,,,,,,,,""
2018-12-03 15:31:08.352 CST,,,31162,,5c04dbbb.79ba,7,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000D7"" from archive",,,,,,,,,""
2018-12-03 15:31:08.569 CST,,,31162,,5c04dbbb.79ba,8,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000D8"" from archive",,,,,,,,,""
2018-12-03 15:31:08.786 CST,,,31162,,5c04dbbb.79ba,9,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"recovery stopping before commit of transaction 11034, time 2018-12-03 13:05:17.729183+08",,,,,,,,,""
2018-12-03 15:31:08.786 CST,,,31162,,5c04dbbb.79ba,10,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,,""
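Before resuming or promoting, it is worth confirming where recovery actually stopped. The sketch below pulls that information out of the csvlog; recovery_stop_point is a hypothetical helper that depends only on the message wording seen in the log above.

```shell
# Hypothetical helper: read a PostgreSQL csvlog on stdin and print where
# point-in-time recovery stopped: before/after, commit/abort, the
# transaction ID and the timestamp of that transaction record.
recovery_stop_point() {
    sed -n -E 's/.*"recovery stopping (before|after) (commit|abort) of transaction ([0-9]+), time ([^"]+)".*/\1 \2 xid=\3 time=\4/p'
}
```

For the log above it reports a stop before the commit of transaction 11034 at 13:05:17, which is later than the requested target of 13:02:20. That suggests no transaction committed between the target time and 13:05:17, so the restored state is still the one as of 13:02:20.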
References:
https://github.com/ossc-db/pg_rman/tree/master
http://ossc-db.github.io/pg_rman/index.html
https://travis-ci.org/ossc-db/pg_rman