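Recently, a customer running a single-instance MongoDB database with no backups suffered data file corruption after a power outage. Because the business had to be back online quickly and the data was not sensitive, we repaired the database with MongoDB's built-in repair option (mongod --repair). The good news is that service was restored quickly. The bad news is that the data in the damaged data files was lost, which the customer had accepted. This post summarizes the process and explains why the data was lost after the repair.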
1. A normal MongoDB data query
> show dbs;
admin 0.000GB
config 0.000GB
dns_testdb 0.009GB
local 0.000GB
> use dns_testdb
switched to db dns_testdb
> db.test_collection.find();
{ "_id" : ObjectId("5fedd03d9d2569ee04ab62e1"), "name" : "elephant", "user_id" : 0, "boolean" : false, "added_at" : ISODate("2020-12-31T13:21:01.226Z"), "number" : 5129 }
{ "_id" : ObjectId("5fedd03d9d2569ee04ab62e2"), "name" : "dog", "user_id" : 1, "boolean" : false, "added_at" : ISODate("2020-12-31T13:21:01.237Z"), "number" : 9699 }
{ "_id" : ObjectId("5fedd03d9d2569ee04ab62e3"), "name" : "lion", "user_id" : 2, "boolean" : false, "added_at" : ISODate("2020-12-31T13:21:01.238Z"), "number" : 1783 }
Type "it" for more
>
2. Simulate data file corruption
[mongo@centos7 dns_testdb]$ du -sh *
28M collection-8--6736947369024546614.wt
9.5M index-9--6736947369024546614.wt
[mongo@centos7 dns_testdb]$
[mongo@centos7 dns_testdb]$
[mongo@centos7 dns_testdb]$ pwd
/opt/mongo/data/single/dns_testdb
[mongo@centos7 dns_testdb]$ dd if=/dev/null of=/opt/mongo/data/single/dns_testdb/collection-8--6736947369024546614.wt bs=1024k count=5
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000132203 s, 0.0 kB/s
[mongo@centos7 dns_testdb]$
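Because /dev/null returns end-of-file immediately, the dd command above copies 0 bytes and leaves the collection file truncated to zero length, as the "0 bytes copied" line shows. On a throwaway test instance it can help to keep a copy of the file first so the experiment is repeatable. The following is a minimal sketch, assuming mongod is stopped; the function name `truncate_with_backup` and the `.bak` suffix are hypothetical and not part of MongoDB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a MongoDB tool): keep a copy of a data file,
# then truncate the original to 0 bytes to simulate corruption.
# Intended only for test instances, with mongod stopped.
truncate_with_backup() {
    local target="$1"
    if [ ! -f "$target" ]; then
        echo "no such file: $target" >&2
        return 1
    fi
    # Preserve mode and timestamps on the copy.
    cp -p -- "$target" "$target.bak" || return 1
    # Truncate the original in place.
    : > "$target"
}
```

For example, `truncate_with_backup /opt/mongo/data/single/dns_testdb/collection-8--6736947369024546614.wt` would leave the original contents in a sibling `.bak` file.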
3. Restart MongoDB
> use admin
switched to db admin
> db.shutdownServer();
[mongo@centos7 data]$ mongod --dbpath /opt/mongo/data/single --port 50001 --oplogSize 512 --fork --bind_ip 0.0.0.0 --logpath /opt/mongo/logs/single.log --logappend --journal --directoryperdb --profile=1
about to fork child process, waiting until server is ready for connections.
forked process: 102882
child process started successfully, parent exiting
4. The mongod process starts, but any operation on the collection whose data file is damaged crashes mongod
[mongo@centos7 data]$ mongo --port 50001
MongoDB shell version v4.2.3
connecting to: mongodb://127.0.0.1:50001/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("09b6c6aa-059d-4a41-9e0d-e6553966399b") }
MongoDB server version: 4.2.3
Server has startup warnings:
> show dbs;
admin 0.000GB
config 0.000GB
dns_testdb 0.037GB
local 0.000GB
> use dns_testdb;
switched to db dns_testdb
> db.test_collection.find();
2020-12-31T08:43:45.115-0500 I NETWORK [js] DBClientConnection failed to receive message from 127.0.0.1:50001 - HostUnreachable: Connection closed by peer
Error: error doing query: failed: network error while attempting to run command 'find' on host '127.0.0.1:50001'
2020-12-31T08:43:45.118-0500 I NETWORK [js] trying reconnect to 127.0.0.1:50001 failed
2020-12-31T08:43:45.118-0500 I NETWORK [js] reconnect 127.0.0.1:50001 failed failed
>
5. The MongoDB log reports the data file corruption and suggests repairing with --repair
2020-12-31T08:43:45.103-0500 E STORAGE [conn1] WiredTiger error (-31802) [1609422225:103947][102882:0x7f96713b5700], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.open_cursor: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error Raw: [1609422225:103947][102882:0x7f96713b5700], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.open_cursor: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2020-12-31T08:43:45.104-0500 E STORAGE [conn1] Failed to open a WiredTiger cursor. Reason: UnknownError: -31802: WT_ERROR: non-specific WiredTiger error, uri: table:dns_testdb/collection-8--6736947369024546614, config:
2020-12-31T08:43:45.104-0500 E STORAGE [conn1] This may be due to data corruption. Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair
2020-12-31T08:43:45.104-0500 F - [conn1] Fatal Assertion 50882 at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 101
2020-12-31T08:43:45.104-0500 F - [conn1]
***aborting after fassert() failure
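When the log is long, the affected files can be pulled out by searching for the WiredTiger message shown above. This is a rough sketch that assumes the message text matches the 4.2.3 log lines in this post; the function name `list_corrupt_files` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct data files that a mongod log
# reports as "does not appear to be a WiredTiger file".
list_corrupt_files() {
    local logfile="$1"
    # -o prints only the matching part of each line; the same file name can
    # appear several times per line, so duplicates are removed at the end.
    grep -o '[^ ]*\.wt does not appear to be a WiredTiger file' "$logfile" \
        | sed 's/ does not appear to be a WiredTiger file$//' \
        | sort -u
}
```

Run against the log in this post, for example `list_corrupt_files /opt/mongo/logs/single.log`, it would list the collection file that was truncated in step 2.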
6. Repair the database as the mongod log suggests
[mongo@centos7 data]$ mongod --dbpath /opt/mongo/data/single --port 50001 --oplogSize 512 --fork --bind_ip 0.0.0.0 --logpath /opt/mongo/logs/single.log --logappend --journal --directoryperdb --profile=1 --repair
about to fork child process, waiting until server is ready for connections.
forked process: 102942
child process started successfully, parent exiting
[mongo@centos7 data]$
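Since --repair rewrites or replaces files under the dbpath, a file-level copy of the dbpath taken beforehand keeps the original files available should the result be unsatisfactory. That step was not taken in the run above. This is a minimal sketch, assuming mongod is stopped and there is enough free disk space; the function name `backup_dbpath` and the destination path in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a stopped mongod's dbpath to a new directory
# before running mongod --repair on it.
backup_dbpath() {
    local dbpath="$1" dest="$2"
    if [ ! -d "$dbpath" ]; then
        echo "dbpath not found: $dbpath" >&2
        return 1
    fi
    # Refuse to overwrite or merge into an existing backup.
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    # Recursive copy that preserves ownership, mode and timestamps where possible.
    cp -Rp "$dbpath" "$dest"
}
```

For example, `backup_dbpath /opt/mongo/data/single /opt/mongo/data/single.pre-repair` could be run before adding --repair to the mongod command line.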
7. During the repair, the mongod log shows the damaged collection and its index being rebuilt
2020-12-31T08:44:45.646-0500 I STORAGE [initandlisten] repairDatabase dns_testdb
2020-12-31T08:44:45.646-0500 I STORAGE [initandlisten] Repairing collection dns_testdb.test_collection
2020-12-31T08:44:45.647-0500 E STORAGE [initandlisten] WiredTiger error (-31802) [1609422285:647413][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.verify: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error Raw: [1609422285:647413][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.verify: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2020-12-31T08:44:45.647-0500 I STORAGE [initandlisten] Verify failed on uri table:dns_testdb/collection-8--6736947369024546614. Running a salvage operation.
2020-12-31T08:44:45.647-0500 E STORAGE [initandlisten] WiredTiger error (-31802) [1609422285:647930][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.salvage: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error Raw: [1609422285:647930][102942:0x7fca99ec8c40], file:dns_testdb/collection-8--6736947369024546614.wt, WT_SESSION.salvage: __desc_read, 351: dns_testdb/collection-8--6736947369024546614.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2020-12-31T08:44:45.648-0500 W STORAGE [initandlisten] Salvage failed for uri table:dns_testdb/collection-8--6736947369024546614: Salvage failed: -31802: WT_ERROR: non-specific WiredTiger error. The file will be moved out of the way and a new ident will be created.
2020-12-31T08:44:45.648-0500 W STORAGE [initandlisten] Moving data file /opt/mongo/data/single/dns_testdb/collection-8--6736947369024546614.wt to backup as /opt/mongo/data/single/dns_testdb/collection-8--6736947369024546614.wt.corrupt
2020-12-31T08:44:45.648-0500 W STORAGE [initandlisten] Rebuilding ident dns_testdb/collection-8--6736947369024546614
2020-12-31T08:44:45.708-0500 I STORAGE [initandlisten] Successfully re-created table:dns_testdb/collection-8--6736947369024546614.
2020-12-31T08:44:45.718-0500 I INDEX [initandlisten] index build: starting on dns_testdb.test_collection properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "dns_testdb.test_collection" } using method: Foreground
2020-12-31T08:44:45.718-0500 I INDEX [initandlisten] build may temporarily use up to 200 megabytes of RAM
2020-12-31T08:44:45.718-0500 I STORAGE [initandlisten] Index build initialized: 2ddee833-ea97-4964-98c0-7137e71a99c9: dns_testdb.test_collection: indexes: 1
2020-12-31T08:44:45.722-0500 I STORAGE [initandlisten] Index builds manager starting: 2ddee833-ea97-4964-98c0-7137e71a99c9: dns_testdb.test_collection
2020-12-31T08:44:45.724-0500 I INDEX [initandlisten] index build: inserted 0 keys from external sorter into index in 0 seconds
2020-12-31T08:44:45.727-0500 I INDEX [initandlisten] index build: done building index _id_ on ns dns_testdb.test_collection
2020-12-31T08:44:45.727-0500 I STORAGE [initandlisten] Index builds manager completed successfully: 2ddee833-ea97-4964-98c0-7137e71a99c9: dns_testdb.test_collection. Index specs requested: 1. Indexes in catalog before build: 1. Indexes in catalog after build: 1
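The log above shows repair moving the unreadable file aside with a `.corrupt` suffix before re-creating the table. A quick way to see which files were set aside is to search the dbpath for that suffix. This is a small sketch; the function name `list_corrupt_backups` is hypothetical, and the suffix is taken from the log lines in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list data files that mongod --repair moved aside
# as "<name>.wt.corrupt" under the given dbpath.
list_corrupt_backups() {
    local dbpath="$1"
    find "$dbpath" -type f -name '*.wt.corrupt' | sort
}
```

In this experiment the file set aside is the zero-length file from step 2, so there is nothing to recover from it. After a real failure such a file may still be worth keeping until the data loss has been assessed.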
8. Restart the mongod service after the repair
[mongo@centos7 data]$ mongod --dbpath /opt/mongo/data/single --port 50001 --oplogSize 512 --fork --bind_ip 0.0.0.0 --logpath /opt/mongo/logs/single.log --logappend --journal --directoryperdb --profile=1
about to fork child process, waiting until server is ready for connections.
forked process: 102975
child process started successfully, parent exiting
[mongo@centos7 data]$
9. After mongod starts, it serves queries normally, but the data in the collection whose data file was damaged is gone
[mongo@centos7 data]$ mongo --port 50001
MongoDB shell version v4.2.3
connecting to: mongodb://127.0.0.1:50001/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("d88894c4-16bf-4013-a993-d29e2493fbdf") }
MongoDB server version: 4.2.3
Server has startup warnings:
> show dbs;
admin 0.000GB
config 0.000GB
dns_testdb 0.000GB
local 0.000GB
> use dns_testdb;
switched to db dns_testdb
> db.test_collection.find();
>
10. Summary
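When a data file is corrupted and there is no backup, MongoDB's repair option (mongod --repair) can leave every collection backed by a damaged data file completely empty, although mongod starts normally afterwards. The repair log shows why: verify and salvage both failed on the damaged file, so repair moved it aside as a .corrupt file, re-created an empty table under the same ident, and rebuilt the _id index with 0 keys. The collection's data file and index file were effectively reinitialized, and that is why the data was lost. When running MongoDB, it is therefore best to use a replica set for data redundancy, and to add a delayed member to the replica set as protection against operator error.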
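The replica set recommendation can be sketched as a small shell function that prints an `rs.initiate()` call for the mongo shell. Everything specific here is an assumption for illustration: the replica set name `rs0`, the three `*.example` host names, the one-hour default delay, and the function name `print_delayed_rs_config`. The field name `slaveDelay` applies to the 4.2 series used in this post; MongoDB 5.0 and later call it `secondaryDelaySecs`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print mongo shell code that initiates a three-member
# replica set whose third member is a hidden, delayed secondary.
# Host names and the set name are placeholders, not values from this post.
print_delayed_rs_config() {
    local delay_secs="${1:-3600}"
    cat <<EOF
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-a.example:50001" },
    { _id: 1, host: "mongo-b.example:50001" },
    { _id: 2, host: "mongo-c.example:50001", priority: 0, hidden: true, slaveDelay: ${delay_secs} }
  ]
})
EOF
}
```

The output could be reviewed and then piped to the shell, for example `print_delayed_rs_config 3600 | mongo --port 50001`, assuming every mongod was started with `--replSet rs0`. A delayed member needs priority 0 so that it can never become primary, and hiding it keeps applications from reading stale data from it.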