Background
The HBase cluster is built with CDH, version 2.1.0+cdh6.2.0.
1. HBCK
hbck is the most basic HBase operations tool.
Purpose: check the consistency of the regions in the cluster, then repair problems with the corresponding fix commands based on the report.
Note: some of these commands are no longer supported in HBase 2.0+.
Usage example:
# Check the region state of table cloudansys:gps
hbase hbck 'cloudansys:gps'
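Beyond checking a single table, hbck can report on the whole cluster. A sketch of common invocations (these are hbck1 options; on HBase 2.0+ most of the fix options have moved to the separate HBCK2 tool):

```shell
# Check the consistency of every table in the cluster
hbase hbck

# Print a full per-region report instead of just the summary
hbase hbck -details

# Restrict the check to several tables at once
hbase hbck 'cloudansys:gps' 'hbase:meta'
```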
2. HFile
Purpose: inspect the contents/metadata of a specific HFile. When the application finds that a region cannot be read, or a region server fails to open a region or throws an exception while reading a particular file, this tool can be used to check the HFile in isolation.
usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p]
[-s] [-v] [-w <arg>]
-a,--checkfamily Enable family check
-b,--printblocks Print block index meta data
-e,--printkey Print keys
-f,--file <arg> File to scan. Pass full-path; e.g.
hdfs://a:9000/hbase/hbase:meta/12/34
-h,--printblockheaders Print block headers for each block.
-i,--checkMobIntegrity Print all cells whose mob files are missing
-k,--checkrow Enable row order check; looks for out-of-order
keys
-m,--printmeta Print meta data of file
-p,--printkv Print key/value pairs
-r,--region <arg> Region to scan. Pass region name; e.g.
'hbase:meta,,1'
-s,--stats Print statistics
-v,--verbose Verbose output; emits file and meta data
delimiters
-w,--seekToRow <arg> Seek to this row and print all the kvs for this
row only
Usage example:
# Inspect one HFile of table gps under namespace cloudansys: verbose output, print the file's metadata and its key/value pairs
hbase org.apache.hadoop.hbase.io.hfile.HFile -v -m -p -f /hbase/data/cloudansys/gps/3c382ab68883d6b345eb879b7d4df918/info/4c24ff2c8c584b49980f6b99c7d3c6a8
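To find the HFile paths to pass to -f, it is usually easiest to list them from HDFS first. A sketch, assuming the default hbase.rootdir of /hbase; the <region>/<family>/<hfile> placeholders below stand for whatever paths that listing returns:

```shell
# List every HFile of table cloudansys:gps
hdfs dfs -ls -R /hbase/data/cloudansys/gps

# Quick integrity check on one of them: row-order check (-k),
# family check (-a), plus summary statistics (-s)
hbase org.apache.hadoop.hbase.io.hfile.HFile -k -a -s \
    -f /hbase/data/cloudansys/gps/<region>/<family>/<hfile>
```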
3. RowCounter and CellCounter
Purpose: MapReduce tools for counting the rows of an HBase table. CellCounter additionally collects finer-grained statistics about the table, including the row count, the number of column families, the number of qualifiers, and how often each occurs. Both tools accept a start/stop row and a timestamp range to scope the scan, and both are more efficient than count in the hbase shell.
Usage examples:
# Count the rows of table cloudansys:gps from the hbase shell
hbase(main):031:0> count 'cloudansys:gps', INTERVAL => 2000, CACHE => 500
# Count the rows of table cloudansys:gps with RowCounter
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'cloudansys:gps'
# Count the rows of table cloudansys:gps and write the detailed statistics to the HDFS directory /tmp/gps.cell
hbase org.apache.hadoop.hbase.mapreduce.CellCounter 'cloudansys:gps' /tmp/gps.cell
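The range-scoping mentioned above can be passed on the command line. A sketch of scoped RowCounter runs (the row keys rowA/rowZ are illustrative; timestamps are epoch milliseconds):

```shell
# Count only rows that have cells in a timestamp window
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'cloudansys:gps' \
    --starttime=1598849899000 --endtime=1598947200000

# Count only rows within a key range (an empty bound is open-ended)
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'cloudansys:gps' --range=rowA,rowZ
```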
4. OfflineMetaRepair
Purpose: repair HBase metadata offline.
hbck is an online repair tool and cannot be used when HBase is not running; OfflineMetaRepair rebuilds the HBase metadata while the cluster is offline.
Usage: OfflineMetaRepair [opts]
where [opts] are:
-details Display full report of all regions.
-base <hdfs://> Base Hbase Data directory.
-sidelineDir <hdfs://> HDFS path to backup existing meta and root.
-fix Auto fix as many problems as possible.
  -fixHoles               Auto fix region holes.
Usage example:
# Rebuild the HBase metadata
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
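Before letting the tool rewrite meta, it is prudent to sideline the existing metadata and get a full report first. A sketch using the options listed above (both HDFS paths here are illustrative):

```shell
# Back up existing meta to a sideline directory, point at an explicit
# HBase root directory, and print the full per-region report
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair \
    -base hdfs://nameservice1/hbase \
    -sidelineDir hdfs://nameservice1/tmp/hbase-meta-backup \
    -details
```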
5. Export
Purpose: dump the contents of a table to sequence files on HDFS; a version count and a timestamp range can be specified.
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
Note: -D properties will be applied to the conf used.
For example:
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified
   to control/limit what is exported.
-D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART>
-D hbase.mapreduce.scan.row.stop=<ROWSTOP>
-D hbase.client.scanner.caching=100
-D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
-D hbase.export.scanner.batch=10
-D hbase.export.scanner.caching=100
-D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
-D mapreduce.map.speculative=false
-D mapreduce.reduce.speculative=false
Usage example:
# Export part of table cloudansys:gps (1 version, cells with timestamps in [1598849899000, 1598947200000)) to the HDFS directory /tmp/hbase/export/cloudansys/gps
hbase org.apache.hadoop.hbase.mapreduce.Export 'cloudansys:gps' /tmp/hbase/export/cloudansys/gps 1 1598849899000 1598947200000
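The positional starttime/endtime arguments are epoch milliseconds. One way to compute them from human-readable times, assuming GNU date is available (the timestamps below are interpreted as UTC and reproduce the bounds used in the example above):

```shell
# Convert human-readable UTC times to the epoch-millisecond form
# Export expects, by taking epoch seconds and appending "000"
start=$(date -u -d '2020-08-31 04:58:19' +%s)000
end=$(date -u -d '2020-09-01 08:00:00' +%s)000
echo "$start $end"   # 1598849899000 1598947200000
```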
6. Import
Purpose: load data previously dumped by Export back into HBase.
Usage: Import [options] <tablename> <inputdir>
By default Import will load data directly into HBase. To instead generate
HFiles of data to prepare for a bulk data load, pass the option:
-Dimport.bulk.output=/path/for/output
If there is a large result that includes too many Cells, which can cause an OOME from the in-memory sort in the reducer, pass the option:
-Dimport.bulk.hasLargeResult=true
To apply a generic org.apache.hadoop.hbase.filter.Filter to the input, use
-Dimport.filter.class=<name of filter class>
-Dimport.filter.args=<comma separated list of args for filter
 NOTE: The filter will be applied BEFORE doing key renames via the HBASE_IMPORTER_RENAME_CFS property. Further, filters will only use the Filter#filterRowKey(byte[] buffer, int offset, int length) method to identify whether the current row needs to be ignored completely for processing and Filter#filterCell(Cell) method to determine if the Cell should be added; Filter.ReturnCode#INCLUDE and #INCLUDE_AND_NEXT_COL will be considered as including the Cell.
To import data exported from HBase 0.94, use
-Dhbase.import.version=0.94
-D mapreduce.job.name=jobName - use the specified mapreduce job name for the import
For performance consider the following options:
-Dmapreduce.map.speculative=false
-Dmapreduce.reduce.speculative=false
-Dimport.wal.durability=<Used while writing data to hbase. Allowed values are the supported durability values like SKIP_WAL/ASYNC_WAL/SYNC_WAL/...>
Usage example:
# Import the data under the HDFS directory /tmp/hbase/export/cloudansys/gps into table cloudansys:gpsimport
hbase org.apache.hadoop.hbase.mapreduce.Import 'cloudansys:gpsimport' /tmp/hbase/export/cloudansys/gps
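For large imports, the -Dimport.bulk.output option described above lets Import write HFiles instead of going through the regionserver write path; the HFiles are then bulk-loaded in a second step. A sketch (the target table must already exist, the /tmp/hbase/import path is illustrative, and on versions before HBase 2.x the loader class lives under org.apache.hadoop.hbase.mapreduce instead):

```shell
# Step 1: generate HFiles instead of writing into the table directly
hbase org.apache.hadoop.hbase.mapreduce.Import \
    -Dimport.bulk.output=/tmp/hbase/import/gpsimport \
    'cloudansys:gpsimport' /tmp/hbase/export/cloudansys/gps

# Step 2: bulk-load the generated HFiles into the table
hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles \
    /tmp/hbase/import/gpsimport 'cloudansys:gpsimport'
```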