Background
The HBase cluster is built with CDH, version 2.1.0+cdh6.2.0.
1. HBCK
hbck is the most basic HBase operations tool.
Purpose: check the consistency of the regions in the cluster, then repair problems with the corresponding fix commands based on the report.
Note: some of these commands are no longer supported in HBase 2.0+.
Usage example:
# Check the region state of table cloudansys:gps
hbase hbck 'cloudansys:gps'
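Beyond checking a single table, hbck can report on the whole cluster. A sketch of common invocations (these are hbck1 options; on HBase 2.0+ most of the fix options have moved to the separate HBCK2 tool):

```shell
# Check the consistency of every table in the cluster
hbase hbck

# Print a full per-region report instead of just the summary
hbase hbck -details

# Restrict the check to several tables at once
hbase hbck 'cloudansys:gps' 'hbase:meta'
```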
2. HFile
Purpose: inspect the contents/metadata of a specific HFile. When the application finds that a region cannot be read, or a region server fails to open a region or throws an exception while reading a particular file, this tool can be used to check the HFile in isolation.
usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p]
[-s] [-v] [-w <arg>]
-a,--checkfamily Enable family check
-b,--printblocks Print block index meta data
-e,--printkey Print keys
-f,--file <arg> File to scan. Pass full-path; e.g.
hdfs://a:9000/hbase/hbase:meta/12/34
-h,--printblockheaders Print block headers for each block.
-i,--checkMobIntegrity Print all cells whose mob files are missing
-k,--checkrow Enable row order check; looks for out-of-order
keys
-m,--printmeta Print meta data of file
-p,--printkv Print key/value pairs
-r,--region <arg> Region to scan. Pass region name; e.g.
'hbase:meta,,1'
-s,--stats Print statistics
-v,--verbose Verbose output; emits file and meta data
delimiters
-w,--seekToRow <arg> Seek to this row and print all the kvs for this
row only
Usage example:
# Inspect one HFile of table gps under namespace cloudansys: verbose output, print the file's metadata and its key/value pairs
hbase org.apache.hadoop.hbase.io.hfile.HFile -v -m -p -f /hbase/data/cloudansys/gps/3c382ab68883d6b345eb879b7d4df918/info/4c24ff2c8c584b49980f6b99c7d3c6a8
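To find the HFile paths to pass to -f, it is usually easiest to list them from HDFS first. A sketch, assuming the default hbase.rootdir of /hbase; the <region>/<family>/<hfile> placeholders below stand for whatever paths that listing returns:

```shell
# List every HFile of table cloudansys:gps
hdfs dfs -ls -R /hbase/data/cloudansys/gps

# Quick integrity check on one of them: row-order check (-k),
# family check (-a), plus summary statistics (-s)
hbase org.apache.hadoop.hbase.io.hfile.HFile -k -a -s \
    -f /hbase/data/cloudansys/gps/<region>/<family>/<hfile>
```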
3. RowCounter and CellCounter
Purpose: MapReduce tools for counting the rows of an HBase table. CellCounter additionally collects finer-grained statistics about the table, including the row count, the number of column families, the number of qualifiers, and how often each occurs. Both tools accept a start/stop row and a timestamp range to scope the scan, and both are more efficient than count in the hbase shell.
Usage examples:
# Count the rows of table cloudansys:gps from the hbase shell
hbase(main):031:0> count 'cloudansys:gps', INTERVAL => 2000, CACHE => 500
# Count the rows of table cloudansys:gps with RowCounter
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'cloudansys:gps'
# Count the rows of table cloudansys:gps and write the detailed statistics to the HDFS directory /tmp/gps.cell
hbase org.apache.hadoop.hbase.mapreduce.CellCounter 'cloudansys:gps' /tmp/gps.cell
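The range-scoping mentioned above can be passed on the command line. A sketch of scoped RowCounter runs (the row keys rowA/rowZ are illustrative; timestamps are epoch milliseconds):

```shell
# Count only rows that have cells in a timestamp window
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'cloudansys:gps' \
    --starttime=1598849899000 --endtime=1598947200000

# Count only rows within a key range (an empty bound is open-ended)
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'cloudansys:gps' --range=rowA,rowZ
```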
4. OfflineMetaRepair
Purpose: repair HBase metadata offline.
hbck is an online repair tool and cannot be used when HBase is not running; OfflineMetaRepair rebuilds the HBase metadata while the cluster is offline.
Usage: OfflineMetaRepair [opts]
where [opts] are:
-details Display full report of all regions.
-base <hdfs://> Base Hbase Data directory.
-sidelineDir <hdfs://> HDFS path to backup existing meta and root.
-fix Auto fix as many problems as possible.
  -fixHoles               Auto fix region holes.
Usage example:
# Rebuild the HBase metadata
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
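Before letting the tool rewrite meta, it is prudent to sideline the existing metadata and get a full report first. A sketch using the options listed above (both HDFS paths here are illustrative):

```shell
# Back up existing meta to a sideline directory, point at an explicit
# HBase root directory, and print the full per-region report
hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair \
    -base hdfs://nameservice1/hbase \
    -sidelineDir hdfs://nameservice1/tmp/hbase-meta-backup \
    -details
```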
5. Export
Purpose: dump the contents of a table to sequence files on HDFS; a version count and a timestamp range can be specified.
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
Note: -D properties will be applied to the conf used.
For example:
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified
   to control/limit what is exported.
-D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART>
-D hbase.mapreduce.scan.row.stop=<ROWSTOP>
-D hbase.client.scanner.caching=100
-D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
-D hbase.export.scanner.batch=10
-D hbase.export.scanner.caching=100
-D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
-D mapreduce.map.speculative=false
-D mapreduce.reduce.speculative=false
Usage example:
# Export part of table cloudansys:gps (1 version, cells with timestamps in [1598849899000, 1598947200000)) to the HDFS directory /tmp/hbase/export/cloudansys/gps
hbase org.apache.hadoop.hbase.mapreduce.Export 'cloudansys:gps' /tmp/hbase/export/cloudansys/gps 1 1598849899000 1598947200000
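The positional starttime/endtime arguments are epoch milliseconds. One way to compute them from human-readable times, assuming GNU date is available (the timestamps below are interpreted as UTC and reproduce the bounds used in the example above):

```shell
# Convert human-readable UTC times to the epoch-millisecond form
# Export expects, by taking epoch seconds and appending "000"
start=$(date -u -d '2020-08-31 04:58:19' +%s)000
end=$(date -u -d '2020-09-01 08:00:00' +%s)000
echo "$start $end"   # 1598849899000 1598947200000
```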
6. Import
Purpose: load data previously dumped by Export back into HBase.
Usage: Import [options] <tablename> <inputdir>
By default Import will load data directly into HBase. To instead generate
HFiles of data to prepare for a bulk data load, pass the option:
-Dimport.bulk.output=/path/for/output
If there is a large result that includes too many Cells, which can cause an OOME from the in-memory sort in the reducer, pass the option:
-Dimport.bulk.hasLargeResult=true
To apply a generic org.apache.hadoop.hbase.filter.Filter to the input, use
-Dimport.filter.class=<name of filter class>
-Dimport.filter.args=<comma separated list of args for filter
 NOTE: The filter will be applied BEFORE doing key renames via the HBASE_IMPORTER_RENAME_CFS property. Further, filters will only use the Filter#filterRowKey(byte[] buffer, int offset, int length) method to identify whether the current row needs to be ignored completely for processing and Filter#filterCell(Cell) method to determine if the Cell should be added; Filter.ReturnCode#INCLUDE and #INCLUDE_AND_NEXT_COL will be considered as including the Cell.
To import data exported from HBase 0.94, use
-Dhbase.import.version=0.94
-D mapreduce.job.name=jobName - use the specified mapreduce job name for the import
For performance consider the following options:
-Dmapreduce.map.speculative=false
-Dmapreduce.reduce.speculative=false
-Dimport.wal.durability=<Used while writing data to hbase. Allowed values are the supported durability values like SKIP_WAL/ASYNC_WAL/SYNC_WAL/...>
Usage example:
# Import the data under the HDFS directory /tmp/hbase/export/cloudansys/gps into table cloudansys:gpsimport
hbase org.apache.hadoop.hbase.mapreduce.Import 'cloudansys:gpsimport' /tmp/hbase/export/cloudansys/gps
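For large imports, the -Dimport.bulk.output option described above lets Import write HFiles instead of going through the regionserver write path; the HFiles are then bulk-loaded in a second step. A sketch (the target table must already exist, the /tmp/hbase/import path is illustrative, and on versions before HBase 2.x the loader class lives under org.apache.hadoop.hbase.mapreduce instead):

```shell
# Step 1: generate HFiles instead of writing into the table directly
hbase org.apache.hadoop.hbase.mapreduce.Import \
    -Dimport.bulk.output=/tmp/hbase/import/gpsimport \
    'cloudansys:gpsimport' /tmp/hbase/export/cloudansys/gps

# Step 2: bulk-load the generated HFiles into the table
hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles \
    /tmp/hbase/import/gpsimport 'cloudansys:gpsimport'
```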