Designing an HBase table with the HBase shell: the original motivation
PS: using hbase zkcli and hbase shell
I. Using hbase shell
1.1 The hbase command accepts the following options and commands:
    localhost:bin a6$ hbase
    Usage: hbase [<options>] <command> [<args>]
    Options:
      --config DIR     Configuration direction to use. Default: ./conf
      --hosts HOSTS    Override the list in 'regionservers' file
      --auth-as-server Authenticate to ZooKeeper using servers configuration

    Commands:
    Some commands take arguments. Pass no args or -h for usage.
      shell           Run the HBase shell
      hbck            Run the hbase 'fsck' tool
      snapshot        Create a new snapshot of a table
      snapshotinfo    Tool for dumping snapshot information
      wal             Write-ahead-log analyzer
      hfile           Store file analyzer
      zkcli           Run the ZooKeeper shell
      upgrade         Upgrade hbase
      master          Run an HBase HMaster node
      regionserver    Run an HBase HRegionServer node
      zookeeper       Run a Zookeeper server
      rest            Run an HBase REST server
      thrift          Run the HBase Thrift server
      thrift2         Run the HBase Thrift2 server
      clean           Run the HBase clean up script
      classpath       Dump hbase CLASSPATH
      mapredcp        Dump CLASSPATH entries required by mapreduce
      pe              Run PerformanceEvaluation
      ltt             Run LoadTestTool
      version         Print the version
      CLASSNAME       Run the class named CLASSNAME
1.2 Creating a table:
    hbase(main):010:0> create 'test1', 'lf', 'sf'
    0 row(s) in 1.3020 seconds

    => Hbase::Table - test1
Where:
lf: column family of LONG values (binary value)
sf: column family of STRING values
1.3 Importing data
    put 'test1', 'user1|ts1', 'sf:c1', 'sku1'
    put 'test1', 'user1|ts2', 'sf:c1', 'sku188'
    put 'test1', 'user1|ts3', 'sf:s1', 'sku123'
    put 'test1', 'user2|ts4', 'sf:c1', 'sku2'
    put 'test1', 'user2|ts5', 'sf:c2', 'sku288'
    put 'test1', 'user2|ts6', 'sf:s1', 'sku222'
The rowkey combines a user (userX) and the time of the action (tsX).
The value is the product acted on (skuXXX), and the column name records the action performed, for example:
c1: click from homepage; c2: click from ad;
s1: search from homepage; b1: buy
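The rowkey layout above matters because HBase keeps rows ordered by the raw bytes of the rowkey, so all actions of one user sit next to each other and in time order. The following plain-Ruby sketch only illustrates that ordering; it does not talk to HBase, and make_rowkey and hbase_order are hypothetical helper names introduced here.

```ruby
# Hypothetical helper (not part of HBase): join user and time into one rowkey.
def make_rowkey(user, ts)
  "#{user}|#{ts}"
end

# HBase orders rows by the raw bytes of the rowkey; mimic that ordering here.
def hbase_order(rowkeys)
  rowkeys.sort_by(&:bytes)
end

pairs = [%w[user2 ts5], %w[user1 ts3], %w[user1 ts1], %w[user2 ts4]]
keys = pairs.map { |user, ts| make_rowkey(user, ts) }

# Prints the keys grouped by user, then by time within each user.
puts hbase_order(keys)
```

Because the user comes first in the key, a scan over one user's rows reads a contiguous range instead of the whole table.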
1.4 Query examples
1.4.1 Which cells have a value equal to sku188
    hbase(main):002:0> scan 'test1', FILTER=>"ValueFilter(=,'binary:sku188')"
    ROW                COLUMN+CELL
     user1|ts2         column=sf:c1, timestamp=1527579481653, value=sku188
    1 row(s) in 0.4180 seconds
1.4.2 Which cells have a value containing 88
    hbase(main):003:0> scan 'test1', FILTER=>"ValueFilter(=,'substring:88')"
    ROW                COLUMN+CELL
     user1|ts2         column=sf:c1, timestamp=1527579481653, value=sku188
     user2|ts5         column=sf:c2, timestamp=1527579481866, value=sku288
    2 row(s) in 0.0390 seconds
1.4.3 Users who came in through an ad click (column c2) with a value containing 88
    hbase(main):004:0> scan 'test1', FILTER=>"ColumnPrefixFilter('c2') AND ValueFilter(=,'substring:88')"
    ROW                COLUMN+CELL
     user2|ts5         column=sf:c2, timestamp=1527579481866, value=sku288
    1 row(s) in 0.0450 seconds
1.4.4 Users who came in through search (column prefix s) with a value containing 123 or 222
    hbase(main):005:0> scan 'test1', FILTER=>"ColumnPrefixFilter('s') AND ( ValueFilter(=,'substring:123') OR ValueFilter(=,'substring:222') )"
    ROW                COLUMN+CELL
     user1|ts3         column=sf:s1, timestamp=1527579481758, value=sku123
     user2|ts6         column=sf:s1, timestamp=1527579482849, value=sku222
    2 row(s) in 0.0210 seconds
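To make the AND/OR combination in 1.4.4 easier to follow, here is a rough in-memory model of it in plain Ruby. It is a sketch of the filter logic only, not of how HBase evaluates filters internally; sample_cells and search_cells are hypothetical names, and the data is copied from the put statements in 1.3.

```ruby
# In-memory stand-in for the cells of test1: [rowkey, qualifier, value].
def sample_cells
  [
    ['user1|ts1', 'c1', 'sku1'],
    ['user1|ts2', 'c1', 'sku188'],
    ['user1|ts3', 's1', 'sku123'],
    ['user2|ts4', 'c1', 'sku2'],
    ['user2|ts5', 'c2', 'sku288'],
    ['user2|ts6', 's1', 'sku222']
  ]
end

# Models ColumnPrefixFilter(prefix) AND (ValueFilter(substring a) OR ValueFilter(substring b) ...).
# The substring comparison ignores case, as the HBase substring comparator does.
def search_cells(cells, column_prefix, substrings)
  cells.select do |_row, qualifier, value|
    qualifier.start_with?(column_prefix) &&
      substrings.any? { |s| value.downcase.include?(s.downcase) }
  end
end

search_cells(sample_cells, 's', %w[123 222]).each { |cell| puts cell.join(' ') }
```

The same helper called with the prefix 'c2' and the single substring '88' reproduces the query in 1.4.3.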
1.4.5 Rowkeys starting with user1
    hbase(main):006:0> scan 'test1', FILTER => "PrefixFilter ('user1')"
    ROW                COLUMN+CELL
     user1|ts1         column=sf:c1, timestamp=1527579473815, value=sku1
     user1|ts2         column=sf:c1, timestamp=1527579481653, value=sku188
     user1|ts3         column=sf:s1, timestamp=1527579481758, value=sku123
    3 row(s) in 0.0260 seconds
2.
FirstKeyOnlyFilter: a rowkey can hold several columns, and the same column of the same rowkey can hold several versions; this filter returns only the first version of the first column for each rowkey.
2.1 KeyOnlyFilter: return only the key, not the value
    hbase(main):007:0> scan 'test1', FILTER=>"FirstKeyOnlyFilter() AND ValueFilter(=,'binary:sku188') AND KeyOnlyFilter()"
    ROW                COLUMN+CELL
     user1|ts2         column=sf:c1, timestamp=1527579481653, value=
    1 row(s) in 0.0700 seconds
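What the two filters do on their own can be sketched in plain Ruby as follows. This is a simplified model under the assumption that the cells are already sorted by rowkey and column, as a scan would deliver them; it does not reproduce how HBase evaluates several filters combined with AND. The names first_key_only and key_only are hypothetical.

```ruby
# Models FirstKeyOnlyFilter on its own: keep only the first cell of each row.
# Cells are [rowkey, qualifier, value] and are assumed to arrive in scan order.
def first_key_only(cells)
  cells.group_by { |row, _qualifier, _value| row }
       .map { |_row, row_cells| row_cells.first }
end

# Models KeyOnlyFilter on its own: keep row and qualifier, blank out the value.
def key_only(cells)
  cells.map { |row, qualifier, _value| [row, qualifier, ''] }
end

cells = [
  ['user1|ts2', 'c1', 'sku188'],
  ['user1|ts2', 's1', 'sku123'], # hypothetical second column, for illustration only
  ['user2|ts5', 'c2', 'sku288']
]

key_only(first_key_only(cells)).each { |cell| p cell }
```

Both filters are typically used to cut down the data returned, for example when only row counts or key existence are of interest.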
2.2 Starting from user1|ts2, find all rowkeys that start with user1
    hbase(main):008:0> scan 'test1', {STARTROW=>'user1|ts2', FILTER => "PrefixFilter ('user1')"}
    ROW                COLUMN+CELL
     user1|ts2         column=sf:c1, timestamp=1527579481653, value=sku188
     user1|ts3         column=sf:s1, timestamp=1527579481758, value=sku123
    2 row(s) in 0.0280 seconds
2.3 Starting from user1|ts2, find all rows up to (but not including) the first rowkey that starts with user2
    hbase(main):009:0> scan 'test1', {STARTROW=>'user1|ts2', STOPROW=>'user2'}
    ROW                COLUMN+CELL
     user1|ts2         column=sf:c1, timestamp=1527579481653, value=sku188
     user1|ts3         column=sf:s1, timestamp=1527579481758, value=sku123
    2 row(s) in 0.0220 seconds
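The STARTROW/STOPROW behaviour in 2.2 and 2.3 amounts to a half-open range over sorted rowkeys: the start row is included and the stop row is excluded. A minimal plain-Ruby sketch of that rule, with range_scan as a hypothetical helper name and no connection to HBase:

```ruby
# Models a scan over sorted rowkeys with STARTROW inclusive and STOPROW exclusive.
# Passing no stop_row scans to the end of the table.
def range_scan(sorted_rowkeys, start_row, stop_row = nil)
  sorted_rowkeys.select do |key|
    key >= start_row && (stop_row.nil? || key < stop_row)
  end
end

rowkeys = %w[user1|ts1 user1|ts2 user1|ts3 user2|ts4 user2|ts5 user2|ts6]

# Equivalent of 2.3: from user1|ts2 up to, but not including, keys at or after user2.
puts range_scan(rowkeys, 'user1|ts2', 'user2')

# Equivalent of 2.2: start row plus a prefix check on the remaining keys.
puts range_scan(rowkeys, 'user1|ts2').select { |key| key.start_with?('user1') }
```

Both calls select the same two rows here, which matches the scan output in 2.2 and 2.3.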
2.4 Rowkeys containing ts3
    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.filter.RowFilter
    scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts3'))}
    ROW                COLUMN+CELL
     user1|ts3         column=sf:s1, timestamp=1409122354954, value=sku123
2.5 Rowkeys containing ts
    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.filter.RowFilter
    scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('ts'))}
    ROW                COLUMN+CELL
     user1|ts1         column=sf:c1, timestamp=1409122354868, value=sku1
     user1|ts2         column=sf:c1, timestamp=1409122354918, value=sku188
     user1|ts3         column=sf:s1, timestamp=1409122354954, value=sku123
     user2|ts4         column=sf:c1, timestamp=1409122354998, value=sku2
     user2|ts5         column=sf:c2, timestamp=1409122355030, value=sku288
     user2|ts6         column=sf:s1, timestamp=1409122355970, value=sku222
2.6 Using regular expressions in queries
Add a test row:
    put 'test1', 'user2|err', 'sf:s1', 'sku999'
Query rowkeys of the form user<digits>|ts<digits>. The newly added test row does not match the regular expression, so it is not returned:
    import org.apache.hadoop.hbase.filter.RegexStringComparator
    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.filter.RowFilter
    scan 'test1', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),RegexStringComparator.new('^user\d+\|ts\d+$'))}
    ROW                COLUMN+CELL
     user1|ts1         column=sf:c1, timestamp=1409122354868, value=sku1
     user1|ts2         column=sf:c1, timestamp=1409122354918, value=sku188
     user1|ts3         column=sf:s1, timestamp=1409122354954, value=sku123
     user2|ts4         column=sf:c1, timestamp=1409122354998, value=sku2
     user2|ts5         column=sf:c2, timestamp=1409122355030, value=sku288
     user2|ts6         column=sf:s1, timestamp=1409122355970, value=sku222
Add another test row:
    put 'test1', 'user1|ts9', 'sf:b1', 'sku1'
Cells whose column name starts with b1 and whose value is sku1:
    scan 'test1', FILTER=>"ColumnPrefixFilter('b1') AND ValueFilter(=,'binary:sku1')"
    ROW                COLUMN+CELL
     user1|ts9         column=sf:b1, timestamp=1409124908668, value=sku1
The same query using SingleColumnValueFilter (column sf:b1 with a value equal to sku1):
    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.util.Bytes
    scan 'test1', {COLUMNS => 'sf:b1', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('sf'), Bytes.toBytes('b1'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('sku1'))}
    ROW                COLUMN+CELL
     user1|ts9         column=sf:b1, timestamp=1409124908668, value=sku1
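The regular expression used for the rowkey query in 2.6 can be tried out locally before running a scan. The sketch below uses Ruby's own regex engine rather than the Java one behind RegexStringComparator, so it is only an approximation; in Ruby, ^ and $ match at line boundaries, which is why \A and \z are used here to anchor the whole key. well_formed_rowkey? is a hypothetical helper name.

```ruby
# Checks whether a rowkey has the form user<digits>|ts<digits>.
# \A and \z anchor the whole string (Ruby's ^ and $ are line anchors).
def well_formed_rowkey?(rowkey)
  rowkey.match?(/\Auser\d+\|ts\d+\z/)
end

%w[user1|ts1 user2|ts6 user2|err user1|ts9].each do |key|
  puts "#{key} matches: #{well_formed_rowkey?(key)}"
end
```

The row user2|err fails the check, which is consistent with it being absent from the scan output above.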
II. Using hbase zkcli
1.1 Starting the ZooKeeper shell and listing the root:
    localhost:bin a6$ hbase zkcli
    SLF4J: Class path contains multiple SLF4J bindings.
    …… …… ……
    2018-05-29 16:14:21,221 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    2018-05-29 16:14:21,221 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Mac OS X
    2018-05-29 16:14:21,221 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=x86_64
    2018-05-29 16:14:21,221 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=10.13.2
    2018-05-29 16:14:21,221 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=a6
    2018-05-29 16:14:21,221 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/Users/a6
    2018-05-29 16:14:21,221 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/Users/a6/Applications/hbase-1.2.6/bin
    2018-05-29 16:14:21,223 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2182 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@588df31b
    Welcome to ZooKeeper!
    2018-05-29 16:14:21,257 INFO [main-SendThread(localhost:2182)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2182. Will not attempt to authenticate using SASL (unknown error)
    JLine support is enabled
    2018-05-29 16:14:21,283 INFO [main-SendThread(localhost:2182)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2182, initiating session
    2018-05-29 16:14:21,298 INFO [main-SendThread(localhost:2182)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2182, sessionid = 0x163aac3f5960008, negotiated timeout = 30000

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null
    [zk: localhost:2182(CONNECTED) 0] ls /
    [zookeeper, hbase]
1.2 Other commands:
    [zk: localhost:2182(CONNECTED) 1] ls /hbase
    [replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, region-in-transition, online-snapshot, master, running, recovering-regions, draining, namespace, hbaseid, table]
    [zk: localhost:2182(CONNECTED) 2] ls /hbase/table
    [hbase:meta, hbase:namespace, test, test10, test011, test001, new_emp, emp, test010, test1, test_tb_paysuccess, test8, test9, test6, test7, t1]
    [zk: localhost:2182(CONNECTED) 3] ls /hbase/table/test1
    []
    [zk: localhost:2182(CONNECTED) 4] get /hbase/table/test1
    �master:61536�'�v*��PBUF
    cZxid = 0x1925
    ctime = Tue May 29 15:37:27 CST 2018
    mZxid = 0x192b
    mtime = Tue May 29 15:37:27 CST 2018
    pZxid = 0x1925
    cversion = 0
    dataVersion = 2
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 31
    numChildren = 0
    [zk: localhost:2182(CONNECTED) 5]
Reference: https://blog.csdn.net/vaq37942/article/details/54949428