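(I). Introduction to HBase
1. HBase is built on top of HDFS and provides a highly reliable, high-performance, column-oriented, scalable database system with real-time reads and writes.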
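2. HBase characteristics:
Everything stored in HBase is bytes.
Row keys are kept sorted in byte order and serve as the table's index.
HBase automatically splits a table into Regions as it grows (splits are triggered by region size, not by row count) and balances Regions across RegionServers; redundancy comes from the replication of the underlying HDFS files.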
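Byte-order sorting has a practical consequence: numeric suffixes sort as text, so row10 comes before row2. The following is a minimal sketch in plain Scala (no HBase dependency) of an unsigned lexicographic byte comparison, which is the ordering HBase is generally described as using for row keys. The object name RowKeyOrder and both of its methods are made up for this illustration.

```scala
object RowKeyOrder {
  // Compare two byte arrays as unsigned bytes, left to right;
  // if one is a prefix of the other, the shorter one sorts first.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val x = a(i) & 0xff
      val y = b(i) & 0xff
      if (x != y) return x - y
      i += 1
    }
    a.length - b.length
  }

  // Sort row keys the way a byte-ordered store would (keys encoded as UTF-8).
  def sortRowKeys(keys: Seq[String]): Seq[String] =
    keys.sortWith((l, r) => compareBytes(l.getBytes("UTF-8"), r.getBytes("UTF-8")) < 0)
}
```

Zero-padding the numeric part (row01, row02, row10) is one common way to make byte order agree with numeric order.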
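3. HBase storage structure:
RowKey: a byte array that acts as the "primary key" of each record in the table and enables fast lookups; row key design is very important.
Column Family: has a name (a string) and contains one or more related columns; columns in the same family share the same storage properties.
Column: belongs to a column family and is addressed as familyName:columnName; columns can be added dynamically for each record.
Cell: identified by row key, column, and timestamp; timestamp is the version timestamp and value is the value stored for that row key and column.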
hbase(main):009:0> scan 'User'
ROW COLUMN+CELL
id001 column=personInfo:name, timestamp=1502368030841, value=xiaoming
id001 column=personInfo:age, timestamp=1502368069926, value=18
id001 column=personInfo:sex, timestamp=1502368093636, value=man
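To make the storage model above concrete, here is a small plain-Scala sketch (no HBase dependency) that treats a table as a collection of cells addressed by row key, column family, column, and timestamp. It reuses the three cells from the scan output above; the fourth cell, the Cell case class, and the latest helper are hypothetical additions for illustration, and the "newest timestamp wins" read is only a simplified picture of how versions behave.

```scala
object CellModelSketch {
  // One stored value: (row key, family, qualifier, timestamp) -> value
  final case class Cell(row: String, family: String, qualifier: String,
                        timestamp: Long, value: String)

  val cells: Seq[Cell] = Seq(
    Cell("id001", "personInfo", "name", 1502368030841L, "xiaoming"),
    Cell("id001", "personInfo", "age", 1502368069926L, "18"),
    Cell("id001", "personInfo", "sex", 1502368093636L, "man"),
    // Hypothetical newer version of the age column, not in the original output
    Cell("id001", "personInfo", "age", 1502368099999L, "19")
  )

  // Return the value with the highest timestamp for the given coordinates.
  def latest(row: String, family: String, qualifier: String): Option[String] = {
    val matches = cells.filter(c =>
      c.row == row && c.family == family && c.qualifier == qualifier)
    if (matches.isEmpty) None else Some(matches.maxBy(_.timestamp).value)
  }
}
```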
(II). Common HBase commands
1. Enter the shell: hbase shell
[hadoop@indb-3-136-hzifc bin]$ echo $HBASE_HOME
/data/program/hbase
[hadoop@indb-3-136-hzifc bin]$ /data/program/hbase/bin/hbase shell
2. List all tables: list
hbase(main):003:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
TEST.USER
User
6 row(s) in 0.0340 seconds
3. Show the details of a table: describe
hbase(main):004:0> describe 'User'
Table User is ENABLED
User
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1410 seconds
4. Create a table: create
Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
Create a User table with a column family named info; more than one column family can be listed.
hbase(main):002:0> create 'User','info'
0 row(s) in 1.5890 seconds
5. Delete a column family: alter
Syntax: alter <table>, 'delete' => <family>
hbase(main):002:0> alter 'User', 'delete' => 'info'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.5340 seconds
6. Insert data: put
Syntax: put <table>,<rowkey>,<family:column>,<value>
hbase(main):005:0> put 'User', 'row1', 'info:name', 'xiaoming'
0 row(s) in 0.1200 seconds
hbase(main):006:0> put 'User', 'row2', 'info:age', '18'
0 row(s) in 0.0170 seconds
hbase(main):007:0> put 'User', 'row3', 'info:sex', 'man'
0 row(s) in 0.0030 seconds
7. Look up a record by row key: get
Syntax: get <table>,<rowkey>,[<family:column>,....]
hbase(main):008:0> get 'User', 'row2'
COLUMN CELL
info:age timestamp=1502368069926, value=18
1 row(s) in 0.0280 seconds
hbase(main):028:0> get 'User', 'row3', 'info:sex'
COLUMN CELL
info:sex timestamp=1502368093636, value=man
hbase(main):036:0> get 'User', 'row1', {COLUMN => 'info:name'}
COLUMN CELL
info:name timestamp=1502368030841, value=xiaoming
1 row(s) in 0.0120 seconds
8. Query all records: scan
Syntax: scan <table>, {COLUMNS => [ <family:column>,.... ], LIMIT => num}
Scan all records
hbase(main):009:0> scan 'User'
ROW COLUMN+CELL
row1 column=info:name, timestamp=1502368030841, value=xiaoming
row2 column=info:age, timestamp=1502368069926, value=18
row3 column=info:sex, timestamp=1502368093636, value=man
3 row(s) in 0.0380 seconds
Scan the first 2 rows
hbase(main):037:0> scan 'User', {LIMIT => 2}
ROW COLUMN+CELL
row1 column=info:name, timestamp=1502368030841, value=xiaoming
row2 column=info:age, timestamp=1502368069926, value=18
2 row(s) in 0.0170 seconds
Range query
hbase(main):011:0> scan 'User', {STARTROW => 'row2'}
ROW COLUMN+CELL
row2 column=info:age, timestamp=1502368069926, value=18
row3 column=info:sex, timestamp=1502368093636, value=man
2 row(s) in 0.0170 seconds
hbase(main):012:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row2'}
ROW COLUMN+CELL
row2 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0110 seconds
hbase(main):013:0> scan 'User', {STARTROW => 'row2', ENDROW => 'row3'}
ROW COLUMN+CELL
row2 column=info:age, timestamp=1502368069926, value=18
1 row(s) in 0.0120 seconds
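In addition, advanced options such as TIMERANGE and FILTER can be added.
STARTROW and ENDROW must be written in uppercase, otherwise an error is raised. The result set excludes the row equal to ENDROW, except when STARTROW and ENDROW are the same key, in which case that row is returned (as in the row2 example above).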
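The range behaviour seen in the transcripts above (STARTROW inclusive, ENDROW exclusive, and a single row returned when both are equal) can be mimicked with a sorted map. This is a plain-Scala sketch with no HBase dependency; RangeScanSketch and its scan method are hypothetical names, and it assumes ASCII row keys so that String ordering coincides with byte ordering.

```scala
import scala.collection.immutable.TreeMap

object RangeScanSketch {
  // The three rows from the transcripts above, kept sorted by row key.
  val rows: TreeMap[String, String] = TreeMap(
    "row1" -> "info:name=xiaoming",
    "row2" -> "info:age=18",
    "row3" -> "info:sex=man"
  )

  // Return the row keys a range scan would visit.
  // startRow is inclusive and endRow is exclusive; when the two are equal,
  // the scan behaves like a lookup of that single row, as observed above.
  def scan(startRow: String, endRow: String): List[String] =
    if (startRow == endRow) rows.keysIterator.filter(_ == startRow).toList
    else rows.range(startRow, endRow).keysIterator.toList
}
```

For example, RangeScanSketch.scan("row2", "row3") visits only row2, matching the last shell transcript.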
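9. Count the rows in a table: count
Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}
INTERVAL sets how many rows are counted between progress reports (each report shows the current row key), default 1000; CACHE is the number of rows fetched into the scanner cache per request, default 10, and increasing it can speed up counting.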
hbase(main):020:0> count 'User'
3 row(s) in 0.0360 seconds
10. Delete: delete
Delete a single column of a row
hbase(main):008:0> delete 'User', 'row1', 'info:age'
0 row(s) in 0.0290 seconds
Delete an entire row
hbase(main):014:0> deleteall 'User', 'row2'
0 row(s) in 0.0090 seconds
Remove all data from a table
hbase(main):016:0> truncate 'User'
Truncating 'User' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 3.6610 seconds
11. Check whether a table exists: exists
hbase(main):022:0> exists 'User'
Table User does exist
0 row(s) in 0.0150 seconds
12. Disable a table: disable
hbase(main):014:0> disable 'User'
0 row(s) in 2.2660 seconds
13. Enable a table: enable
hbase(main):017:0> enable 'User'
0 row(s) in 1.3470 seconds
14. Drop a table: drop
A table must be disabled before it can be dropped.
hbase(main):031:0> disable 'TEST.USER'
0 row(s) in 2.2640 seconds
hbase(main):033:0> drop 'TEST.USER'
0 row(s) in 1.2490 seconds
(III). Scala API for working with HBase
import org.apache.hadoop.hbase.{CellUtil,HTableDescriptor,HColumnDescriptor,HBaseConfiguration,TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory,Put,Get,Delete,Scan}
import org.apache.hadoop.hbase.util.Bytes
import java.util
val conf=HBaseConfiguration.create()
// Creating a Connection is heavyweight; it is thread-safe and is the entry point for working with HBase
val conn=ConnectionFactory.createConnection(conf)
// Get an Admin object from the Connection (it replaces the former HBaseAdmin)
val admin=conn.getAdmin
// Name of the table used in this example
val userTable=TableName.valueOf("user_score_table")
val cf1="scoreInfo"
val cf2="addressInfo"
val cn1="math"
val cn2="physics"
val cn3="Addr"
if(admin.tableExists(userTable)){
  println("Table exists!")
  //admin.disableTable(userTable)
  //admin.deleteTable(userTable)
  //exit()
}else{
  // Describe the table with its two column families, then create it
  val tableDesc=new HTableDescriptor(userTable)
  tableDesc.addFamily(new HColumnDescriptor(cf1.getBytes))
  tableDesc.addFamily(new HColumnDescriptor(cf2.getBytes))
  admin.createTable(tableDesc)
  println("Create table success!")
}
// Obtain a Table handle for reads and writes
val table=conn.getTable(userTable)
// Insert one row whose rowkey is IronMan
val p=new Put(Bytes.toBytes("IronMan"))
// Specify column and value for the put (the old put.add method is deprecated)
p.addColumn(cf1.getBytes,cn1.getBytes,"98".getBytes) // scoreInfo:math 98
p.addColumn(cf1.getBytes,cn2.getBytes,"87".getBytes) // scoreInfo:physics 87
p.addColumn(cf2.getBytes,cn3.getBytes,"Beijing".getBytes) // addressInfo:Addr Beijing
table.put(p)
// Query several rows by rowkey in one batch
val listGet=new util.ArrayList[Get]
val get=new Get(Bytes.toBytes("id002_Thor"))
val get2=new Get(Bytes.toBytes("id003_jack"))
listGet.add(get)
listGet.add(get2)
// Skip rows that were not found, then flatten each Result into (rowkey,(column,value)) tuples
val resultArr=table.get(listGet).filter(!_.isEmpty).flatMap(z=>{
  val cellArr=z.rawCells()
  cellArr.map(n=>(Bytes.toString(z.getRow()),(Bytes.toString(CellUtil.cloneQualifier(n)),Bytes.toString(CellUtil.cloneValue(n)))))
})
// Release resources: Table and Admin first, then the Connection
table.close()
admin.close()
conn.close()
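The flatMap step in the batch get is dense, so the sketch below reproduces only its shape in plain Scala, without a cluster or the HBase client on the classpath. FakeCell and FakeResult are hypothetical stand-ins for HBase's Cell and Result, and the sample scores are invented for the illustration.

```scala
object FlattenSketch {
  // Hypothetical stand-ins for HBase's Cell and Result types
  final case class FakeCell(qualifier: String, value: String)
  final case class FakeResult(row: String, cells: Seq[FakeCell])

  // Turn each non-empty result into one (rowkey, (column, value)) tuple per cell.
  def flatten(results: Seq[FakeResult]): Seq[(String, (String, String))] =
    results
      .filter(_.cells.nonEmpty)
      .flatMap(r => r.cells.map(c => (r.row, (c.qualifier, c.value))))
}
```

A row that was not found contributes no tuples, so the output only contains rows that actually hold data.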