Notes on reading and writing Hive on HBase with Kettle 6.1

Versions

kettle 6.1
hbase 1.2.6
hive 2.2.0
hadoop 2.6.5

Data architecture

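Business data flows through Kafka into the business data-processing engine, and the filtered data is written to HBase. A Kettle job runs on a schedule, reads the results computed by Hive, and writes them into the business MySQL database, where the front end displays them.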
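To make the data flow concrete, here is a toy in-memory model of it in Python. Everything in it is hypothetical: dicts stand in for HBase and MySQL, the filter rule (amount > 0) and the per-category sum are invented for illustration, and no real Kafka, Hive or Kettle API is involved.

```python
# Toy in-memory model of the pipeline. All names are hypothetical stand-ins:
# dicts play HBase and MySQL, plain functions play the processing engine,
# Hive, and the periodic Kettle job.
from collections import defaultdict


def process_and_store(events, hbase):
    """Processing engine: drop invalid events (assumed rule: amount > 0)
    and write the rest to the HBase stand-in, keyed by row key."""
    for rowkey, category, amount in events:
        if amount > 0:
            hbase[rowkey] = {"cf:category": category, "cf:amount": str(amount)}


def hive_aggregate(hbase):
    """Hive stand-in: total amount per category over the HBase-backed table."""
    totals = defaultdict(int)
    for cells in hbase.values():
        totals[cells["cf:category"]] += int(cells["cf:amount"])
    return dict(totals)


def kettle_job(hbase, mysql):
    """Kettle job stand-in: read the Hive result and upsert it into MySQL."""
    for category, total in hive_aggregate(hbase).items():
        mysql[category] = total


hbase, mysql = {}, {}
process_and_store([("r1", "a", 5), ("r2", "a", 3), ("r3", "b", -1)], hbase)
kettle_job(hbase, mysql)
print(mysql)  # {'a': 8}: r3 was filtered out before reaching HBase
```

The point of the split is that the real-time path (Kafka to HBase) and the reporting path (Hive to MySQL via Kettle) only share the HBase table.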

Implementation

1. Hive on HBase

Hive and HBase can share data directly through the hive-hbase-handler-2.2.0.jar that ships with Hive; nothing else is needed.

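1.1 First, copy the relevant HBase jars into $HIVE_HOME/lib to replace the versions bundled with Hive (back up the originals first). The bundled jars are: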

hbase-annotations-1.1.1.jar
hbase-client-1.1.1.jar
hbase-common-1.1.1.jar
hbase-common-1.1.1-tests.jar
hbase-hadoop2-compat-1.1.1.jar
hbase-hadoop2-compat-1.1.1-tests.jar
hbase-hadoop-compat-1.1.1.jar
hbase-prefix-tree-1.1.1.jar
hbase-procedure-1.1.1.jar
hbase-protocol-1.1.1.jar
hbase-server-1.1.1.jar

(Testing showed that with HBase 1.2.6 the bundled 1.1.1 jars do not have to be replaced; basic create and read operations still work.)
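The copy-and-back-up step can be scripted. Below is a minimal Python sketch, assuming the two lib directories are passed in (for example $HIVE_HOME/lib and $HBASE_HOME/lib) and using a hypothetical backup directory. Because the bundled jars carry a different version in their file names (1.1.1) than HBase's own (1.2.6 here), copying alone would leave both versions on Hive's classpath, so the sketch moves the old jars into the backup directory before copying the new ones.

```python
# Sketch: back up the HBase jars bundled in Hive's lib directory, then copy
# the matching jars from HBase's lib directory. Directory arguments are
# assumptions; replace_hbase_jars is a hypothetical helper name.
import glob
import os
import shutil

# Artifact names from the jar list above, with the version suffix stripped.
HBASE_ARTIFACTS = [
    "hbase-annotations", "hbase-client", "hbase-common",
    "hbase-hadoop2-compat", "hbase-hadoop-compat", "hbase-prefix-tree",
    "hbase-procedure", "hbase-protocol", "hbase-server",
]


def replace_hbase_jars(hive_lib, hbase_lib, backup_dir):
    os.makedirs(backup_dir, exist_ok=True)
    moved, copied = [], []
    for artifact in HBASE_ARTIFACTS:
        # Matches e.g. hbase-common-1.1.1.jar and hbase-common-1.1.1-tests.jar,
        # but not hbase-hadoop2-compat-* when looking for hbase-hadoop-compat.
        pattern = f"{artifact}-[0-9]*.jar"
        for old in glob.glob(os.path.join(hive_lib, pattern)):
            # Move (not copy) so the old version leaves Hive's classpath.
            shutil.move(old, os.path.join(backup_dir, os.path.basename(old)))
            moved.append(os.path.basename(old))
        for new in glob.glob(os.path.join(hbase_lib, pattern)):
            shutil.copy2(new, hive_lib)
            copied.append(os.path.basename(new))
    return sorted(moved), sorted(copied)
```

Restoring the original setup is then just moving the backed-up jars back and deleting the copied ones.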

1.2 Create the table in HBase

Create the tables your business needs in HBase, for example:

create 'h_test',{NAME => 'test1', VERSIONS => 3},{NAME => 'test2', VERSIONS => 3}

put 'h_test','1000','test1:col1','firstvalue'

put 'h_test','1000','test2:col2','secondvalue'
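To see what VERSIONS => 3 means for the puts above, here is a toy Python model of HBase's logical data model (row key -> family:qualifier -> timestamped values, newest first). It is an illustration only, not the HBase client API. It also trims extra versions immediately, whereas HBase never returns more than VERSIONS cells on read but physically removes the extra ones during compactions.

```python
# Toy model of HBase's logical data model; ToyHBaseTable is a hypothetical
# class that mimics the shell commands above, not a real HBase API.
from collections import defaultdict


class ToyHBaseTable:
    def __init__(self, families):
        # families: {"test1": 3, "test2": 3} mirrors {NAME => ..., VERSIONS => 3}
        self.max_versions = dict(families)
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, rowkey, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.max_versions:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows[rowkey][column]
        cells.append((ts, value))
        cells.sort(key=lambda c: c[0], reverse=True)  # newest first
        del cells[self.max_versions[family]:]         # keep at most VERSIONS cells

    def get(self, rowkey, column, versions=1):
        # Like a shell `get` with VERSIONS => n: newest n values.
        return [v for _, v in self.rows[rowkey][column][:versions]]


t = ToyHBaseTable({"test1": 3, "test2": 3})
t.put("1000", "test1:col1", "firstvalue", ts=1)
for i in range(2, 6):
    t.put("1000", "test1:col1", f"v{i}", ts=i)
print(t.get("1000", "test1:col1", versions=10))  # ['v5', 'v4', 'v3']
```

A put to a column family that was not declared in `create` fails, which the toy reproduces with a KeyError.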

1.3 Create an external table in Hive

Don't forget to start HiveServer2 first:

nohup hive --service hiveserver2 &
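HiveServer2 can take a while to come up after this command returns. Below is a minimal Python sketch for waiting until it accepts connections, assuming the default Thrift port 10000 (hive.server2.thrift.port); an open port only shows that the server is listening, not that queries will succeed. The function names are hypothetical.

```python
# Sketch: poll until a TCP port accepts connections (e.g. HiveServer2 on its
# assumed default port 10000) before running the Kettle job.
import socket
import time


def is_port_open(host, port, timeout=1.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_hiveserver2(host="localhost", port=10000, retries=30, delay=2.0):
    for _ in range(retries):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for_hiveserver2()` returns False after about a minute if nothing is listening on localhost:10000.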

Create the external table:

create external table hive_test(rowkey string, col1 string,col2 string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 with serdeproperties("hbase.columns.mapping" = ":key,test1:col1,test2:col2")
 tblproperties("hbase.table.name"="h_test");
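If several HBase tables need to be exposed this way, the DDL can be generated. The helper below (build_hbase_ddl, a hypothetical name) is a Python sketch that reproduces the statement above from a column list. The mapping string is built from the same list, so it always has one entry per Hive column; the sketch also checks that exactly one entry is :key and that the others contain a family separator. All columns are typed string, as in the example.

```python
# Sketch: generate a Hive-on-HBase external table DDL from
# (hive_column, hbase_mapping) pairs. build_hbase_ddl is hypothetical.
def build_hbase_ddl(hive_table, hbase_table, columns):
    mappings = [m for _, m in columns]
    if mappings.count(":key") != 1:
        raise ValueError("exactly one column must map to :key")
    for m in mappings:
        if m != ":key" and ":" not in m:
            raise ValueError(f"mapping {m!r} must look like family:qualifier")
    cols = ", ".join(f"{name} string" for name, _ in columns)
    mapping_str = ",".join(mappings)  # one entry per Hive column, in order
    return (
        f"create external table {hive_table}({cols})\n"
        "stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'with serdeproperties("hbase.columns.mapping" = "{mapping_str}")\n'
        f'tblproperties("hbase.table.name"="{hbase_table}");'
    )


print(build_hbase_ddl("hive_test", "h_test",
                      [("rowkey", ":key"), ("col1", "test1:col1"), ("col2", "test2:col2")]))
```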

1.4 Run a SELECT statement to verify that everything works
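Conceptually, the query maps each HBase row to one Hive row through hbase.columns.mapping: :key yields the row key, each family:qualifier entry yields that cell's latest value, and a missing cell reads as NULL. The Python sketch below models only that mapping (project_rows is a hypothetical name; this is not how HBaseStorageHandler is implemented). For the data put in step 1.2, select * from hive_test should return the single row ('1000', 'firstvalue', 'secondvalue').

```python
# Conceptual sketch of the hbase.columns.mapping projection.
def project_rows(hbase_rows, mapping):
    """hbase_rows: {rowkey: {"family:qualifier": latest_value}};
    mapping: e.g. ":key,test1:col1,test2:col2"."""
    entries = mapping.split(",")
    result = []
    # HBase scans return rows in row-key byte order; sorted() matches that for ASCII keys.
    for rowkey in sorted(hbase_rows):
        cells = hbase_rows[rowkey]
        # Missing cells become None, the Python stand-in for Hive's NULL.
        result.append(tuple(rowkey if e == ":key" else cells.get(e) for e in entries))
    return result


rows = {"1000": {"test1:col1": "firstvalue", "test2:col2": "secondvalue"}}
print(project_rows(rows, ":key,test1:col1,test2:col2"))
# [('1000', 'firstvalue', 'secondvalue')]
```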

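(Some online tutorials report exceptions when the query runs as a MapReduce job; I did not run into any. Note that Hive 2.0 and later discourage Hadoop MapReduce as the execution engine and print this warning:

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Hive on Spark on YARN is an alternative. If Spark is 2.0.0 or later, Hive 2.3.0 is required; otherwise a "sparkListener not found" exception is thrown. This is a test environment, so Hive on Spark was not configured.)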
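The version rule above can be written down as a small check to run before switching hive.execution.engine to spark. This is a Python sketch: the rule comes from this note rather than an official compatibility matrix, only plain numeric versions such as 2.2.0 are handled, and treating Spark 1.x as compatible is an assumption.

```python
# Sketch encoding this note's rule: Spark >= 2.0.0 needs Hive >= 2.3.0.
def parse_version(v):
    parts = [int(p) for p in v.split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))  # "2.0" -> (2, 0, 0)


def hive_on_spark_compatible(hive_version, spark_version):
    if parse_version(spark_version) >= (2, 0, 0):
        return parse_version(hive_version) >= (2, 3, 0)
    return True  # assumption: the note makes no claim about Spark 1.x


print(hive_on_spark_compatible("2.2.0", "2.1.1"))  # False: expect "sparkListener not found"
```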