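Yesterday I reposted a friend's blog post that described bucketed tables. Today I worked through it properly myself, so here are my notes.
First I edited one file; on my machine the path is: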
/opt/software/hive-1.2.1/conf/hive-site.xml
I added these properties to it:
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
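My database already had a table with data in it. Its columns and rows are:
Columns: id int, name string (the output below also shows a third column, which is not used in this post)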
hive> select * from a;
OK
1 "张三" 10
2 "李四" 10
3 "张闫" 10
4 "彬彬" 10
5 "肉肉" 10
Create the bucketed table:
hive> create table b1(id int,name string)
> clustered by (id) into 3 buckets
> row format delimited
> fields terminated by ' ';
OK
Time taken: 0.075 seconds
Then insert data into the bucketed table:
insert into table b1 select id,name from a;
Check the data in the bucketed table:
hive> select * from b1;
OK
1 "张三"
2 "李四"
3 "张闫"
4 "彬彬"
5 "肉肉"
Now look at the data one bucket at a time:
hive> select * from b1 tablesample (bucket 1 out of 3 on id);
OK
3 "张闫"
Time taken: 0.699 seconds, Fetched: 1 row(s)
hive> select * from b1 tablesample (bucket 2 out of 3 on id);
OK
1 "张三"
4 "彬彬"
Time taken: 0.064 seconds, Fetched: 2 row(s)
hive> select * from b1 tablesample (bucket 3 out of 3 on id);
OK
2 "李四"
5 "肉肉"
Time taken: 0.06 seconds, Fetched: 2 row(s)
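The first bucket holds the rows where id mod 3 is 0:
id % 3 = 0
The second bucket holds the rows where id mod 3 is 1:
id % 3 = 1
The third bucket holds the rows where id mod 3 is 2:
id % 3 = 2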
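To convince myself of the modulo rule, here is a small Python sketch that mimics it outside of Hive. It is only a simulation under assumptions: that for an int column Hive 1.x uses the value itself as the hash and computes the bucket as (hash & Integer.MAX_VALUE) % number of buckets, and that "bucket x out of y" is 1-based. The names bucket_index and tablesample are my own, not Hive APIs.

```python
# Simulation only: assumes the hash of an int column is the value itself
# and that the bucket is (hash & Integer.MAX_VALUE) % num_buckets.
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_index(id_value, num_buckets):
    """Return the 0-based bucket index for an int id (hypothetical helper)."""
    return (id_value & INT_MAX) % num_buckets


def tablesample(ids, x, y):
    """Mimic `tablesample (bucket x out of y on id)`, where x is 1-based."""
    return [i for i in ids if bucket_index(i, y) == x - 1]


ids = [1, 2, 3, 4, 5]  # the ids from table a
for x in (1, 2, 3):
    print(x, tablesample(ids, x, 3))
```

With the five ids above this prints [3], [1, 4] and [2, 5] for buckets 1, 2 and 3, which matches the three tablesample queries.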
That is my write-up.
If I got anything wrong or left something out, please point it out.