Hive Bucketed Tables

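Yesterday I reposted a friend's blog post that described bucketed tables. Today I worked through them myself, so here are my notes.
First, I edited one configuration file. On my machine it is at: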

/opt/software/hive-1.2.1/conf/hive-site.xml

I added the following properties to it:

        <property>
                <name>hive.exec.dynamic.partition</name>
                <value>true</value>
        </property>

        <property>
                <name>hive.exec.dynamic.partition.mode</name>
                <value>nonstrict</value>
        </property>

        <property>
                <name>hive.enforce.bucketing</name>
                <value>true</value>
        </property>
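
As a sketch of an alternative to editing hive-site.xml, the same properties can be set for the current session only from the Hive CLI. This assumes a Hive 1.x release such as the 1.2.1 in the path above. Only hive.enforce.bucketing matters for the bucketing example in this post; the two dynamic partition settings are included just to mirror the file.

```sql
-- Session-level settings; they last until the CLI session ends.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.bucketing=true;

-- Calling set with only the property name prints its current value,
-- which is a quick way to confirm the setting took effect.
set hive.enforce.bucketing;
```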

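My database already had a table, a, with data in it. Its fields and data are as follows.
Fields: id int, name string (the query output below also shows a third column, which is not used here)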

hive> select * from a;
OK
1	"张三"	10
2	"李四"	10
3	"张闫"	10
4	"彬彬"	10
5	"肉肉"	10

Create the bucketed table:

hive> create table b1(id int,name string)
    > clustered by (id) into 3 buckets
    > row format delimited
    > fields terminated by ' ';
OK
Time taken: 0.075 seconds
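
To double-check that the table really was created as a bucketed table, one option is to inspect its metadata. This is a minimal sketch using the table name b1 from above; in the storage section of the output, the "Num Buckets" and "Bucket Columns" entries should show 3 and id.

```sql
-- Print the full table metadata, including the bucketing definition.
describe formatted b1;
```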

Then load data into the bucketed table:

insert into table b1 select id,name from a;

Check the data in the bucketed table:

hive> select * from b1;
OK
1	"张三"
2	"李四"
3	"张闫"
4	"彬彬"
5	"肉肉"

View the data bucket by bucket:

hive> select * from b1 tablesample  (bucket 1 out of 3 on id);
OK
3	"张闫"
Time taken: 0.699 seconds, Fetched: 1 row(s)
hive> select * from b1 tablesample  (bucket 2 out of 3 on id);
OK
1	"张三"
4	"彬彬"
Time taken: 0.064 seconds, Fetched: 2 row(s)
hive> select * from b1 tablesample  (bucket 3 out of 3 on id);
OK
2	"李四"
5	"肉肉"
Time taken: 0.06 seconds, Fetched: 2 row(s)

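The first bucket holds the rows whose id mod 3 is 0:
id % 3 = 0
The second bucket holds the rows whose id mod 3 is 1:
id % 3 = 1
The third bucket holds the rows whose id mod 3 is 2:
id % 3 = 2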
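
The modulo rule can be checked directly against the source table. The sketch below assumes Hive 1.x and an int bucketing column, where the hash of an int is the value itself, so pmod(hash(id), 3) gives the zero-based bucket index. The column alias bucket_index is only an illustrative name. Note that tablesample numbers buckets from 1, so bucket_index 0 corresponds to "bucket 1 out of 3".

```sql
-- For each row of the source table, show which bucket (0, 1, or 2)
-- the row would land in when inserted into b1.
select id, name, pmod(hash(id), 3) as bucket_index
from a;
```

Other column types and newer Hive versions may hash differently, so treat this as a way to reason about the small example here rather than a general guarantee.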

That is my summary.
If anything is missing or wrong, please point it out.
