Learning Hive (8): Bucketing

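Bucketing splits a table's data into a fixed number of files by hashing a chosen column, which makes sampling queries convenient.
Use case: sampling queries over the data.
Sample data (columns: id, name, num):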
1,tom1,11
2,tom2,22
3,tom3,33
4,tom4,44
5,tom5,55
6,tom6,66
7,tom7,77
8,tom8,88
9,tom9,99

Implementation:

-- Create the source table that holds the raw data
create table bucket
(
id int,
name string,
num int
)
row format delimited
fields terminated by ',';
-- Enforce bucketing on insert (required on Hive 1.x; from Hive 2.0 on, bucketing is always enforced)
set hive.enforce.bucketing=true;
-- Create the bucketed table: rows are assigned to 4 buckets by the hash of num
create table bucket1
(
id int,
name string,
num int
)
clustered by (num) into 4 buckets
row format delimited
fields terminated by ',';
-- Populate the bucketed table from the source table
insert into table bucket1
select id,name,num from bucket;

-- Sample the bucketed table (replace x and y with integers)
select * from bucket1 tablesample(bucket x out of y);
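
The steps above do not show how the sample rows reach the source table `bucket`, or how to confirm that bucketing took effect. The following is a minimal sketch, assuming the nine sample rows are saved in a local file at `/tmp/bucket.txt` (a hypothetical path) and that the warehouse uses the default location `/user/hive/warehouse` (also an assumption; check `hive.metastore.warehouse.dir` on your cluster). The load has to run before the `insert into table bucket1` statement.

```sql
-- Load the comma-separated sample rows into the source table.
-- '/tmp/bucket.txt' is a hypothetical path; point it at your own file.
load data local inpath '/tmp/bucket.txt' into table bucket;

-- After the insert into bucket1 has run, list the table directory from the Hive CLI.
-- The warehouse path below assumes the default location and the default database.
dfs -ls /user/hive/warehouse/bucket1;

-- Preview which bucket index (0 to 3) each row would map to under plain hash-mod bucketing.
-- hash() and pmod() are Hive built-ins; newer Hive versions may use a different
-- bucketing hash, so treat this as an approximation of the real file layout.
select id, name, num, pmod(hash(num), 4) as bucket_index from bucket;
```

If bucketing was applied, the directory listing typically shows one file per bucket (for example `000000_0` through `000003_0`), although empty buckets may not produce a file on every Hive version.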

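Explanation:
select * from bucket1 tablesample(bucket x out of y);
x: the bucket to start sampling from (buckets are numbered from 1, and x must not exceed y).
y: a multiple or a factor of the table's bucket count; it determines what fraction of the data is sampled.
Suppose the table has 32 buckets.
When y is a factor of 32:
x=2, y=4 starts at bucket 2 and reads 32/4=8 buckets: 2, 6, 10, 14, 18, 22, 26, 30.
x=1, y=8 starts at bucket 1 and reads 32/8=4 buckets: 1, 9, 17, 25.
When y is a multiple of 32:
x=2, y=256 starts at bucket 2 and reads 32/256=1/8 of a bucket, that is, about 1/8 of the rows in bucket 2. These rows are selected by hash value rather than at random, so the same query returns the same rows each time.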
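
To make the rules concrete for the 4-bucket table `bucket1` created earlier, here is a short sketch with x and y filled in. The `on num` clause is optional here; it names the sampling column explicitly and matches the `clustered by (num)` column. The exact rows returned depend on how your Hive version hashes `num`, so no output is shown, and with only nine rows a query may legitimately return nothing.

```sql
-- y = 2 is a factor of 4: reads 4/2 = 2 buckets, namely buckets 1 and 3
select * from bucket1 tablesample(bucket 1 out of 2 on num);

-- y = 4 equals the bucket count: reads exactly one bucket, bucket 2
select * from bucket1 tablesample(bucket 2 out of 4 on num);

-- y = 8 is a multiple of 4: reads 4/8 = 1/2 of a bucket,
-- that is, roughly half of the rows in bucket 1
select * from bucket1 tablesample(bucket 1 out of 8 on num);
```

Choosing y as a factor or multiple of the bucket count lets Hive map the sample onto whole bucket files (or a slice of one file) instead of scanning the full table.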

Results:
[Figure: screenshot of the query output]

BigDate_小学生
