Hive: the difference between UNION and UNION ALL

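UNION: returns the union of two result sets with duplicate rows removed. Some engines also return the rows sorted as a side effect of deduplication, but the order is not guaranteed without an explicit ORDER BY.

UNION ALL: returns the union of two result sets and keeps duplicate rows. No sorting is performed.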
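
As an illustration of the two definitions, the sketch below unions a table with itself. It assumes a table `student` holding three distinct rows and an engine that accepts both forms (for Hive, version 1.2.0 or later); the row counts in the comments follow from those assumptions only.

```sql
-- Assumption: student holds 3 distinct rows.

-- UNION ALL keeps duplicates: every row appears twice, 6 rows in total.
SELECT * FROM student
UNION ALL
SELECT * FROM student;

-- UNION removes duplicates: each row appears once, 3 rows in total.
SELECT * FROM student
UNION
SELECT * FROM student;
```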

Below I create two tables in Hive to test this.

First, create the data files locally:

/usr/local/hive/hiveTestFile/student.txt

1       zhangsan
2       lisi
3       wangwu

/usr/local/hive/hiveTestFile/student2.txt

1       zhangsan2
2       lisi2
3       wangwu2
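
The table definitions are not shown in this post. A minimal sketch of what they might look like follows; the column names `id` and `name` and the tab field delimiter are assumptions inferred from the sample files, not the original DDL.

```sql
-- Hypothetical DDL: column names and the '\t' delimiter are assumptions.
CREATE DATABASE IF NOT EXISTS db_hive_edu;

CREATE TABLE IF NOT EXISTS db_hive_edu.student (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE TABLE IF NOT EXISTS db_hive_edu.student2 (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```

If the files use a different separator, the `FIELDS TERMINATED BY` clause has to match it, otherwise the columns load as NULL.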

Load the local data into Hive:

student table:

load data local inpath '/usr/local/hive/hiveTestFile/student.txt' into table db_hive_edu.student;

student2 table:

load data local inpath '/usr/local/hive/hiveTestFile/student2.txt' into table db_hive_edu.student2;

Test 1: using UNION ALL

hive>  select * from student union all select * from student2;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1559981808383_0008, Tracking URL = http://sparkproject1:8088/proxy/application_1559981808383_0008/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1559981808383_0008
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2019-06-08 17:45:18,077 Stage-1 map = 0%,  reduce = 0%
2019-06-08 17:45:25,670 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.4 sec
MapReduce Total cumulative CPU time: 2 seconds 400 msec
Ended Job = job_1559981808383_0008
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 2.4 sec   HDFS Read: 537 HDFS Write: 67 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 400 msec
OK
1	zhangsan2
2	lisi2
3	wangwu2
4	zhaoliu
1	zhangsan
2	lisi
3	wangwu
Time taken: 18.885 seconds, Fetched: 7 row(s)
hive>

Test 2: using UNION

hive> 
    > select * from student union select * from student2;         
FAILED: ParseException line 1:28 missing ALL at 'select' near '<EOF>'
hive>

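The query fails. The parser in Hive 0.13 only accepts UNION ALL; a bare UNION (that is, UNION DISTINCT) is supported only from Hive 1.2.0 onward.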
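
On a Hive version without UNION, one common way to get the deduplicated result is to wrap UNION ALL in a subquery and apply DISTINCT on the outside. The following is a sketch rather than a tested query from this post; the column names `id` and `name` are assumptions about the table schema.

```sql
-- Emulates UNION on Hive versions that only accept UNION ALL.
-- Assumption: both tables have columns id and name.
SELECT DISTINCT t.id, t.name
FROM (
  SELECT id, name FROM student
  UNION ALL
  SELECT id, name FROM student2
) t;  -- Hive requires an alias for the subquery
```

Unlike the plain UNION ALL run above, the DISTINCT step needs a reduce phase, so this query is likely to be slower.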

References:

https://www.cnblogs.com/peizhe123/p/9870770.html

https://blog.csdn.net/u012492511/article/details/21549431
