HiveQL DQL 4: UNION

Overview

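Set operations are commonly used to combine rows from queries that share the same schema. The usual set operations in a relational database are INTERSECT, MINUS, and UNION/UNION ALL. HQL supports only UNION and UNION ALL; the difference between them is that UNION ALL keeps duplicate rows while UNION removes them. Hive versions before 1.2.0 support only UNION ALL. Every SELECT in a union must return the same number of columns, with the same names and data types; where the types differ, Hive attempts an implicit conversion, which may cause a runtime exception. An ORDER BY, SORT BY, CLUSTER BY, DISTRIBUTE BY, or LIMIT clause placed after the last SELECT applies to the entire union result.
The syntax of UNION is as follows: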

select_statement UNION [ALL | DISTINCT] select_statement UNION [ALL | DISTINCT] select_statement ...

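UNION ALL and UNION DISTINCT can be mixed in the same query. A mixed query is processed so that a DISTINCT union overrides every ALL union to its left. A DISTINCT union can be produced explicitly with UNION DISTINCT, or implicitly with a UNION that has neither the DISTINCT nor the ALL keyword.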
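
As a minimal sketch of the override rule, assuming the `employee` and `employee_hr` tables used in the examples of this article and Hive 1.2.0 or later, the two queries below differ only in where the DISTINCT union sits. In the first, the trailing UNION DISTINCT overrides the UNION ALL to its left, so the whole result is deduplicated. In the second, the UNION ALL comes last and is not overridden, so the rows of the third branch are appended as they are.

```sql
-- DISTINCT on the right overrides the ALL on its left:
-- the combined result contains each name once.
SELECT a.name AS nm FROM employee a
UNION ALL
SELECT b.name AS nm FROM employee_hr b
UNION DISTINCT
SELECT a.name AS nm FROM employee a;

-- ALL on the right is not overridden: the first two branches are
-- deduplicated, then every row of the third branch is appended.
SELECT a.name AS nm FROM employee a
UNION DISTINCT
SELECT b.name AS nm FROM employee_hr b
UNION ALL
SELECT a.name AS nm FROM employee a;
```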

Examples

UNION ALL

> SELECT a.name as nm FROM employee a
UNION ALL -- Column aliases give both branches the same column name
SELECT b.name as nm FROM employee_hr b;
+----------+
|  _u1.nm  |
+----------+
| Michael  |
| Will     |
| Shelley  |
| Lucy     |
| Michael  |
| Will     |
| Steven   |
| Lucy     |
+----------+

UNION

> SELECT a.name as nm FROM employee a
UNION -- UNION removes duplicate names, so it is slower than UNION ALL
SELECT b.name as nm FROM employee_hr b;
+----------+
|  _u1.nm  |
+----------+
| Lucy     |
| Michael  |
| Shelley  |
| Steven   |
| Will     |
+----------+

ORDER BY with UNION

> SELECT a.name as nm FROM employee a
UNION ALL
SELECT b.name as nm FROM employee_hr b
ORDER BY nm;
+----------+
|  _u2.nm  |
+----------+
| Lucy     |
| Lucy     |
| Michael  |
| Michael  |
| Shelley  |
| Steven   |
| Will     |
| Will     |
+----------+
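
The query above sorts the whole union result. To sort or limit a single branch instead, the Hive language manual places the clause inside a subquery that wraps that SELECT. The following is a sketch under that rule, assuming the same two tables; the subquery aliases `top_emp` and `top_hr` are illustrative names, not taken from the original.

```sql
-- Each branch is sorted and limited on its own inside a subquery;
-- the outer UNION ALL then concatenates the two limited results.
SELECT nm
FROM (
  SELECT a.name AS nm FROM employee a ORDER BY nm LIMIT 2
) top_emp
UNION ALL
SELECT nm
FROM (
  SELECT b.name AS nm FROM employee_hr b ORDER BY nm LIMIT 2
) top_hr;
```

Without a final ORDER BY after the last SELECT, the order of the combined rows should not be relied on.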

Hive does not support the other set operations, such as INTERSECT and MINUS, but they can be implemented with JOIN or LEFT JOIN.

Using JOIN to implement INTERSECT

> SELECT a.name 
FROM employee a
JOIN employee_hr b
ON a.name = b.name;
+----------+
|  a.name  |
+----------+
| Michael  |
| Will     |
| Lucy     |
+----------+
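
One caveat with the JOIN form: unlike INTERSECT in a relational database, an inner join does not remove duplicates, so a name that occurs several times in either table is returned several times. A hedged alternative, assuming the same two tables, combines Hive's LEFT SEMI JOIN with DISTINCT so that each matching name is returned once.

```sql
-- LEFT SEMI JOIN keeps a row of employee when at least one match
-- exists in employee_hr, without multiplying rows per match.
-- DISTINCT then removes names repeated within employee itself.
SELECT DISTINCT a.name
FROM employee a
LEFT SEMI JOIN employee_hr b
ON a.name = b.name;
```

With LEFT SEMI JOIN, columns of the right-hand table can be referenced only in the ON condition, which is enough here because only `a.name` is selected.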

Using LEFT JOIN to implement MINUS

> SELECT a.name 
FROM employee a
LEFT JOIN employee_hr b
ON a.name = b.name
WHERE b.name IS NULL;
+----------+
|  a.name  |
+----------+
| Shelley  |
+----------+
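
The same anti-join can also be written with a correlated subquery. This is a sketch only, assuming Hive 0.13 or later (where NOT EXISTS subqueries are accepted in the WHERE clause) and the same two tables. DISTINCT is added so the result has set semantics, as MINUS would.

```sql
-- Keep the names in employee that have no matching row in employee_hr.
SELECT DISTINCT a.name
FROM employee a
WHERE NOT EXISTS (
  SELECT b.name
  FROM employee_hr b
  WHERE b.name = a.name
);
```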

References

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union
Book: Apache Hive Essentials, Second Edition, by Dayong Du, Chapter 4

Reposted from blog.csdn.net/CPP_MAYIBO/article/details/104078463