7.2.3. The GROUP BY and HAVING Clauses


After passing the WHERE filter, the derived input table might be subject to grouping, using the GROUP BY clause, and elimination of group rows using the HAVING clause.

SELECT select_list
    FROM ...
    [WHERE ...]
    GROUP BY grouping_column_reference [, grouping_column_reference]...

The GROUP BY clause is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance:

=> SELECT * FROM test1;
 x | y
---+---
 a | 3
 c | 2
 b | 5
 a | 1
(4 rows)

=> SELECT x FROM test1 GROUP BY x;
 x
---
 a
 b
 c
(3 rows)

In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because there is no single value for the column y that could be associated with each group. The grouped-by columns can be referenced in the select list since they have a single value in each group.
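To see the restriction in practice, the following sketch selects the ungrouped column y directly. It assumes the test1 table shown above; the exact wording of the error message may differ between PostgreSQL versions.

```sql
-- Fails: y is neither a grouping column nor wrapped in an aggregate.
SELECT x, y FROM test1 GROUP BY x;
-- ERROR:  column "test1.y" must appear in the GROUP BY clause
--         or be used in an aggregate function

-- Works: every selected column is a grouping column.
SELECT x, y FROM test1 GROUP BY x, y;
```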

In general, if a table is grouped, columns that are not listed in GROUP BY cannot be referenced except in aggregate expressions. An example with aggregate expressions is:

=> SELECT x, sum(y) FROM test1 GROUP BY x;
 x | sum
---+-----
 a |   4
 b |   5
 c |   2
(3 rows)

Here sum is an aggregate function that computes a single value over the entire group. More information about the available aggregate functions can be found in Section 9.20.

Tip

Grouping without aggregate expressions effectively calculates the set of distinct values in a column. This can also be achieved using the DISTINCT clause (see Section 7.3.3).
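As a small illustration of the tip, the two queries below should return the same set of values when run against the test1 table from earlier. Row order is not guaranteed in either case unless an ORDER BY is added.

```sql
-- Distinct values of x via grouping.
SELECT x FROM test1 GROUP BY x;

-- The same set of values via DISTINCT.
SELECT DISTINCT x FROM test1;
```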

Here is another example: it calculates the total sales for each product (rather than the total sales of all products):

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id, p.name, p.price;

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.

If the products table is set up so that, say, product_id is the primary key, then it would be enough to group by product_id in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.
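A minimal sketch of that primary-key case follows. The table definitions are hypothetical, since the text does not show them, and the join is written with an explicit ON condition and qualified column names purely to keep the illustration unambiguous.

```sql
-- Hypothetical schema; the original text does not show the table definitions.
CREATE TABLE products (
    product_id integer PRIMARY KEY,
    name       text,
    price      numeric
);
CREATE TABLE sales (
    product_id integer REFERENCES products,
    units      integer
);

-- name and price are functionally dependent on the primary key,
-- so grouping by the key alone is accepted.
SELECT p.product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s ON s.product_id = p.product_id
    GROUP BY p.product_id;
```

If the primary key constraint were missing, the same query would be rejected because p.name and p.price would no longer be known to have a single value per group.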

In strict SQL, GROUP BY can only group by columns of the source table, but PostgreSQL extends this to also allow grouping by columns in the select list. Grouping by value expressions instead of simple column names is also allowed.
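Both forms can be sketched against the test1 table from earlier, assuming y is an integer column. The alias parity is a name introduced here only for illustration.

```sql
-- Group by a value expression rather than a plain column.
SELECT y % 2 AS parity, count(*) FROM test1 GROUP BY y % 2;

-- PostgreSQL extension: group by an output column name from the select list.
SELECT y % 2 AS parity, count(*) FROM test1 GROUP BY parity;
```

The two queries group the rows the same way; the second is simply shorter to write, at the cost of being less portable to other database systems.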

If a table has been grouped using GROUP BY, but only certain groups are of interest, the HAVING clause can be used, much like a WHERE clause, to eliminate groups from the result. The syntax is:

SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression

Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions (which necessarily involve an aggregate function).

Example:

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

Again, a more realistic example:

SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
    FROM products p LEFT JOIN sales s USING (product_id)
    WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
    GROUP BY product_id, p.name, p.price, p.cost
    HAVING sum(p.price * s.units) > 5000;

In the example above, the WHERE clause is selecting rows by a column that is not grouped (the expression is only true for sales during the last four weeks), while the HAVING clause restricts the output to groups with total gross sales over 5000. Note that the aggregate expressions do not necessarily need to be the same in all parts of the query.

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING). The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.
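The three cases can be sketched with the test1 table from earlier, whose y values sum to 11. The threshold 100 and the column alias note are arbitrary choices made for this illustration.

```sql
-- Aggregate without GROUP BY: all rows form one group, so one row is returned.
SELECT sum(y) FROM test1;

-- The single group row is eliminated by HAVING, so no rows are returned.
SELECT sum(y) FROM test1 HAVING sum(y) > 100;

-- HAVING alone also makes the query a grouped one: a single row is returned
-- regardless of how many rows test1 contains.
SELECT 'one group' AS note FROM test1 HAVING true;
```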
