Window functions in MySQL

1. OVER() in window functions

The OVER() clause turns an aggregate function into a window function: the aggregate is computed across the data set and its value is attached to every row, much like a scalar subquery in the SELECT list. Compared with subqueries in SELECT, OVER() is more concise and often runs faster, and it accepts additional clauses (ORDER BY, PARTITION BY, and frame definitions) that are covered later in these notes. Window functions are available in MySQL 8.0 and later.

SELECT 
    # Select the id, country name, season, home, and away goals
    m.id, 
    c.name AS country, 
    m.season,
    m.home_goal,
    m.away_goal,
    # Use a window to include the aggregate average in each row
    AVG(m.home_goal + m.away_goal) OVER() AS overall_avg
# match is a reserved word in MySQL, so it is quoted with backticks
FROM `match` AS m
LEFT JOIN country AS c ON m.country_id = c.id;
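
To make the comparison with subqueries concrete, here is a minimal sketch that runs both forms side by side. It is an illustration under assumptions, not the original exercise: it uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (the window syntax shown is the same), the table is named matches (hypothetical, chosen to avoid the reserved word), and the three rows are made up.

```python
import sqlite3

# Hypothetical miniature of the match table; the rows are made up.
# The table is named "matches" here to sidestep the reserved word MATCH.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, home_goal INTEGER, away_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 3, 1), (3, 0, 0)],
)

# Window version: an empty OVER() treats the whole result set as one window.
window_rows = conn.execute(
    """
    SELECT id, AVG(home_goal + away_goal) OVER () AS overall_avg
    FROM matches
    ORDER BY id
    """
).fetchall()

# Subquery version: a scalar subquery in the SELECT list yields the same value.
subquery_rows = conn.execute(
    """
    SELECT id, (SELECT AVG(home_goal + away_goal) FROM matches) AS overall_avg
    FROM matches
    ORDER BY id
    """
).fetchall()

print(window_rows)
print(subquery_rows)
```

Both queries attach the same overall average to every row; the window form simply avoids repeating the FROM clause inside a subquery.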

2. Using RANK() in window functions

Window functions let you RANK rows by any column or calculation you choose. To set this up, specify the column or calculation that determines the rank by placing an ORDER BY clause inside the OVER() clause.

SELECT 
    id,
    # rank is a reserved word in MySQL 8.0, so use a different alias
    RANK() OVER(ORDER BY home_goal) AS home_goal_rank
FROM `match`;
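
As a small sketch of how RANK() behaves, including on ties, the example below ranks four made-up rows by home_goal. Assumptions: Python's sqlite3 module stands in for MySQL 8.0, and the matches table, its rows, and the goal_rank alias are hypothetical.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate RANK().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, home_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [(1, 2), (2, 0), (3, 3), (4, 2)],
)

# Rank rows by home_goal, smallest first (ascending is the default).
ranked = conn.execute(
    """
    SELECT id, home_goal, RANK() OVER (ORDER BY home_goal) AS goal_rank
    FROM matches
    ORDER BY goal_rank, id
    """
).fetchall()

for row in ranked:
    print(row)
```

The two rows with home_goal = 2 share rank 2, and the next row gets rank 4: RANK() leaves a gap after ties.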

Example:
In this exercise, you will create a data set of ranked matches according to which leagues, on average, score the most goals in a match.

SELECT 
    # Select the league name and average goals scored
    l.name AS league,
    AVG(m.home_goal + m.away_goal) AS avg_goals,
    # Rank each league according to the average goals
    RANK() OVER(ORDER BY AVG(m.home_goal + m.away_goal)) AS league_rank
FROM league AS l
LEFT JOIN `match` AS m
ON l.id = m.country_id
WHERE m.season = '2011/2012'
GROUP BY l.name
# Order the query by the rank you created
ORDER BY league_rank;

Source: DataCamp

Example:
In the last exercise, the rank generated in your query ran from smallest to largest. By adding DESC to the ORDER BY inside your window function, you can create a rank sorted from largest to smallest.

SELECT 
    # Select the league name and average goals scored
    l.name AS league,
    AVG(m.home_goal + m.away_goal) AS avg_goals,
    # Rank leagues in descending order by average goals
    RANK() OVER(ORDER BY AVG(m.home_goal + m.away_goal) DESC) AS league_rank
FROM league AS l
LEFT JOIN `match` AS m
ON l.id = m.country_id
WHERE m.season = '2011/2012'
GROUP BY l.name
# Order the query by the rank you created
ORDER BY league_rank;
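
The query above ranks on an aggregate, which works because window functions are evaluated after GROUP BY. A minimal sketch of that pattern follows, under stated assumptions: Python's sqlite3 module stands in for MySQL 8.0, and the league names are stored directly in a hypothetical matches table (no join) with made-up rows.

```python
import sqlite3

# Hypothetical, denormalized data: the league name is stored on each match row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (league TEXT, home_goal INTEGER, away_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [("A", 1, 1), ("A", 2, 2), ("B", 3, 2), ("B", 2, 1), ("C", 0, 1)],
)

# GROUP BY collapses the rows to one per league first;
# RANK() then orders those grouped rows by their average, largest first.
league_ranks = conn.execute(
    """
    SELECT
        league,
        AVG(home_goal + away_goal) AS avg_goals,
        RANK() OVER (ORDER BY AVG(home_goal + away_goal) DESC) AS league_rank
    FROM matches
    GROUP BY league
    ORDER BY league_rank
    """
).fetchall()

for row in league_ranks:
    print(row)
```

With DESC inside OVER(), the league with the highest average gets rank 1; dropping DESC reverses the ranking.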

3. Using OVER with PARTITION BY in window functions

PARTITION BY calculates separate values for different categories, so a single column can hold a different calculation for each category.

AVG(home_goal) OVER (PARTITION BY season)
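
A minimal sketch of what that expression produces, assuming Python's sqlite3 module as a stand-in for MySQL 8.0 and a hypothetical matches table with made-up rows:

```python
import sqlite3

# Hypothetical table and rows: two matches in one season, one in the next.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, season TEXT, home_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [(1, "2011/2012", 1), (2, "2011/2012", 3), (3, "2012/2013", 4)],
)

# Each row receives the average of its own season, not the overall average.
season_avgs = conn.execute(
    """
    SELECT id, season, AVG(home_goal) OVER (PARTITION BY season) AS season_homeavg
    FROM matches
    ORDER BY id
    """
).fetchall()

for row in season_avgs:
    print(row)
```

Unlike GROUP BY, PARTITION BY keeps every input row; it only changes which rows feed the aggregate.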

3.1 PARTITION BY a single column

DataCamp exercise:
In this exercise, you will be creating a data set of games played by Legia Warszawa (Warsaw League), the top ranked team in Poland, and comparing their individual game performance to the overall average for that season.

Where do you see more outliers? Are they in Legia Warszawa's home games or away games?

SELECT
    date,
    season,
    home_goal,
    away_goal,
    CASE WHEN hometeam_id = 8673 THEN 'home' 
         ELSE 'away' END AS warsaw_location,
    # Calculate the average goals scored partitioned by season
    AVG(home_goal) OVER(PARTITION BY season) AS season_homeavg,
    AVG(away_goal) OVER(PARTITION BY season) AS season_awayavg
FROM `match`
# Filter the data set for Legia Warszawa matches only
WHERE 
    hometeam_id = 8673 
    OR awayteam_id = 8673
ORDER BY (home_goal + away_goal) DESC;

3.2 PARTITION BY multiple columns

The PARTITION BY clause can be used to break out window averages by multiple data points (columns). You can even calculate the information you want to use to partition your data! For example, you can calculate average goals scored by season and by country, or by the calendar year (taken from the date column).

In this exercise, you will calculate the average number of home and away goals scored by Legia Warszawa and their opponents, partitioned by the month in each season.

SELECT 
    date,
    season,
    home_goal,
    away_goal,
    CASE WHEN hometeam_id = 8673 THEN 'home' 
         ELSE 'away' END AS warsaw_location,
    # Calculate average goals partitioned by season and month
    AVG(home_goal) OVER(PARTITION BY season,
            EXTRACT(MONTH FROM date)) AS season_mo_home,
    AVG(away_goal) OVER(PARTITION BY season,
            EXTRACT(MONTH FROM date)) AS season_mo_away
FROM `match`
WHERE 
    hometeam_id = 8673 
    OR awayteam_id = 8673
ORDER BY (home_goal + away_goal) DESC;
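
The idea of partitioning by a column plus a calculated value can be sketched as follows. Assumptions: Python's sqlite3 module stands in for MySQL 8.0, so the month is taken with SQLite's strftime('%m', date) where the query above uses EXTRACT(MONTH FROM date); the matches table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table and rows; dates are stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, season TEXT, date TEXT, home_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        (1, "2011/2012", "2011-08-05", 1),
        (2, "2011/2012", "2011-08-20", 3),
        (3, "2011/2012", "2011-09-10", 5),
        (4, "2012/2013", "2012-08-11", 2),
    ],
)

# Partition by season AND by the month computed from the date column.
# strftime('%m', ...) is SQLite-specific; it plays the role of EXTRACT here.
monthly_avgs = conn.execute(
    """
    SELECT
        id,
        AVG(home_goal) OVER (
            PARTITION BY season, strftime('%m', date)
        ) AS season_mo_home
    FROM matches
    ORDER BY id
    """
).fetchall()

for row in monthly_avgs:
    print(row)
```

The August matches of 2011/2012 are averaged together, while the September match and the August match of the following season each fall into their own partition.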

4. Sliding windows

  • perform calculations relative to the current row
  • can be used to calculate running totals, sums, averages
  • can be partitioned by one or more columns
ROWS BETWEEN <start> AND <finish>
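
The frame syntax above can be sketched with a running total and a two-row moving sum. Assumptions: Python's sqlite3 module stands in for MySQL 8.0 (the ROWS BETWEEN syntax shown is the same), and the matches table, its rows, and the column aliases are hypothetical.

```python
import sqlite3

# Hypothetical table and rows, ordered by id for the sliding window.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, home_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 0), (4, 3)],
)

sliding = conn.execute(
    """
    SELECT
        id,
        home_goal,
        -- Running total: from the first row up to the current row
        SUM(home_goal) OVER (
            ORDER BY id
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total,
        -- Moving sum: the previous row plus the current row
        SUM(home_goal) OVER (
            ORDER BY id
            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
        ) AS last_two_total
    FROM matches
    ORDER BY id
    """
).fetchall()

for row in sliding:
    print(row)
```

Here <start> and <finish> are UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, or UNBOUNDED FOLLOWING; the ORDER BY inside OVER() decides what "preceding" and "following" mean.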

Reposted from: https://www.jianshu.com/p/70f8c2c512b0
