Table: Activity
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
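(player_id, event_date) is the primary key of this table.
This table shows the activity of players of some game.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
Write an SQL query that reports the fraction of players who logged in again on the day after their first login, rounded to 2 decimal places. In other words, count the players who logged in on at least two consecutive days starting from their first login date, then divide that count by the total number of players.
The query result format is shown in the following example: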
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+
Result table:
+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+
Only the player with ID 1 logged in again on the day after their first login, so the answer is 1/3 = 0.33.
Source: LeetCode (力扣)
Link: https://leetcode-cn.com/problems/game-play-analysis-iv
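Reading the problem: count the players who also logged in on the day after their first login, then divide by the total number of players. That is, we want the players who logged in on two consecutive days starting from their first login.
Approach: first find each player's first login date, then add one day to it and check whether the player has a login on that date.
Solutions:
Solution 1:
Compute each player's first login date with GROUP BY: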
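Before turning to SQL, the approach above can be sketched in plain Python as a sanity check. This is only an illustration, not part of the original solutions: the `ROWS` list and the `next_day_fraction` helper are hypothetical names, and the data is the sample table from the problem statement.

```python
from datetime import date, timedelta

# Sample rows from the example: (player_id, device_id, event_date, games_played)
ROWS = [
    (1, 2, date(2016, 3, 1), 5),
    (1, 2, date(2016, 3, 2), 6),
    (2, 3, date(2017, 6, 25), 1),
    (3, 1, date(2016, 3, 2), 0),
    (3, 4, date(2018, 7, 3), 5),
]


def next_day_fraction(rows):
    """Fraction of players who logged in on the day after their first login."""
    logins = {}  # player_id -> set of login dates
    for player_id, _device, event_date, _games in rows:
        logins.setdefault(player_id, set()).add(event_date)
    if not logins:
        return 0.0
    # A player counts if (first login date + 1 day) is among their login dates.
    returned = sum(
        1 for dates in logins.values()
        if min(dates) + timedelta(days=1) in dates
    )
    return round(returned / len(logins), 2)


print(next_day_fraction(ROWS))  # 0.33
```

Only player 1 has a login on the day after their first login, so the sketch gives 1/3, rounded to 0.33, which matches the expected result table.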
(
SELECT player_id, MIN(event_date) AS `date`
FROM Activity
GROUP BY player_id
) AS C
Join Activity with the derived table C to keep only each player's first-login row:
SELECT *
FROM Activity AS A
JOIN (
## first login date of each player
SELECT player_id, MIN(event_date) AS `date`
FROM Activity
GROUP BY player_id
) AS C ON (A.player_id = C.player_id AND A.event_date = C.`date`)
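Next, LEFT JOIN Activity once more (as B) to see whether each player has a login exactly one day after the first login. A LEFT JOIN is used so that players without a next-day login are kept, and only the columns A.player_id and B.player_id are projected. Where B.player_id is NULL, the player did not log in on the day after their first login. Query showing which players logged in on the second day: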
SELECT DISTINCT A.player_id AS id1,B.player_id AS id2
FROM Activity AS A
JOIN (
SELECT player_id, MIN(event_date) AS `date`
FROM Activity
GROUP BY player_id
) AS C ON (A.player_id = C.player_id AND A.event_date = C.DATE)
LEFT JOIN Activity AS B ON (A.player_id = B.player_id AND DATEDIFF(B.event_date,A.event_date)=1)
## DATEDIFF(a, b) returns the number of days from b to a (a minus b).
## DISTINCT removes duplicate rows.
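To see what this intermediate result looks like on the sample data, here is a small sketch that runs the same join in an in-memory SQLite database through Python's `sqlite3` module. It is an adaptation, not the original MySQL: SQLite has no `DATEDIFF()`, so the one-day gap is expressed with `date(A.event_date, '+1 day')`, the alias `first_date` replaces `date`, and an `ORDER BY` is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (player_id INT, device_id INT, "
    "event_date TEXT, games_played INT, PRIMARY KEY (player_id, event_date))"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-03-02", 6),
        (2, 3, "2017-06-25", 1),
        (3, 1, "2016-03-02", 0),
        (3, 4, "2018-07-03", 5),
    ],
)

# Assumption: date(x, '+1 day') stands in for MySQL's DATEDIFF(b, a) = 1.
query = """
SELECT DISTINCT A.player_id AS id1, B.player_id AS id2
FROM Activity AS A
JOIN (
    SELECT player_id, MIN(event_date) AS first_date
    FROM Activity
    GROUP BY player_id
) AS C ON A.player_id = C.player_id AND A.event_date = C.first_date
LEFT JOIN Activity AS B
    ON A.player_id = B.player_id
   AND B.event_date = date(A.event_date, '+1 day')
ORDER BY id1
"""
pairs = conn.execute(query).fetchall()
print(pairs)  # [(1, 1), (2, None), (3, None)]
```

Each player appears exactly once in `id1`; `id2` is `None` (SQL NULL) for players 2 and 3, who have no login on the day after their first one.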
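Based on this result, compute the fraction: the number of rows with a non-NULL id2 divided by the number of distinct id1 values: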
SELECT ROUND(SUM(IF(id2 IS NOT NULL,1,0)) / COUNT(DISTINCT id1),2) AS `fraction`
FROM (
SELECT DISTINCT A.player_id AS id1,B.player_id AS id2
FROM Activity AS A
JOIN (
SELECT player_id, MIN(event_date) AS `date`
FROM Activity
GROUP BY player_id
) AS C ON (A.player_id = C.player_id AND A.event_date = C.DATE)
LEFT JOIN Activity AS B ON (A.player_id = B.player_id AND DATEDIFF(B.event_date,A.event_date)=1)
) AS D
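Solution 2:
First find the number of players who logged in again on the day after their first login. A single player's first login date is: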
SELECT MIN(event_date) FROM Activity WHERE player_id = xxxx
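Self-join Activity so that B is a login exactly one day after A, keep only the rows where A is the player's first login, and count the players: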
SELECT COUNT(DISTINCT A.player_id)
FROM Activity AS A
JOIN Activity AS B ON A.player_id = B.player_id AND DATEDIFF(B.event_date,A.event_date)=1
WHERE A.event_date = (
SELECT MIN(event_date)
FROM Activity
WHERE player_id = A.player_id
)
Total number of players:
SELECT COUNT(DISTINCT player_id) FROM Activity
Divide the first count by the second:
SELECT ROUND(
(
SELECT COUNT(DISTINCT A.player_id)
FROM Activity AS A
JOIN Activity AS B ON A.player_id = B.player_id AND DATEDIFF(B.event_date,A.event_date)=1
WHERE A.event_date = (
SELECT MIN(event_date)
FROM Activity
WHERE player_id = A.player_id
)
)
/
(
SELECT COUNT(DISTINCT player_id)
FROM Activity
)
,
2
) AS `fraction`
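As a rough cross-check, both solutions can be run against the sample data in an in-memory SQLite database. The queries below are ports rather than the original MySQL, under these assumptions: `date(x, '+1 day')` replaces `DATEDIFF(...) = 1`, `CASE WHEN` replaces `IF()`, and the numerator is multiplied by `1.0` because SQLite's `/` performs integer division on integers, whereas MySQL's `/` does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (player_id INT, device_id INT, "
    "event_date TEXT, games_played INT, PRIMARY KEY (player_id, event_date))"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-03-02", 6),
        (2, 3, "2017-06-25", 1),
        (3, 1, "2016-03-02", 0),
        (3, 4, "2018-07-03", 5),
    ],
)

# Solution 1: first-login rows LEFT JOINed to next-day logins, then aggregated.
solution1 = """
SELECT ROUND(
    1.0 * SUM(CASE WHEN id2 IS NOT NULL THEN 1 ELSE 0 END)
        / COUNT(DISTINCT id1), 2) AS fraction
FROM (
    SELECT DISTINCT A.player_id AS id1, B.player_id AS id2
    FROM Activity AS A
    JOIN (
        SELECT player_id, MIN(event_date) AS first_date
        FROM Activity
        GROUP BY player_id
    ) AS C ON A.player_id = C.player_id AND A.event_date = C.first_date
    LEFT JOIN Activity AS B
        ON A.player_id = B.player_id
       AND B.event_date = date(A.event_date, '+1 day')
) AS D
"""

# Solution 2: count of returning players divided by count of all players.
solution2 = """
SELECT ROUND(
    1.0 * (
        SELECT COUNT(DISTINCT A.player_id)
        FROM Activity AS A
        JOIN Activity AS B
            ON A.player_id = B.player_id
           AND B.event_date = date(A.event_date, '+1 day')
        WHERE A.event_date = (
            SELECT MIN(event_date) FROM Activity WHERE player_id = A.player_id
        )
    ) / (SELECT COUNT(DISTINCT player_id) FROM Activity), 2) AS fraction
"""

s1 = conn.execute(solution1).fetchone()[0]
s2 = conn.execute(solution2).fetchone()[0]
print(s1, s2)  # 0.33 0.33
```

Both ports return the same fraction on the sample data. This says nothing about relative performance in MySQL; it only confirms that the two formulations agree on this input.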
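Key points:
DISTINCT: removes duplicate rows from the query result.
DATEDIFF(): returns the number of days between two dates (first argument minus second argument).
LEFT JOIN: returns every row of the left table together with the matching rows of the right table; where there is no match, the right table's columns are NULL.
ROUND(): rounds a numeric value to the specified number of decimal places.
COUNT(*): returns the number of rows retrieved, whether or not they contain NULL values.
COUNT(DISTINCT column): returns the number of distinct non-NULL values; if no rows match, COUNT(DISTINCT) returns 0.
SUM(): computes the sum of a set of values or of an expression.
IS NOT NULL: tests that a value is not NULL.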
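The NULL-handling rules above are easy to mix up, so here is a small sketch that checks them in an in-memory SQLite database. The table `t` is a hypothetical stand-in for the intermediate (id1, id2) result of Solution 1, and `CASE WHEN` is used where the MySQL queries use `IF()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the (id1, id2) pairs produced by the LEFT JOIN.
conn.execute("CREATE TABLE t (id1 INT, id2 INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (2, None), (3, None)])

counts = conn.execute(
    """
    SELECT COUNT(*),                 -- counts every row, NULLs included
           COUNT(id2),               -- counts only non-NULL id2 values
           COUNT(DISTINCT id1),      -- counts distinct non-NULL id1 values
           SUM(CASE WHEN id2 IS NOT NULL THEN 1 ELSE 0 END)
    FROM t
    """
).fetchone()
print(counts)  # (3, 1, 3, 1)

# With no matching rows, COUNT(DISTINCT ...) yields 0 rather than NULL.
empty = conn.execute("SELECT COUNT(DISTINCT id2) FROM t WHERE id1 > 99").fetchone()[0]
print(empty)  # 0
```

The last column of the first query is the numerator used in Solution 1, and the third column is its denominator.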