SQL running total group by id and limited by a condition (over window)
Question
[1]: http://sqlfiddle.com/#!9/28cc4f/13
I have the following data in an SQL table. See the [SQL fiddle][1] to play around with it.
id | pointsEarned | createdAt |
---|---|---|
234-00000206-0 | 500 | 2023-05-03T09:05:05.034Z |
234-00000206-0 | 1000 | 2023-05-12T09:05:05.034Z |
234-00000206-0 | 800 | 2023-05-15T09:05:05.034Z |
234-00000206-0 | 300 | 2023-05-21T09:05:05.034Z |
234-00000206-0 | 1100 | 2023-05-28T09:05:05.034Z |
The data is created as follows.
CREATE TABLE activities (
id varchar(14),
pointsEarned int,
createdAt varchar(24)
);
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 500, '2023-05-03T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 1000, '2023-05-12T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 800, '2023-05-15T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 300, '2023-05-21T09:05:05.034Z');
INSERT INTO activities (id, pointsEarned, createdAt)
VALUES ('234-00000206-0', 1100, '2023-05-28T09:05:05.034Z');
From that data I'm looking for the IDs that have passed 1,000 points, and the date on which that happened. In the sample data this is already the case on May 12th, in the second row.
I tried this SQL statement, but here the result is of course May 28th.
SELECT
id,
SUM(pointsEarned) as points,
MAX(createdAt) as lastActivity
FROM
activities
GROUP BY
id
HAVING
points > 1000;
Then I thought about using window functions and tried the query below, but I get an error saying that the statement is wrong. Any idea how to fix the error (see the SQL fiddle link above) and how to limit the query to the point where 1,000 points are passed?
SELECT
id,
SUM(pointsEarned) OVER(ORDER BY createdAt) points
FROM activities;
PS: the SQL fiddle uses MySQL. Ultimately, though, I need the statement for Google BigQuery.
Answer 1
Score: 3
Your code doesn't work in MySQL 5.6 because that version does not support window functions yet (they were added in MySQL 8.0). Furthermore, you shouldn't test on MySQL if you need a Google BigQuery solution, because they are two very different products with different tools.
Given these premises, BigQuery offers the `QUALIFY` clause, which lets you filter on the result of window functions. You can use `QUALIFY` twice to keep:
- the rows where the running sum of points is higher than 1000
- the row whose date is the first date of that filtered running sum

by partitioning on `id` (assuming you can have multiple ids) and ordering on your date.
WITH cte AS (
SELECT id,
SUM(pointsEarned) OVER(PARTITION BY id ORDER BY createdAt) AS runningPoints,
createdAt
FROM activities
QUALIFY runningPoints > 1000
)
SELECT id, runningPoints, createdAt
FROM cte
QUALIFY createdAt = MIN(createdAt) OVER(PARTITION BY id ORDER BY runningPoints)
This answer was tested on BigQuery and gave the following output:
id | runningPoints | createdAt |
---|---|---|
234-00000206-0 | 1500 | 2023-05-12T09:05:05.034Z |
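For engines without `QUALIFY`, the same two filters can be approximated with nested CTEs. The following is only a sketch, not the answer's tested query: it runs the SQL through Python's built-in `sqlite3` module (assuming a bundled SQLite of 3.25 or later, which is when window functions arrived there), and the second id, `hypothetical-id-2`, is made up to show that ids which never pass the threshold drop out.

```python
import sqlite3

# Sample rows from the question, plus a made-up second id that never passes 1000.
ROWS = [
    ("234-00000206-0", 500, "2023-05-03T09:05:05.034Z"),
    ("234-00000206-0", 1000, "2023-05-12T09:05:05.034Z"),
    ("234-00000206-0", 800, "2023-05-15T09:05:05.034Z"),
    ("234-00000206-0", 300, "2023-05-21T09:05:05.034Z"),
    ("234-00000206-0", 1100, "2023-05-28T09:05:05.034Z"),
    ("hypothetical-id-2", 700, "2023-05-04T09:05:05.034Z"),
    ("hypothetical-id-2", 200, "2023-05-06T09:05:05.034Z"),
]

# Step 1: running total per id. Step 2: keep rows above the threshold and
# compute the earliest such date per id. Step 3: keep only that earliest row.
QUERY = """
WITH running AS (
    SELECT id, createdAt,
           SUM(pointsEarned) OVER (PARTITION BY id ORDER BY createdAt) AS runningPoints
    FROM activities
),
over_threshold AS (
    SELECT id, createdAt, runningPoints,
           MIN(createdAt) OVER (PARTITION BY id) AS firstCrossing
    FROM running
    WHERE runningPoints > 1000
)
SELECT id, runningPoints, createdAt
FROM over_threshold
WHERE createdAt = firstCrossing
ORDER BY id
"""


def first_crossings(rows):
    """Load rows into an in-memory table and return the first crossing per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (id TEXT, pointsEarned INTEGER, createdAt TEXT)"
    )
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)
    found = conn.execute(QUERY).fetchall()
    conn.close()
    return found


result = first_crossings(ROWS)
print(result)
```

Because `WHERE` is applied before window functions in the same `SELECT`, the `MIN(createdAt)` in the second CTE only sees rows that are already above the threshold, which mirrors what the two `QUALIFY` steps do.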
Answer 2
Score: 2
This can be done by first using the window function `SUM()` with an `ORDER BY` clause to obtain the running total, then `ROW_NUMBER()` to number the rows above the threshold in order of that running total, and finally selecting only the row with `rn = 1`:
with cte as (
SELECT *, SUM(pointsEarned) over (partition by id order by createdAt) points
FROM activities
),
cte2 as (
select *, row_number() over (partition by id order by points) as rn
from cte
where points > 1000
)
select id, createdAt
from cte2
where rn = 1
To get the data for a specific id:
with cte as (
SELECT *, SUM(pointsEarned) over (order by createdAt) points
FROM activities
where id = '234-00000206-0'
),
cte2 as (
select *, row_number() over (order by points) as rn
from cte
where points > 1000
)
select id, createdAt
from cte2
where rn = 1
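If only the id and the date are needed, one possible simplification is to replace the second window function with a plain `MIN(createdAt) ... GROUP BY id` over the filtered running total. This is a sketch of that variant rather than the query from the answer; it is run here through Python's built-in `sqlite3` (assuming SQLite 3.25 or later), and `hypothetical-id-2` is an invented second id.

```python
import sqlite3

# Rows from the question, plus an invented id that passes 1000 on its second row.
ROWS = [
    ("234-00000206-0", 500, "2023-05-03T09:05:05.034Z"),
    ("234-00000206-0", 1000, "2023-05-12T09:05:05.034Z"),
    ("234-00000206-0", 800, "2023-05-15T09:05:05.034Z"),
    ("234-00000206-0", 300, "2023-05-21T09:05:05.034Z"),
    ("234-00000206-0", 1100, "2023-05-28T09:05:05.034Z"),
    ("hypothetical-id-2", 600, "2023-05-01T09:05:05.034Z"),
    ("hypothetical-id-2", 600, "2023-05-02T09:05:05.034Z"),
]

# The ISO-8601 strings all share one format, so MIN() on the text column
# also picks the chronologically earliest value.
QUERY = """
SELECT id, MIN(createdAt) AS passedAt
FROM (
    SELECT id, createdAt,
           SUM(pointsEarned) OVER (PARTITION BY id ORDER BY createdAt) AS points
    FROM activities
) AS running
WHERE points > 1000
GROUP BY id
ORDER BY id
"""


def passed_dates(rows):
    """Return (id, date) pairs for the first row whose running total exceeds 1000."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (id TEXT, pointsEarned INTEGER, createdAt TEXT)"
    )
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)
    found = conn.execute(QUERY).fetchall()
    conn.close()
    return found


passed = passed_dates(ROWS)
print(passed)
```

Taking the minimum date rather than the minimum running total also stays correct if the running total is not monotonic, for example when some rows carry negative points.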
Answer 3
Score: 2
You can express this in a single pass, with a single window function call. In BigQuery:
select a.*,
sum(pointsEarned) over(partition by id order by createdAt) as runningPoints,
from activities a
qualify runningPoints >= 1000 and runningPoints - pointsEarned < 1000
The `qualify` clause identifies the row where the running sum reaches the threshold.
Note: this works as long as `pointsEarned` is never negative, as is the case in your data (otherwise the query would return one row for each occurrence where the threshold is reached again).
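The caveat about negative values can be reproduced on a small made-up data set. The sketch below is an illustration only: it uses Python's built-in `sqlite3` (assuming SQLite 3.25 or later), and since SQLite has no `QUALIFY`, the same condition is moved into a `WHERE` around a subquery. The id `demo-id` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical rows (not from the question): one negative entry pulls the
# running total back under 1000 before it climbs over the threshold again.
ROWS = [
    ("demo-id", 600, "2023-05-01T00:00:00.000Z"),
    ("demo-id", 500, "2023-05-02T00:00:00.000Z"),
    ("demo-id", -300, "2023-05-03T00:00:00.000Z"),
    ("demo-id", 400, "2023-05-04T00:00:00.000Z"),
]

# Same crossing condition as in the answer, expressed without QUALIFY.
CROSSINGS_SQL = """
SELECT id, createdAt, runningPoints
FROM (
    SELECT id, pointsEarned, createdAt,
           SUM(pointsEarned) OVER (PARTITION BY id ORDER BY createdAt) AS runningPoints
    FROM activities
) AS t
WHERE runningPoints >= 1000 AND runningPoints - pointsEarned < 1000
ORDER BY id, createdAt
"""


def find_crossings(rows):
    """Return every row on which the running total crosses the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (id TEXT, pointsEarned INTEGER, createdAt TEXT)"
    )
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)
    found = conn.execute(CROSSINGS_SQL).fetchall()
    conn.close()
    return found


crossings = find_crossings(ROWS)

# Rows arrive ordered by createdAt, so setdefault keeps the earliest crossing per id.
first_crossing = {}
for row_id, created_at, _running in crossings:
    first_crossing.setdefault(row_id, created_at)

print(crossings)       # two rows: the threshold is crossed on May 2nd and again on May 4th
print(first_crossing)  # only the May 2nd crossing remains
```

With the running totals 600, 1100, 800 and 1200, the condition matches twice, so data with negative points needs an extra step that keeps only the earliest match per id.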
Answer 4
Score: 1
You more or less answered this yourself: your SQL is valid (your fiddle probably ran an older version of MySQL).
SELECT
id,
SUM(pointsEarned) OVER(ORDER BY createdAt) points
FROM activities;
which yields
Result (MySQL v8.0)
id | pointsEarned | createdAt | points |
---|---|---|---|
234-00000206-0 | 500 | 2023-05-03T09:05:05.034Z | 500 |
234-00000206-0 | 1000 | 2023-05-12T09:05:05.034Z | 1500 |
234-00000206-0 | 800 | 2023-05-15T09:05:05.034Z | 2300 |
234-00000206-0 | 300 | 2023-05-21T09:05:05.034Z | 2600 |
234-00000206-0 | 1100 | 2023-05-28T09:05:05.034Z | 3700 |
Now you only need to select the first row with more than 1000 total points, like this:
SELECT sum_t.id, sum_t.createdAt
FROM (
SELECT *, SUM(pointsEarned) OVER(ORDER BY createdAt) points
FROM activities
) as sum_t
WHERE sum_t.points > 1000
-- without an explicit ORDER BY, LIMIT 1 may return any qualifying row
ORDER BY sum_t.createdAt
LIMIT 1
which leads to the following result
Result (MySQL v8.0)
id | createdAt |
---|---|
234-00000206-0 | 2023-05-12T09:05:05.034Z |