MySQL join and group based on date ranges
Question
I have table A
uid dt val_A
10 04/09/2012 34
10 08/09/2012 35
10 10/09/2012 36
100 04/09/2012 40
100 08/09/2012 41
and table B
uid date val_B
10 04/09/2012 1
10 05/09/2012 1
10 06/09/2012 2
10 07/09/2012 2
10 08/09/2012 1
100 07/09/2012 1
100 07/09/2012 3
I want to join them on uid to get table C. In addition, I want a new column val_C holding the average of val_B over the rows in B whose date is greater than or equal to the corresponding dt in A and less than the next higher dt for the same uid in A. In other words, I want to aggregate the values in B over date ranges defined by A. The joined table should look like this:
uid dt val_A val_C
10 04/09/2012 34 1.5
10 08/09/2012 35 1
10 10/09/2012 36 0
100 04/09/2012 40 2
100 08/09/2012 41 0
How can this be achieved?
//EDIT
What would a more general solution look like, where all dates in B2 that are greater than the greatest date in A are joined and aggregated to the greatest date in A? B2:
uid date val_B
10 04/09/2012 1
10 05/09/2012 1
10 06/09/2012 2
10 07/09/2012 2
10 08/09/2012 1
100 07/09/2012 1
100 07/09/2012 3
100 10/09/2012 4
100 11/09/2012 2
Desired output C2:
uid dt val_A val_C
10 04/09/2012 34 1.5
10 08/09/2012 35 1
10 10/09/2012 36 0
100 04/09/2012 40 2
100 08/09/2012 41 3
Answer 1
Score: 2
If you're on MySQL 8.0 or later, which supports the LEAD() window function, you can try this:
WITH cte AS (
  SELECT uid, dt, val_A,
         IFNULL(LEAD(dt) OVER (PARTITION BY uid ORDER BY uid, dt), dt) AS dtRg
  FROM tableA)
SELECT cte.uid, cte.dt, cte.val_A,
       IFNULL(AVG(tb1.val_B), 0) AS val_C  -- 0 instead of NULL when no tableB row falls in the range
FROM cte
LEFT JOIN tableB tb1
  ON cte.uid = tb1.uid
 AND tb1.dt >= cte.dt
 AND tb1.dt < cte.dtRg
GROUP BY cte.uid, cte.dt, cte.val_A;
The query inside the common table expression (cte):
SELECT uid, dt, val_A,
IFNULL(LEAD(dt) OVER (PARTITION BY uid ORDER BY uid, dt),dt) dtRg
FROM tableA
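will give you a result like this (derived from the sample table A in the question):
uid dt val_A dtRg
10 04/09/2012 34 08/09/2012
10 08/09/2012 35 10/09/2012
10 10/09/2012 36 10/09/2012
100 04/09/2012 40 08/09/2012
100 08/09/2012 41 08/09/2012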
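As you can see, the dtRg column is generated with the LEAD() function, which takes the dt value of the next row according to the ORDER BY. For the last dt of each uid there is no next row, so IFNULL() falls back to dt itself and the range is empty. The MySQL documentation on window functions has more about LEAD().
After that, the cte is joined to tableB on matching uid, keeping only the rows where tableB.dt is the same as or later than the tableA.dt of the current row (now cte.dt) but earlier than cte.dtRg, the next date in tableA as produced by LEAD(). The question calls this column date; the queries here assume it is named dt. Finally, val_C is the average of val_B within each range. Note that AVG() over no matching rows yields NULL, so the expected 0 for empty ranges requires IFNULL(AVG(val_B), 0).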
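To try the queries in this answer against the question's sample data, a minimal setup might look like the sketch below. The column types are assumptions, the dates are read as dd/mm/yyyy, and table B's date column is named dt here only to match the queries in this answer (the question calls it date).

```sql
-- Hypothetical setup: column types and the dt column name in tableB are assumptions.
CREATE TABLE tableA (uid INT, dt DATE, val_A INT);
CREATE TABLE tableB (uid INT, dt DATE, val_B INT);

-- Table A from the question (dates assumed to be dd/mm/yyyy).
INSERT INTO tableA (uid, dt, val_A) VALUES
  (10,  '2012-09-04', 34),
  (10,  '2012-09-08', 35),
  (10,  '2012-09-10', 36),
  (100, '2012-09-04', 40),
  (100, '2012-09-08', 41);

-- Table B from the question.
INSERT INTO tableB (uid, dt, val_B) VALUES
  (10,  '2012-09-04', 1),
  (10,  '2012-09-05', 1),
  (10,  '2012-09-06', 2),
  (10,  '2012-09-07', 2),
  (10,  '2012-09-08', 1),
  (100, '2012-09-07', 1),
  (100, '2012-09-07', 3);
```

Storing the dates in DATE columns matters: if they were kept as dd/mm/yyyy strings, the >= and < comparisons would compare text rather than dates.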
On older MySQL versions, you can try this:
SELECT tA.uid, tA.dt, tA.val_A,
       IFNULL(AVG(tB.val_B), 0) AS val_C
FROM
  (SELECT uid, dt, val_A,
          (SELECT ta1.dt FROM tableA ta1
           WHERE ta1.uid = ta2.uid
             AND ta1.dt > ta2.dt
           ORDER BY ta1.dt  -- without this, LIMIT 1 may return any later date, not the next one
           LIMIT 1) AS dtRg
   FROM tableA ta2) tA
LEFT JOIN tableB tB
  ON tA.uid = tB.uid
 AND tB.dt >= tA.dt
 AND tB.dt < tA.dtRg
GROUP BY tA.uid, tA.dt, tA.val_A;
The differences are as follows:
- Instead of using LEAD(), it uses a correlated subquery in the SELECT list to get the next dt value within the same uid.
- Instead of a common table expression, it uses a derived table.
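For the generalized case in the question's //EDIT, where B2 rows dated after the greatest dt in A should be aggregated into that last row, one possible adaptation is to leave dtRg as NULL for the last row and treat NULL as an open-ended range. This is only a sketch under the same assumptions as above: MySQL 8.0+, DATE columns, and a date column named dt in tableB (here holding the B2 data).

```sql
WITH cte AS (
  SELECT uid, dt, val_A,
         -- No IFNULL here: dtRg stays NULL for the last dt of each uid.
         LEAD(dt) OVER (PARTITION BY uid ORDER BY dt) AS dtRg
  FROM tableA)
SELECT cte.uid, cte.dt, cte.val_A,
       -- COALESCE turns the NULL average of an empty range into 0.
       COALESCE(AVG(tb1.val_B), 0) AS val_C
FROM cte
LEFT JOIN tableB tb1
  ON cte.uid = tb1.uid
 AND tb1.dt >= cte.dt
 -- A NULL dtRg marks the last range, which has no upper bound.
 AND (cte.dtRg IS NULL OR tb1.dt < cte.dtRg)
GROUP BY cte.uid, cte.dt, cte.val_A
ORDER BY cte.uid, cte.dt;
```

With the B2 sample data, the last row for uid 100 (08/09/2012) picks up the rows dated 10/09/2012 and 11/09/2012, giving an average of 3, while the last row for uid 10 has no later rows and falls back to 0.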