MySQL join and group based on date ranges
Question
I have table A
uid dt val_A
10 04/09/2012 34
10 08/09/2012 35
10 10/09/2012 36
100 04/09/2012 40
100 08/09/2012 41
and table B
uid date val_B
10 04/09/2012 1
10 05/09/2012 1
10 06/09/2012 2
10 07/09/2012 2
10 08/09/2012 1
100 07/09/2012 1
100 07/09/2012 3
I want to join them on uid to get table C. Furthermore, I want a new column val_C that holds the average of val_B where date in B is greater than or equal to the corresponding row's dt in A and less than the next higher dt for this uid in table A. In other words, I want to aggregate the values in B based on date ranges defined in A. The joined table should look like this:
uid dt val_A val_C
10 04/09/2012 34 1.5
10 08/09/2012 35 1
10 10/09/2012 36 0
100 04/09/2012 40 2
100 08/09/2012 41 0
How can this be achieved?
//EDIT
What would a more generalized solution look like, where all dates in B2 that are greater than the greatest date in A are joined and aggregated to the greatest date in A? B2:
uid date val_B
10 04/09/2012 1
10 05/09/2012 1
10 06/09/2012 2
10 07/09/2012 2
10 08/09/2012 1
100 07/09/2012 1
100 07/09/2012 3
100 10/09/2012 4
100 11/09/2012 2
Desired output C2:
uid dt val_A val_C
10 04/09/2012 34 1.5
10 08/09/2012 35 1
10 10/09/2012 36 0
100 04/09/2012 40 2
100 08/09/2012 41 3
Answer 1
Score: 2
If you're on MySQL 8.0 or later, which supports the LEAD() window function, you can try this:
WITH cte AS (
  SELECT uid, dt, val_A,
         IFNULL(LEAD(dt) OVER (PARTITION BY uid ORDER BY uid, dt), dt) AS dtRg
  FROM tableA
)
SELECT cte.uid, cte.dt, cte.val_A,
       AVG(val_B) AS val_C
FROM cte
LEFT JOIN tableB tb1
  ON cte.uid = tb1.uid
 AND tb1.dt >= cte.dt
 AND tb1.dt < cte.dtRg
GROUP BY cte.uid, cte.dt, cte.val_A;
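The query in the common table expression (cte):
SELECT uid, dt, val_A,
       IFNULL(LEAD(dt) OVER (PARTITION BY uid ORDER BY uid, dt), dt) AS dtRg
FROM tableA
gives the following result for table A from the question:
uid dt val_A dtRg
10 04/09/2012 34 08/09/2012
10 08/09/2012 35 10/09/2012
10 10/09/2012 36 10/09/2012
100 04/09/2012 40 08/09/2012
100 08/09/2012 41 08/09/2012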
As you can see, the dtRg column is generated with the LEAD() function, which takes the dt value of the next row according to the ORDER BY within each uid partition. The last row of each uid has no next row, so IFNULL() falls back to that row's own dt; its range is therefore empty and its val_C comes out as NULL. The MySQL documentation has more on LEAD().
After that, join cte with tableB on matching uid, where tableB.dt is greater than or equal to the existing tableA.dt (now cte.dt) but less than cte.dtRg, which is the next date in tableA as generated by LEAD(). Finally, add AVG(val_B) AS val_C.
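To check the LEAD() query against the sample data, here is a minimal sketch that runs it with Python's built-in sqlite3 module instead of a MySQL server. Several things here are assumptions rather than part of the answer: SQLite (3.25 or later, for window functions) stands in for MySQL 8, dates are stored as ISO strings so they sort chronologically, tableB's date column is named dt as in the answer's query (the question calls it date), and the function name run_range_average is made up for illustration. It also wraps AVG() in COALESCE(..., 0), since the desired table C shows 0 rather than NULL for ranges with no matching rows.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings
# (assumption: ISO text compares chronologically, like a DATE column would).
TABLE_A = [
    (10, "2012-09-04", 34), (10, "2012-09-08", 35), (10, "2012-09-10", 36),
    (100, "2012-09-04", 40), (100, "2012-09-08", 41),
]
TABLE_B = [
    (10, "2012-09-04", 1), (10, "2012-09-05", 1), (10, "2012-09-06", 2),
    (10, "2012-09-07", 2), (10, "2012-09-08", 1),
    (100, "2012-09-07", 1), (100, "2012-09-07", 3),
]

QUERY = """
WITH cte AS (
    SELECT uid, dt, val_A,
           IFNULL(LEAD(dt) OVER (PARTITION BY uid ORDER BY dt), dt) AS dtRg
    FROM tableA
)
SELECT cte.uid, cte.dt, cte.val_A,
       COALESCE(AVG(tb1.val_B), 0) AS val_C  -- 0 instead of NULL for empty ranges
FROM cte
LEFT JOIN tableB tb1
       ON cte.uid = tb1.uid
      AND tb1.dt >= cte.dt
      AND tb1.dt <  cte.dtRg
GROUP BY cte.uid, cte.dt, cte.val_A
ORDER BY cte.uid, cte.dt
"""


def run_range_average():
    """Load the sample tables into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (uid INTEGER, dt TEXT, val_A INTEGER)")
    conn.execute("CREATE TABLE tableB (uid INTEGER, dt TEXT, val_B INTEGER)")
    conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", TABLE_A)
    conn.executemany("INSERT INTO tableB VALUES (?, ?, ?)", TABLE_B)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_range_average():
        print(row)
```

With the question's data, the val_C column should line up with the desired table C: 1.5, 1, 0 for uid 10 and 2, 0 for uid 100.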
On older MySQL versions, you can try this:
SELECT tA.uid, tA.dt, tA.val_A,
       AVG(val_B) AS val_C
FROM
  (SELECT uid, dt, val_A,
          -- ORDER BY makes LIMIT 1 return the nearest later date,
          -- not an arbitrary later one
          (SELECT dt FROM tableA ta1
            WHERE ta1.uid = ta2.uid
              AND ta1.dt > ta2.dt
            ORDER BY ta1.dt
            LIMIT 1) AS dtRg
   FROM tableA ta2) tA
LEFT JOIN tableB tB
  ON tA.uid = tB.uid
 AND tB.dt >= tA.dt
 AND tB.dt < tA.dtRg
GROUP BY tA.uid, tA.dt, tA.val_A;
The differences are as follows:
- Instead of LEAD(), it uses a correlated subquery in the SELECT list to get the dt value of the next row for the same uid.
- Instead of a common table expression, it uses a derived table.
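The //EDIT in the question asks for rows in B2 dated after the greatest dt in A to be aggregated into that last row. One possible way to get there, sketched below, builds on the correlated-subquery variant: the subquery returns NULL as dtRg for the last row of each uid, so the join can treat a NULL dtRg as an open-ended range. This is an illustration, not part of the original answer. As assumptions, it runs on Python's sqlite3 rather than MySQL, stores dates as ISO strings, loads the B2 rows into a table named tableB with a dt column, orders the subquery by dt so LIMIT 1 picks the nearest later date, and uses the made-up function name run_open_ended_average.

```python
import sqlite3

# Table A and table B2 from the question, with dates as ISO strings
# (assumption: ISO text compares chronologically, like a DATE column would).
TABLE_A = [
    (10, "2012-09-04", 34), (10, "2012-09-08", 35), (10, "2012-09-10", 36),
    (100, "2012-09-04", 40), (100, "2012-09-08", 41),
]
TABLE_B2 = [
    (10, "2012-09-04", 1), (10, "2012-09-05", 1), (10, "2012-09-06", 2),
    (10, "2012-09-07", 2), (10, "2012-09-08", 1),
    (100, "2012-09-07", 1), (100, "2012-09-07", 3),
    (100, "2012-09-10", 4), (100, "2012-09-11", 2),
]

QUERY = """
SELECT tA.uid, tA.dt, tA.val_A,
       COALESCE(AVG(tB.val_B), 0) AS val_C  -- 0 instead of NULL for empty ranges
FROM (SELECT uid, dt, val_A,
             (SELECT ta1.dt FROM tableA ta1
               WHERE ta1.uid = ta2.uid
                 AND ta1.dt > ta2.dt
               ORDER BY ta1.dt
               LIMIT 1) AS dtRg             -- NULL for the last dt of each uid
      FROM tableA ta2) tA
LEFT JOIN tableB tB
       ON tA.uid = tB.uid
      AND tB.dt >= tA.dt
      AND (tA.dtRg IS NULL OR tB.dt < tA.dtRg)  -- NULL dtRg means no upper bound
GROUP BY tA.uid, tA.dt, tA.val_A
ORDER BY tA.uid, tA.dt
"""


def run_open_ended_average():
    """Load A and B2 into an in-memory database and run the open-ended query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (uid INTEGER, dt TEXT, val_A INTEGER)")
    conn.execute("CREATE TABLE tableB (uid INTEGER, dt TEXT, val_B INTEGER)")
    conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", TABLE_A)
    conn.executemany("INSERT INTO tableB VALUES (?, ?, ?)", TABLE_B2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_open_ended_average():
        print(row)
```

For uid 100, the rows dated 10/09/2012 and 11/09/2012 now fall into the 08/09/2012 range and average to 3, which matches the desired output C2. The same open-ended condition could likely be applied to the LEAD() version by dropping the IFNULL() fallback so that dtRg stays NULL on the last row.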