英文:
How to find the number of occurrences of sub-strings in another string in BigQuery with conditionals as a cumulative sum?
问题
所以我的SQL/BigQuery技能仍然需要一些完善,我在这个问题上卡住了一段时间。我有一个标记了一些应用程序计划修改的表ST,其中包含它们的起始日期,这些修改仅适用于具有特定规格的应用程序:
Mod_ID | App_ID | Specs | Start_Date |
---|---|---|---|
P1 | App1 | S1 | 2022-01-24 |
P1 | App2 | S2 | 2022-01-24 |
P2 | App2 | S3 S4 | 2022-03-15 |
P3 | App3 | S4 S5 | 2022-04-10 |
我有另一张表AQ,其中包含一些设备的生产日期,以及它们对应的应用程序和规格的字符串形式:
Device_ID | App_ID | Spec_string | Prod_Date |
---|---|---|---|
ID0001 | App1 | S1 S4 S6 T0 U7 | 2022-02-03 |
ID0002 | App2 | S2 S5 U9 | 2022-02-05 |
ID0003 | App2 | S1 S2 S3 | 2022-03-12 |
ID0004 | App3 | S5 S6 T0 U7 | 2022-04-18 |
我想要计算特定设备的不同规格修改的发生次数,这些修改适用于其App_ID,并且仅适用于AQ.Prod_Date >= ST.Start_Date的情况。
我首先对表ST进行了展开(ST_UNN),最终得到了这样的结果:
Mod_ID | App_ID | Specs_ind | Start_Date |
---|---|---|---|
P1 | App1 | S1 | 2022-01-24 |
P1 | App2 | S2 | 2022-01-24 |
P2 | App2 | S3 | 2022-03-15 |
P2 | App2 | S4 | 2022-03-15 |
P3 | App3 | S4 | 2022-04-10 |
P3 | App3 | S5 | 2022-04-10 |
然后我尝试了以下操作:
SELECT
a.* EXCEPT(Spec_String),
SUM(
CASE
WHEN s.App_ID = a.App_ID AND s.Start_Date <= a.Prod_Date THEN ARRAY_LENGTH(REGEXP_EXTRACT_ALL(Spec_String, Specs_ind))
ELSE
0
END
) OVER(PARTITION BY s.Specs_ind ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS num_changes,
FROM
AQ AS a
LEFT JOIN
ST_UNN AS s
ON
a.App_ID = s.App_ID
但它并没有按预期工作。我期望得到类似于这样的结果:
Device_ID | App_ID | Spec_string | Prod_Date | num_changes |
---|---|---|---|---|
ID0001 | App1 | S1 S4 S6 T0 U7 | 2022-02-03 | 1 |
ID0002 | App2 | S2 S5 U9 | 2022-02-05 | 1 |
ID0003 | App2 | S1 S3 U7 | 2022-03-12 | 0 |
ID0004 | App3 | S5 S6 T0 U7 | 2022-04-18 | 1 |
但我对如何实现最后部分作为累积总和感到有些迷茫,以便设备将计算其生产日期之前发生的所有修改,并基于其规格应用。
英文:
So my SQL/BiqQuery skills still need some refining and I have been stuck on this for some time now. I got a table ST that marks the planned mods to some applications with their starting date, these mods only apply to those with some specific specs:
Mod_ID | App_ID | Specs | Start_Date |
---|---|---|---|
P1 | App1 | S1 | 2022-01-24 |
P1 | App2 | S2 | 2022-01-24 |
P2 | App2 | S3 S4 | 2022-03-15 |
P3 | App3 | S4 S5 | 2022-04-10 |
I got another table AQ that includes the production date for some devices, with their corresponding app and their specifications in the form of a string:
Device_ID | App_ID | Spec_string | Prod_Date |
---|---|---|---|
ID0001 | App1 | S1 S4 S6 T0 U7 | 2022-02-03 |
ID0002 | App2 | S2 S5 U9 | 2022-02-05 |
ID0003 | App2 | S1 S2 S3 | 2022-03-12 |
ID0004 | App3 | S5 S6 T0 U7 | 2022-04-18 |
I want to count the occurrences of the different spec mods for a particuar device that apply to its App_ID and only for those that AQ.Prod_Date>=ST.Start_Date.
I first unnested table ST (ST_UNN) and ended with something like this:
Mod_ID | App_ID | Specs_ind | Start_Date |
---|---|---|---|
P1 | App1 | S1 | 2022-01-24 |
P1 | App2 | S2 | 2022-01-24 |
P2 | App2 | S3 | 2022-03-15 |
P2 | App2 | S4 | 2022-03-15 |
P3 | App3 | S4 | 2022-04-10 |
P3 | App3 | S5 | 2022-04-10 |
and tried the following:
SELECT
a.* EXCEPT(Spec_String),
SUM(
CASE
WHEN s.App_ID = a.App_ID AND s.Start_Date <= a.Prod_Date THEN ARRAY_LENGTH(REGEXP_EXTRACT_ALL(Spec_String, Specs_ind))
ELSE
0
END
) OVER(PARTITION BY s.Specs_ind ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS num_changes,
FROM
AQ AS a
LEFT JOIN
ST_UNN AS s
ON
a.App_ID = s.App_ID
But it didn't work as intended. I expect something like this:
Device_ID | App_ID | Spec_string | Prod_Date | num_changes |
---|---|---|---|---|
ID0001 | App1 | S1 S4 S6 T0 U7 | 2022-02-03 | 1 |
ID0002 | App2 | S2 S5 U9 | 2022-02-05 | 1 |
ID0003 | App2 | S1 S3 U7 | 2022-03-12 | 0 |
ID0004 | App3 | S5 S6 T0 U7 | 2022-04-18 | 1 |
But I am rather lost on how to achieve this last part as a cumulative sum, so that a device will count all mod implementations that have happened before its production date and that apply based on its specifications.
Any feedback is greatly appreciated!
答案1
得分: 1
如果我理解正确,您可以简化一下,只需按device_id等进行分组,而不使用窗口函数。我稍微修改了您的输入,只是为了显示它按预期工作。
WITH
st AS (
SELECT
'P1' AS mod_id,
'App1' AS app_id,
'S1 S4' AS specs,
'2022-01-24' AS start_date
UNION ALL
SELECT
'P1',
'App2',
'S2',
'2022-01-24'
UNION ALL
SELECT
'P2',
'App2',
'S3 S4',
'2022-03-15'
UNION ALL
SELECT
'P3',
'App3',
'S4 S5',
'2022-04-10'
UNION ALL
SELECT
'P4',
'App3',
'S4 S5',
'2022-04-11' ),
st_unn AS (
SELECT
* EXCEPT(specs)
FROM
st,
UNNEST(SPLIT(specs, ' ')) AS spec ),
aq AS (
SELECT
-- 应该为2
'ID0001' AS device_id,
'App1' AS app_id,
'S1 S4 S6 T0 U7' AS spec_string,
'2022-02-03' AS prod_date
UNION ALL
SELECT
-- 应该为1
'ID0002',
'App2',
'S2 S5 U9',
'2022-02-05'
UNION ALL
SELECT
-- 应该为0
'ID0003',
'App2',
'S2 S5 U9',
'2021-03-12'
UNION ALL
SELECT
-- 应该为3
'ID0004',
'App3',
'S4 S5 S6 T0 U7',
'2022-04-18' )
SELECT
device_id,
aq.app_id,
spec_string,
prod_date,
SUM(COALESCE(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(spec_string, spec)), 0))
FROM
aq
LEFT JOIN
st_unn st
ON
aq.prod_date >= st.start_date
AND aq.app_id = st.app_id
GROUP BY
device_id,
aq.app_id,
spec_string,
prod_date
英文:
If I am understanding this correctly, you can simplify a little and just group by device_id, etc. instead of using a window function. I modified your input slightly just to show that this was working as intended
WITH
st AS (
SELECT
'P1' AS mod_id,
'App1' AS app_id,
'S1 S4' AS specs,
'2022-01-24' AS start_date
UNION ALL
SELECT
'P1',
'App2',
'S2',
'2022-01-24'
UNION ALL
SELECT
'P2',
'App2',
'S3 S4',
'2022-03-15'
UNION ALL
SELECT
'P3',
'App3',
'S4 S5',
'2022-04-10'
UNION ALL
SELECT
'P4',
'App3',
'S4 S5',
'2022-04-11' ),
st_unn AS (
SELECT
* EXCEPT(specs)
FROM
st,
UNNEST(SPLIT(specs, ' ')) AS spec ),
aq AS (
SELECT
-- should be 2
'ID0001' AS device_id,
'App1' AS app_id,
'S1 S4 S6 T0 U7' AS spec_string,
'2022-02-03' AS prod_date
UNION ALL
SELECT
-- should be 1
'ID0002',
'App2',
'S2 S5 U9',
'2022-02-05'
UNION ALL
SELECT
-- should be 0
'ID0003',
'App2',
'S2 S5 U9',
'2021-03-12'
UNION ALL
SELECT
-- should be 3
'ID0004',
'App3',
'S4 S5 S6 T0 U7',
'2022-04-18' )
SELECT
device_id,
aq.app_id,
spec_string,
prod_date,
SUM(COALESCE(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(spec_string, spec)), 0))
FROM
aq
LEFT JOIN
st_unn st
ON
aq.prod_date >= st.start_date
AND aq.app_id = st.app_id
GROUP BY
device_id,
aq.app_id,
spec_string,
prod_date
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论