英文:
Oracle 4 weeks running average - missing values must imply 0
问题
我试图创建一个数据的4周移动平均,并将其旋转以用于数据透视表报告。我基本上正在将一个电子表格作为数据库以及其报告迁移,我刚刚开始意识到Excel有多么松散和宽容。
目标是获取所有RC
的最近4周平均值,即使它们在该周没有数据也需要显示0
的滚动平均值。
- 因此,如果
FOO
在过去4周没有数据,它仍需要显示0
的滚动平均值。 - 如果在过去4周的任何部分没有数据,那个值必须被视为
0
处理。
以下是一些经过处理的示例数据,其中包含3周的数据。我提供了2周的输出示例,以便那些想要尝试的人还有一个神秘的周来验证他们的查询。
CREATE TABLE MY_TABLE
(
cy int NOT NULL,
week int NOT NULL,
RC VARCHAR2(10) NOT NULL,
duration_minutes number Not NULL
);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'HCM', 86);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'ERP', 12);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'PBI', 16);
-- 省略部分数据
-- commit;
这个查询不够理想,因为我不得不列出所有的周。我需要一个可以跨所有数据执行此操作的查询。也许可以使用where CY_WEEK between MIN(CY_WEEK) and MAX(CY_WEEK)
这样的子句,但这仍然不能处理丢失的数据。
select * from (
select '2022_10' as CY_WEEK, RC, SUM(s) / 4 as rolling_avg
from (
select RC, SUM(DURATION_MINUTES) as s
from MY_TABLE where CY = 2022 and WEEK between 07 and 10
group by cy,week, RC order by RC
) group by RC union all
select '2022_11' as CY_WEEK, RC, SUM(s) / 4 as rolling_avg
from (
select RC, SUM(DURATION_MINUTES) as s
from MY_TABLE where CY = 2022 and WEEK between 08 and 11
group by cy,week, RC order by RC
) group by RC
) order by CY_WEEK, RC;
当我完成后,非旋转的数据看起来像这样(理想情况下:每周应该有1行,包括所有RC
的4周滚动平均值,即使它们没有数据)。我添加了comment
字段只是为了阐述问题,尽管它不在查询中:
CY_WEEK | RC | ROLLING_AVG | comment |
---|---|---|---|
2022_10 | HCM | 269.75 | |
2022_10 | PALPHABET | 32 | |
2022_10 | PBI | 245.25 | |
2022_10 | SCM | 0 | 该周没有SCM数据 |
2022_10 | PBU | 52.25 | |
2022_10 | PDT | 23 | |
2022_10 | PIO | 187.5 | |
2022_10 | POTHER | 142.25 | |
2022_10 | PWC | 142 | |
2022_10 | TBD | 85.75 | |
2022_11 | HCM | 182.25 | |
2022_11 | PALPHABET | 32 | |
2022_11 | PBI | 198.5 | |
2022_11 | PBU | 112.25 | |
2022_11 | PDT | 23 | |
2022_11 | PIO | 191.5 | |
2022_11 | POTHER | 130 | |
2022_11 | PWC | 142 | |
2022_11 | SCM | 24.25 | |
2022_11 | TBD | 84.75 |
理想格式:
CY_WEEK | HCM | PALPHABET | PBI | SCM | PBU | PDT | PIO | POTHER | PWC | TBD |
---|---|---|---|---|---|---|---|---|---|---|
2022_11 | 269.75 | 32 | 245.25 | 0 | 52.25 | 23 | 187.5 | 142.25 | 142 | 85.75 |
2022_11 | 182.25 | 32 | 198.5 | 24.25 | 112.25 | 23 | 191.5 | 130 | 142 | 85.75 |
英文:
I am trying to create a 4 week running average of data, and then pivot it for a pivot table report. I am basically porting a spreadsheet-as-a-DB and it's reports and I am just starting to realize how sloppy and forgiving excel is.
The goal is to take the running 4 week average, of all the RC
s (even if they have no data for that week).
- So, if
FOO
had no data for the last 4 weeks, it needs to still show a rolling average of0
- If there was no data for any part of the last 4 weeks, that value must be treated as
0
for that RC
Below is some sanitized, sample data which contains 3 weeks of data. I've supplied sample of output for 2 weeks, so those who wanna take a stab at this still have a mystery week to prove out their query.
CREATE TABLE MY_TABLE
(
cy int NOT NULL,
week int NOT NULL,
RC VARCHAR2(10) NOT NULL,
duration_minutes number Not NULL
);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'HCM', 86);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'ERP', 12);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'PBI', 16);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'PBI', 75);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'ERP', 24);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'ERP', 48);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 06, 'CRM', 25);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'HCM', 43);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'TBD', 4);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'POTHER', 9);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'POTHER', 14);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'POTHER', 26);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'HCM', 240);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'PBI', 187);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 07, 'HCM', 67);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PBI', 491);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PBI', 192);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PBI', 57);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'PWC', 499);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'TBD', 105);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'HCM', 37);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'POTHER', 380);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'TBD', 46);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'POTHER', 140);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 08, 'HCM', 116);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PWC', 69);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PBI', 54);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PALPHABET', 128);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'HCM', 172);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'HCM', 96);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 09, 'PDT', 92);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 350);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'TBD', 8);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'TBD', 180);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 13);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 25);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 37);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'HCM', 184);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'HCM', 59);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 24);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 59);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'HCM', 65);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PBU', 209);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 10, 'PIO', 242);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 11, 'SCM', 97);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 11, 'PBU', 240);
INSERT INTO MY_TABLE(CY, WEEK, RC, DURATION_MINUTES) VALUES (2022, 11, 'PIO', 16);
commit;
This query is not ideal because I have to spell out all of the weeks. I need a query that just does this across all data. Maybe a clause of where CY_WEEK between MIN(CY_WEEK) and MAX(CY_WEEK)
, but this would still not account for missing data.
select * from (
select '2022_10' as CY_WEEK, RC, SUM(s) / 4 as rolling_avg
from (
select RC, SUM(DURATION_MINUTES) as s
from MY_TABLE where CY = 2022 and WEEK between 07 and 10
group by cy,week, RC order by RC
) group by RC union all
select '2022_11' as CY_WEEK, RC, SUM(s) / 4 as rolling_avg
from (
select RC, SUM(DURATION_MINUTES) as s
from MY_TABLE where CY = 2022 and WEEK between 08 and 11
group by cy,week, RC order by RC
) group by RC
) order by CY_WEEK, RC;
The unpivoted data, when I am done, looks like this (Ideally: every week would have 1 line, with the 4 week rolling average, for all RCs, even if they had no data). I've added the comment
field just to articulate the issue, thought it's not in the query:
CY_WEEK | RC | ROLLING_AVG | comment |
---|---|---|---|
2022_10 | HCM | 269.75 | |
2022_10 | PALPHABET | 32 | |
2022_10 | PBI | 245.25 | |
2022_10 | SCM | 0 | no scm data for the week |
2022_10 | PBU | 52.25 | |
2022_10 | PDT | 23 | |
2022_10 | PIO | 187.5 | |
2022_10 | POTHER | 142.25 | |
2022_10 | PWC | 142 | |
2022_10 | TBD | 85.75 | |
2022_11 | HCM | 182.25 | |
2022_11 | PALPHABET | 32 | |
2022_11 | PBI | 198.5 | |
2022_11 | PBU | 112.25 | |
2022_11 | PDT | 23 | |
2022_11 | PIO | 191.5 | |
2022_11 | POTHER | 130 | |
2022_11 | PWC | 142 | |
2022_11 | SCM | 24.25 | |
2022_11 | TBD | 84.75 |
Ideal format:
CY_WEEK | HCM | PALPHABET | PBI | SCM | PBU | PDT | PIO | POTHER | PWC | TBD |
---|---|---|---|---|---|---|---|---|---|---|
2022_11 | 269.75 | 32 | 245.25 | 0 | 52.25 | 23 | 187.5 | 142.25 | 142 | 85.75 |
2022_11 | 182.25 | 32 | 198.5 | 24.25 | 112.25 | 23 | 191.5 | 130 | 142 | 85.75 |
There are a lot of RC
s. It would be great to not have to list them all in the query
答案1
得分: 2
以下是您要的翻译:
您可以使用PARTITION
ed OUTER JOIN
来确保每个rc
/cy
/week
组合都有一行,然后使用分析函数SUM
来获取过去4周的累积总数,然后进行PIVOT
操作:
WITH rcs (rc) AS (
SELECT 'HCM' FROM DUAL UNION ALL
SELECT 'PALPHABET' FROM DUAL UNION ALL
SELECT 'PBI' FROM DUAL UNION ALL
SELECT 'SCM' FROM DUAL UNION ALL
SELECT 'PBU' FROM DUAL UNION ALL
SELECT 'PDT' FROM DUAL UNION ALL
SELECT 'PIO' FROM DUAL UNION ALL
SELECT 'POTHER' FROM DUAL UNION ALL
SELECT 'PWC' FROM DUAL UNION ALL
SELECT 'TBD' FROM DUAL
)
SELECT *
FROM (
SELECT cy,
week,
r.rc,
SUM(COALESCE(duration_minutes, 0)) OVER (
PARTITION BY r.rc
ORDER BY TO_DATE(cy||'-01-01', 'YYYY-MM-DD') + week * 7 - 1
RANGE BETWEEN INTERVAL '21' DAY PRECEDING AND CURRENT ROW
) / 4 AS avg_duration
FROM rcs r
LEFT OUTER JOIN my_table m
PARTITION BY (m.cy, m.week)
ON (r.rc = m.rc)
)
PIVOT (
MAX(avg_duration)
FOR rc IN (
'HCM' AS hcm,
'PALPHABET' AS palphabet,
'PBI' AS pbi,
'SCM' AS scm,
'PBU' AS pbu,
'PDT' AS pdt,
'PIO' AS pio,
'POTHER' AS pother,
'PWC' AS pwc,
'TBD' AS tdb
)
)
ORDER BY cy, week;
对于示例数据,输出如下:
CY | WEEK | HCM | PALPHABET | PBI | SCM | PBU | PDT | PIO | POTHER | PWC | TDB |
---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 6 | 21.5 | 0 | 22.75 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2022 | 7 | 109 | 0 | 69.5 | 0 | 0 | 0 | 0 | 12.25 | 0 | 1 |
2022 | 8 | 147.25 | 0 | 254.5 | 0 | 0 | 0 | 0 | 142.25 | 124.75 | 38.75 |
2022 | 9 | 214.25 | 32 | 268 | 0 | 0 | 23 | 0 | 142.25 | 142 | 38.75 |
2022 | 10 | 269.75 | 32 | 245.25 | 0 | 52.25 | 23 | 187.5 | 142.25 | 142 | 85.75 |
2022 | 11 | 182.25 | 32 | 198.5 | 24.25 | 112.25 | 23 | 191.5 | 130 | 142 | 84.75 |
英文:
You can use a PARTITION
ed OUTER JOIN
to ensure that there is a row for each rc
/cy
/week
combination and then use the analytic SUM
function to get the rolling total for the past 4 weeks and then PIVOT
:
WITH rcs (rc) AS (
SELECT 'HCM' FROM DUAL UNION ALL
SELECT 'PALPHABET' FROM DUAL UNION ALL
SELECT 'PBI' FROM DUAL UNION ALL
SELECT 'SCM' FROM DUAL UNION ALL
SELECT 'PBU' FROM DUAL UNION ALL
SELECT 'PDT' FROM DUAL UNION ALL
SELECT 'PIO' FROM DUAL UNION ALL
SELECT 'POTHER' FROM DUAL UNION ALL
SELECT 'PWC' FROM DUAL UNION ALL
SELECT 'TBD' FROM DUAL
)
SELECT *
FROM (
SELECT cy,
week,
r.rc,
SUM(COALESCE(duration_minutes, 0)) OVER (
PARTITION BY r.rc
ORDER BY TO_DATE(cy||'-01-01', 'YYYY-MM-DD') + week * 7 - 1
RANGE BETWEEN INTERVAL '21' DAY PRECEDING AND CURRENT ROW
) / 4 AS avg_duration
FROM rcs r
LEFT OUTER JOIN my_table m
PARTITION BY (m.cy, m.week)
ON (r.rc = m.rc)
)
PIVOT (
MAX(avg_duration)
FOR rc IN (
'HCM' AS hcm,
'PALPHABET' AS palphabet,
'PBI' AS pbi,
'SCM' AS scm,
'PBU' AS pbu,
'PDT' AS pdt,
'PIO' AS pio,
'POTHER' AS pother,
'PWC' AS pwc,
'TBD' AS tdb
)
)
ORDER BY cy, week;
Which, for the sample data, outputs:
CY | WEEK | HCM | PALPHABET | PBI | SCM | PBU | PDT | PIO | POTHER | PWC | TDB |
---|---|---|---|---|---|---|---|---|---|---|---|
2022 | 6 | 21.5 | 0 | 22.75 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2022 | 7 | 109 | 0 | 69.5 | 0 | 0 | 0 | 0 | 12.25 | 0 | 1 |
2022 | 8 | 147.25 | 0 | 254.5 | 0 | 0 | 0 | 0 | 142.25 | 124.75 | 38.75 |
2022 | 9 | 214.25 | 32 | 268 | 0 | 0 | 23 | 0 | 142.25 | 142 | 38.75 |
2022 | 10 | 269.75 | 32 | 245.25 | 0 | 52.25 | 23 | 187.5 | 142.25 | 142 | 85.75 |
2022 | 11 | 182.25 | 32 | 198.5 | 24.25 | 112.25 | 23 | 191.5 | 130 | 142 | 84.75 |
答案2
得分: 1
Here is the translated code portion without the code:
You'll possibly want to use MTO's answer, since it operates on true dates (so it will handle the week resetting after the year changes) and is cleaner in how it handles the coalesce
s. However, I have adjusted my answer to work with your new data and new fields:
WITH ranges AS
(
select distinct week s FROM my_table
)
select cy, week, COALESCE(HCM, 0) HCM, COALESCE(PALPHABET, 0) PALPHABET, COALESCE(PBI, 0) PBI, COALESCE(SCM, 0) SCM, COALESCE(PBU, 0) PBU, COALESCE(PDT, 0) PDT, COALESCE(PIO, 0) PIO, COALESCE(POTHER, 0) POTHER, COALESCE(PWC, 0) PWC, COALESCE(TBD, 0) TBD
from (
SELECT cy, s week, rc, SUM(duration_minutes) rolling_avg
FROM ranges r left outer join my_table m
ON week > r.s - 4
AND week <= r.s
GROUP BY cy,s,rc
)
pivot
(
sum(rolling_avg/4)
for rc IN ('HCM' HCM, 'PALPHABET' PALPHABET, 'PBI' PBI, 'SCM' SCM, 'PBU' PBU, 'PDT' PDT, 'PIO' PIO, 'POTHER' POTHER, 'PWC' PWC, 'TBD' TBD)
)
ORDER BY cy, week
Original answer:
I think you are looking for something like:
WITH ranges AS
(
select (SELECT MIN(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE) + level - 1 s
from dual
connect by level <= (SELECT MAX(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE) - (SELECT MIN(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE)+1
)
select cy_week, COALESCE(TBD, 0) TBD, COALESCE(HCM, 0) HCM, COALESCE(SCM, 0) SCM, COALESCE(PIO, 0) PIO, COALESCE(PBU, 0) PBU, COALESCE(FOO, 0) FOO
from (
SELECT '2022_'||r.s cy_week, rc, SUM(duration_minutes) rolling_avg
FROM ranges r left outer join my_table m
ON CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) > r.s - 4
AND CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) <= r.s
GROUP BY s,rc
)
pivot
(
sum(rolling_avg/4)
for rc IN ('TBD' TBD, 'HCM' HCM, 'SCM' SCM, 'PIO' PIO, 'PBU' PBU, 'FOO' FOO)
)
ORDER BY CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) as int)
Please note that the code is provided as is, and any specific database or context-related adjustments may be required when implementing it.
英文:
You'll possibly want to use MTO's answer, since it operates on true dates (so it will handle the week reseting after the year changes) and is cleaner in how in handles the coalesce
s. However, I have adjusted my answer to work with your new data and new fields:
WITH ranges AS
(
select distinct week s FROM my_table
)
select cy, week, COALESCE(HCM, 0) HCM, COALESCE(PALPHABET, 0) PALPHABET, COALESCE(PBI, 0) PBI, COALESCE(SCM, 0) SCM, COALESCE(PBU, 0) PBU, COALESCE(PDT, 0) PDT, COALESCE(PIO, 0) PIO, COALESCE(POTHER, 0) POTHER, COALESCE(PWC, 0) PWC, COALESCE(TBD, 0) TBD
from (
SELECT cy, s week, rc, SUM(duration_minutes) rolling_avg
FROM ranges r left outer join my_table m
ON week > r.s - 4
AND week <= r.s
GROUP BY cy,s,rc
)
pivot
(
sum(rolling_avg/4)
for rc IN ('HCM' HCM, 'PALPHABET' PALPHABET, 'PBI' PBI, 'SCM' SCM, 'PBU' PBU, 'PDT' PDT, 'PIO' PIO, 'POTHER' POTHER, 'PWC' PWC, 'TBD' TBD)
)
ORDER BY cy, week
You can see it in this Fiddle.
Original answer:
I think you are looking for something like:
WITH ranges AS
(
select (SELECT MIN(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE) + level - 1 s
from dual
connect by level <= (SELECT MAX(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE) - (SELECT MIN(CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT)) FROM MY_TABLE)+1
)
select cy_week, COALESCE(TBD, 0) TBD, COALESCE(HCM, 0) HCM, COALESCE(SCM, 0) SCM, COALESCE(PIO, 0) PIO, COALESCE(PBU, 0) PBU, COALESCE(FOO, 0) FOO
from (
SELECT '2022_'||r.s cy_week, rc, SUM(duration_minutes) rolling_avg
FROM ranges r left outer join my_table m
ON CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) > r.s - 4
AND CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) <= r.s
GROUP BY s,rc
)
pivot
(
sum(rolling_avg/4)
for rc IN ('TBD' TBD, 'HCM' HCM, 'SCM' SCM, 'PIO' PIO, 'PBU' PBU, 'FOO' FOO)
)
ORDER BY CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) as int)
You can see it working in this Fiddle. This is a bit messy and can likely be cleaned up a bit, but hopefully it's a good starting point.
The ranges cte gets all the weeks between your first and your last date. However, if you know that all weeks exist in your data, you could simplify this to something like
WITH ranges AS
(SELECT DISTINCT CAST(SUBSTR(CY_WEEK, INSTR(CY_WEEK, '_') + 1) AS INT) s FROM MY_TABLE)
It would also be easier to work with actual dates, since this simply treats the part of the string after the underscore as an integer for comparison. You would need to add logic to handle wrapping (if the week number resets when the year changes). You could also clean this up a bit by putting the substring/cast in one CTE and run the query against that (so that logic is not repeated in a number of places).
Since you explicitly said you want 0
s, I wrote out the COALESCE
s in the SELECT
statement (there might be a better way of doing this). If null
s were fine, you could just use SELECT *
.
答案3
得分: 0
以下是翻译好的内容:
- 创建一个包含4周数据的临时表。也许您想手动提供这些数据,或者从CURRENT_DATE()计算出来。
- 创建一个包含MY_TABLE中所有可能的RC值的临时表:SELECT DISTINCT RC FROM MY_TABLE
- 对这两个表执行CROSS JOIN操作。
- 对CROSS JOIN的结果与MY_TABLE进行LEFT JOIN操作。
英文:
Not enough material to flesh out the whole answer, but the basic strategy would be to
- Create a temporary table with the 4-week data. Maybe you want to supply this manually, or compute it from CURRENT_DATE()
- Create a temporary table with all possible values for RC : SELECT DISTINCT RC FROM MY_TABLE
- Do a CROSS JOIN of those two tables
- Do a LEFT JOIN of pairs from the cross join to MY_TABLE
答案4
得分: 0
以下是您要翻译的内容:
- 创建CTE以生成从0到52的周数
- 创建CTE以汇总每周的DURATION_MINUTES并填充所有周的数据
- 创建CTE,将执行上述数据的4周窗口总和,并设置所需期间的WHERE条件
- 旋转结果
最终数据集应如下所示。
请注意,此部分是直接从您提供的代码中提取的内容,不需要进一步翻译。
英文:
With MY_TABLE data as provided you could do it like here:
- create cte generating weeks from 0 to 52
WITH
weeks AS
( Select LEVEL - 1 "WEEK" From Dual Connect By LEVEL <= 53 ),
- create cte summing the DURATION_MINUTES per weeks and fill in all the weeks data
minutes as
( Select DISTINCT y.CY, t.RC,
To_Number(CASE WHEN w.WEEK - 3 < 0 THEN Nvl(t.CY, y.CY) - 1 ELSE Nvl(t.CY, y.CY) END) ||
LPAD(CASE WHEN w.WEEK - 3 < 0 THEN 53 + w.WEEK - 4 ELSE w.WEEK - 3 END, 2, '0') "WEEK_FROM",
To_Number(y.CY || LPAD(w.WEEK, 2, '0')) "WEEK",
Sum(CASE WHEN To_Number(t.CY || LPAD(t.WEEK, 2, '0')) = To_Number(y.CY || LPAD(w.WEEK, 2, '0'))
THEN t.DURATION_MINUTES
ELSE 0 END) OVER(Partition By t.RC, y.CY, w.WEEK) "WEEK_MINS"
From weeks w
Inner Join (Select CY From MY_TABLE Group By CY ORDER BY CY) y ON(1=1)
Left Join MY_TABLE t ON(1 = 1)
),
- create cte that will do the 4 week window sums of the above data and set a WHERE condition for the period you want. Without it you will get (with this sample data) 53 rows (looks interesting) where you can see when the durations occured.
grid AS
( Select CY, RC, To_Number(WEEK_FROM) "WEEK_FROM", To_Number(WEEK) "WEEK",
Sum(WEEK_MINS)
OVER(Partition By RC Order By WEEK Rows Between 3 Preceding And Current Row) "FOUR_WEEK_MINUTES",
0 as HCM, 0 as PALPHABET, 0 as PBI, 0 as SCM, 0 as PBU, 0 as PDT, 0 as PIO, 0 as POTHER, 0 as PWC, 0 as TBD
From minutes
WHERE WEEK - WEEK_FROM IN(3, 51) And
CY = 2022 And To_Number(SubStr(WEEK, 5, 2)) Between 6 And 11
)
- Pivot the result
SELECT *
FROM ( Select WEEK "WEEK", g.RC "RC", g.FOUR_WEEK_MINUTES "FOUR_WEEK_MINUTES"
From grid g
)
PIVOT ( SUM(FOUR_WEEK_MINUTES / 4) FOR RC IN ('HCM' AS HCM, 'PALPHABET' AS PALPHABET, 'PBI' AS PBI, 'SCM' AS SCM, 'PBU' AS PBU,
'PDT' AS PDT, 'PIO' AS PIO, 'POTHER' AS POTHER, 'PWC' AS PWC, 'TBD' AS TBD )
)
ORDER BY WEEK
... and the resulting dataset should be
WEEK HCM PALPHABET PBI SCM PBU PDT PIO POTHER PWC TBD
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
202206 21.5 0 22.75 0 0 0 0 0 0 0
202207 109 0 69.5 0 0 0 0 12.25 0 1
202208 147.25 0 254.5 0 0 0 0 142.25 124.75 38.75
202209 214.25 32 268 0 0 23 0 142.25 142 38.75
202210 269.75 32 245.25 0 52.25 23 187.5 142.25 142 85.75
202211 182.25 32 198.5 24.25 112.25 23 191.5 130 142 84.75
Didn't test it for periods envolving cross years, but this should proces the data from previous year if it is within a 4 week window.
With current sample data to see the rows from before the 4 weeks window and after it - put the period limitiing the weeks between 5 and 15 to see all affected rows - result is:
WEEK HCM PALPHABET PBI SCM PBU PDT PIO POTHER PWC TBD
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
202205 0 0 0 0 0 0 0 0 0 0
202206 21.5 0 22.75 0 0 0 0 0 0 0
202207 109 0 69.5 0 0 0 0 12.25 0 1
202208 147.25 0 254.5 0 0 0 0 142.25 124.75 38.75
202209 214.25 32 268 0 0 23 0 142.25 142 38.75
202210 269.75 32 245.25 0 52.25 23 187.5 142.25 142 85.75
202211 182.25 32 198.5 24.25 112.25 23 191.5 130 142 84.75
202212 144 32 13.5 24.25 112.25 23 191.5 0 17.25 47
202213 77 0 0 24.25 112.25 0 191.5 0 0 47
202214 0 0 0 24.25 60 0 4 0 0 0
202215 0 0 0 0 0 0 0 0 0 0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论