Oracle SQL running average when no value exists (0 implied)

Question

I am trying to build a running-average report and I have hit a roadblock:

When I select data for a time period, some of the groups in my report have no data for that period, which implies a value of 0.

So, I run the query below:

    SELECT CY_WEEK,
           RC,
           AVG(DURATION_MINUTES) OVER (ORDER BY CY_WEEK ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS rolling_avg
    FROM   MI_REPORT
    WHERE  CY_WEEK IN ('2022_08', '2022_09', '2022_10', '2022_11')
      AND  RC = 'FOO';

Which produces:

    2022_11,FOO,97

Selecting the raw data for this case shows that there was, in fact, only one entry:

    SELECT CY_WEEK, RC, DURATION_MINUTES
    FROM   MI_REPORT
    WHERE  CY_WEEK IN ('2022_08', '2022_09', '2022_10', '2022_11')
      AND  RC = 'FOO'

It returns the same value as the average:

    2022_11,FOO,97

But the average actually needs to be 97 / 4 = 24.25.

How can I achieve this without fabricating a bunch of back data?
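
To see why AVG returns 97 rather than 24.25, here is a minimal reproduction. It assumes Python's built-in sqlite3 module as a stand-in for Oracle (SQLite supports window functions from version 3.25). AVG divides by the number of rows actually present in the window frame, so the three weeks without rows simply do not count; the desired figure divides the sum by the number of weeks instead.

```python
import sqlite3

# In-memory SQLite table standing in for the Oracle MI_REPORT table.
# REAL column so that SUM(...) / 4 is not integer division in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mi_report (cy_week TEXT, rc TEXT, duration_minutes REAL)")
conn.execute("INSERT INTO mi_report VALUES ('2022_11', 'FOO', 97)")

weeks = ("2022_08", "2022_09", "2022_10", "2022_11")

# AVG divides by the number of rows in the frame: here only one row exists.
avg_row = conn.execute(
    """
    SELECT cy_week, rc,
           AVG(duration_minutes) OVER (
               ORDER BY cy_week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM mi_report
    WHERE cy_week IN (?, ?, ?, ?) AND rc = 'FOO'
    """,
    weeks,
).fetchall()

# The wanted figure treats the three missing weeks as zeros: sum / number of weeks.
total = conn.execute(
    "SELECT SUM(duration_minutes) FROM mi_report WHERE cy_week IN (?, ?, ?, ?) AND rc = 'FOO'",
    weeks,
).fetchone()[0]
expected = total / len(weeks)

print(avg_row)   # [('2022_11', 'FOO', 97.0)]
print(expected)  # 24.25
```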

Answer 1

Score: 1

It sounds like whatever you get for the sum, you just divide it by 4. You do need a COALESCE(calculation, 0) to turn NULLs into zeros. So:

    COALESCE(SUM(DURATION_MINUTES) OVER (ORDER BY CY_WEEK ROWS BETWEEN 3 PRECEDING AND CURRENT ROW), 0) / 4
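
A minimal sketch of this approach, using Python's sqlite3 as a stand-in for Oracle; the rows are hypothetical, borrowed from the sample data in a later answer. The column is declared REAL because SQLite, unlike Oracle's NUMBER, would otherwise do integer division in SUM(...) / 4. It also illustrates one caveat: ROWS BETWEEN 3 PRECEDING counts rows, not weeks, so dividing by 4 only gives a per-week average when the WHERE clause (or a calendar) limits the frame to the four weeks in question.

```python
import sqlite3

# Hypothetical sparse rows for one RC (weeks 04, 05 and 08-10 have no row).
rows = [("2022_03", 99), ("2022_06", 78), ("2022_07", 91), ("2022_11", 97)]

conn = sqlite3.connect(":memory:")
# REAL so SUM(...) / 4 is a true division in SQLite.
conn.execute("CREATE TABLE mi_report (cy_week TEXT, rc TEXT, duration_minutes REAL)")
conn.executemany("INSERT INTO mi_report VALUES (?, 'FOO', ?)", rows)

QUERY = """
    SELECT cy_week,
           COALESCE(SUM(duration_minutes) OVER (
               ORDER BY cy_week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW), 0) / 4
    FROM mi_report
    WHERE rc = 'FOO' {extra}
"""

# With the question's filter, only rows inside the four weeks enter the frame.
filtered = dict(conn.execute(QUERY.format(
    extra="AND cy_week IN ('2022_08', '2022_09', '2022_10', '2022_11')")).fetchall())

# Without it, ROWS counts rows, so 2022_11 also picks up 2022_03, 2022_06 and 2022_07.
unfiltered = dict(conn.execute(QUERY.format(extra="")).fetchall())

print(filtered["2022_11"])    # 24.25
print(unfiltered["2022_11"])  # (99 + 78 + 91 + 97) / 4 = 91.25
```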

Answer 2

Score: 1

If I understand correctly, there are no NULL values in your data; there are missing rows instead. It seems to me that your data looks like this:

    WITH
      tbl (CY_WEEK, RC, DURATION_MINUTES) AS
        (
          Select '2021_09', 'FOO', 75 From Dual Union All
          Select '2021_11', 'FOO', 83 From Dual Union All
          Select '2022_01', 'FOO', 69 From Dual Union All
          Select '2022_03', 'FOO', 99 From Dual Union All
          Select '2022_06', 'FOO', 78 From Dual Union All
          Select '2022_07', 'FOO', 91 From Dual Union All
          Select '2022_11', 'FOO', 97 From Dual Union All
          Select '2022_12', 'FOO', 85 From Dual
        )

If that is the case, then you should just divide the sum by 4. Something like this:

    Select CY_WEEK, RC, DURATION_MINUTES,
           Sum(DURATION_MINUTES) OVER() "TOTAL_MINS",
           4 "NUM_OF_WEEKS",
           Sum(DURATION_MINUTES) OVER() / 4 "WEEKLY_AVG_MINUTES"
    From   tbl
    Where  CY_WEEK Between '2022_08' And '2022_11' And RC = 'FOO'

    CY_WEEK RC  DURATION_MINUTES TOTAL_MINS NUM_OF_WEEKS WEEKLY_AVG_MINUTES
    ------- --- ---------------- ---------- ------------ ------------------
    2022_11 FOO               97         97            4              24.25

With this sample data, the result for the previous 4 weeks (04 - 07) is:

    ...
    Where  CY_WEEK Between '2022_04' And '2022_07' And RC = 'FOO'

    CY_WEEK RC  DURATION_MINUTES TOTAL_MINS NUM_OF_WEEKS WEEKLY_AVG_MINUTES
    ------- --- ---------------- ---------- ------------ ------------------
    2022_06 FOO               78        169            4              42.25
    2022_07 FOO               91        169            4              42.25

You don't need the windowing clause either, because the WHERE condition already filters the data.
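
The divide-by-4 query can be tried outside Oracle with the minimal sketch below. It assumes Python's built-in sqlite3 as a stand-in and loads this answer's sample data into a table named tbl (REAL column, to avoid SQLite's integer division). Two details carry over to Oracle: BETWEEN on 'YYYY_WW' strings works only because both parts are zero-padded to a fixed width, so text order matches chronological order; and the hard-coded 4 is the number of weeks in the range, not the number of rows returned.

```python
import sqlite3

# Sample data from this answer (RC = 'FOO' only).
SAMPLE = [
    ("2021_09", 75), ("2021_11", 83), ("2022_01", 69), ("2022_03", 99),
    ("2022_06", 78), ("2022_07", 91), ("2022_11", 97), ("2022_12", 85),
]

conn = sqlite3.connect(":memory:")
# REAL so that SUM(...) / 4 is not integer division in SQLite.
conn.execute("CREATE TABLE tbl (cy_week TEXT, rc TEXT, duration_minutes REAL)")
conn.executemany("INSERT INTO tbl VALUES (?, 'FOO', ?)", SAMPLE)


def weekly_avg(first_week, last_week):
    """Return (cy_week, weekly_avg_minutes) rows for a four-week range, dividing by 4."""
    return conn.execute(
        """
        SELECT cy_week, SUM(duration_minutes) OVER () / 4 AS weekly_avg_minutes
        FROM tbl
        WHERE cy_week BETWEEN ? AND ? AND rc = 'FOO'
        ORDER BY cy_week
        """,
        (first_week, last_week),
    ).fetchall()


print(weekly_avg("2022_08", "2022_11"))  # [('2022_11', 24.25)]
print(weekly_avg("2022_04", "2022_07"))  # [('2022_06', 42.25), ('2022_07', 42.25)]
```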

ADDITION - regarding different RCs

If you want the result per RC without filtering in the WHERE clause, then you should use OVER (PARTITION BY RC) in the analytic function. Below, one more row (RC = 'BAZ') is added to the sample data, along with the adjusted code and result:

    ...  -- other sample rows
    Select '2022_11', 'BAZ', 43 From Dual Union All -- new row in sample data
    ...  -- other sample rows
    Select CY_WEEK, RC, DURATION_MINUTES,
           Sum(DURATION_MINUTES) OVER(PARTITION BY RC) "TOTAL_MINS",
           4 "NUM_OF_WEEKS",
           Sum(DURATION_MINUTES) OVER(PARTITION BY RC) / 4 "WEEKLY_AVG_MINUTES"
    From   tbl
    Where  CY_WEEK Between '2022_08' And '2022_11'

    CY_WEEK RC  DURATION_MINUTES TOTAL_MINS NUM_OF_WEEKS WEEKLY_AVG_MINUTES
    ------- --- ---------------- ---------- ------------ ------------------
    2022_11 BAZ               43         43            4              10.75
    2022_11 FOO               97         97            4              24.25
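
The per-RC variant can be checked the same way, again with Python's sqlite3 as a stand-in for Oracle and a REAL column to avoid SQLite integer division. Only a few of the sample rows plus the extra BAZ row are loaded here; the rows outside 2022_08 - 2022_11 show that the WHERE clause, not the window, does the filtering, while PARTITION BY rc keeps the sums separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (cy_week TEXT, rc TEXT, duration_minutes REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("2022_07", "FOO", 91),  # outside the range, removed by WHERE
        ("2022_11", "FOO", 97),
        ("2022_11", "BAZ", 43),  # the extra BAZ row from the addition
        ("2022_12", "FOO", 85),  # outside the range, removed by WHERE
    ],
)

result = conn.execute(
    """
    SELECT cy_week, rc,
           SUM(duration_minutes) OVER (PARTITION BY rc) / 4 AS weekly_avg_minutes
    FROM tbl
    WHERE cy_week BETWEEN '2022_08' AND '2022_11'
    ORDER BY rc
    """
).fetchall()

print(result)  # [('2022_11', 'BAZ', 10.75), ('2022_11', 'FOO', 24.25)]
```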

Answer 3

Score: 1

You can generate a calendar containing all the weeks in your range plus the 3 preceding weeks (to cover the rolling window you are averaging), join it to your data to fill in the missing rows, compute the averages, and only then filter on the range of weeks:

    WITH weeks (week) AS (
      SELECT '2022_05' FROM DUAL UNION ALL
      SELECT '2022_06' FROM DUAL UNION ALL
      SELECT '2022_07' FROM DUAL UNION ALL
      SELECT '2022_08' FROM DUAL UNION ALL
      SELECT '2022_09' FROM DUAL UNION ALL
      SELECT '2022_10' FROM DUAL UNION ALL
      SELECT '2022_11' FROM DUAL
    ),
    averages (cy_week, rc, rolling_avg) AS (
      SELECT w.week,
             r.rc,
             AVG(COALESCE(r.duration_minutes, 0)) OVER (
               ORDER BY w.week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
             ) AS rolling_avg
      FROM   weeks w
             LEFT OUTER JOIN MI_REPORT r
             PARTITION BY (r.rc)
             ON (w.week = r.cy_week)
      WHERE  RC = 'FOO'
    )
    SELECT cy_week,
           rc,
           rolling_avg
    FROM   averages
    WHERE  cy_week IN ('2022_08', '2022_09', '2022_10', '2022_11');

Which, for the sample data:

    CREATE TABLE MI_REPORT (cy_week, rc, duration_minutes) AS
      SELECT '2022_11', 'FOO', 97 FROM DUAL;

Outputs:

    CY_WEEK RC  ROLLING_AVG
    ------- --- -----------
    2022_08 FOO           0
    2022_09 FOO           0
    2022_10 FOO           0
    2022_11 FOO       24.25
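
Oracle's partitioned outer join (PARTITION BY (r.rc)) is what fills in the missing weeks for every RC. SQLite has no equivalent, so the minimal sketch below, which uses Python's sqlite3 as a stand-in for Oracle, emulates it by cross-joining the calendar with the distinct RC values before the LEFT JOIN. It also puts PARTITION BY rc in the OVER clause: the Oracle query above does not need that because its WHERE RC = 'FOO' leaves a single RC, but with several RCs the rolling window would otherwise run across them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mi_report (cy_week TEXT, rc TEXT, duration_minutes REAL)")
conn.execute("INSERT INTO mi_report VALUES ('2022_11', 'FOO', 97)")

# Calendar x distinct RCs emulates Oracle's partitioned outer join;
# the LEFT JOIN leaves duration_minutes NULL for weeks without data.
QUERY = """
WITH weeks (week) AS (
    VALUES ('2022_05'), ('2022_06'), ('2022_07'), ('2022_08'),
           ('2022_09'), ('2022_10'), ('2022_11')
),
grid AS (
    SELECT w.week, c.rc
    FROM weeks w CROSS JOIN (SELECT DISTINCT rc FROM mi_report) c
),
averages AS (
    SELECT g.week AS cy_week,
           g.rc,
           AVG(COALESCE(r.duration_minutes, 0)) OVER (
               PARTITION BY g.rc
               ORDER BY g.week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM grid g
    LEFT JOIN mi_report r ON r.cy_week = g.week AND r.rc = g.rc
)
SELECT cy_week, rc, rolling_avg
FROM averages
WHERE rc = 'FOO' AND cy_week IN ('2022_08', '2022_09', '2022_10', '2022_11')
ORDER BY cy_week
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
# ('2022_08', 'FOO', 0.0)
# ('2022_09', 'FOO', 0.0)
# ('2022_10', 'FOO', 0.0)
# ('2022_11', 'FOO', 24.25)
```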

Answer 4

Score: 1

I understand that you want a running average, but where a row is missing within the desired past 4 weeks, you want to treat that missing row as a value of 0 rather than simply omitting it from the average. You are right that you can backfill missing rows with 0s and use AVG with a ROWS BETWEEN window. But if you wish to do it without backfilling, you can join the table back to itself to find the historical rows within the 4-week window, add their values up, and divide by 4. Any missing row then has the same effect as it would if it were present with a value of 0.

Note that for this to work, you have to do date arithmetic. That requires the use of real dates, not strings like 2022_11. So in my answer below, after generating some test data for testing only, I then convert the strings into real dates, so that the next phase can do the date arithmetic properly. Also, I'm assuming CY_WEEK is an ISO week number, not a month. Obviously you can adjust the date conversion and arithmetic if it's something other than a week number.

This does give a result of 24.25:

    WITH
      MI_REPORT AS
      (
        Select '2021_09' CY_WEEK, 'FOO' RC, 75 DURATION_MINUTES From Dual Union All
        Select '2021_11', 'FOO', 83 From Dual Union All
        Select '2022_01', 'FOO', 69 From Dual Union All
        Select '2022_03', 'FOO', 99 From Dual Union All
        Select '2022_06', 'FOO', 78 From Dual Union All
        Select '2022_07', 'FOO', 91 From Dual Union All
        Select '2022_11', 'FOO', 97 From Dual Union All
        Select '2022_12', 'FOO', 85 From Dual
      ),
      fixeddate AS
      (SELECT -- anchor on 1 January: TO_DATE(year, 'YYYY') alone defaults the month to the current month
              TO_DATE(year || '-01-01', 'YYYY-MM-DD') + TO_NUMBER(week) * 7 cy_date,
              cy_week,
              rc,
              duration_minutes
       FROM (SELECT cy_week,
                    SUBSTR(cy_week, 1, 4) year,
                    SUBSTR(cy_week, 6, 2) week,
                    rc,
                    duration_minutes
             FROM   mi_report))
    SELECT curr.CY_WEEK,
           curr.RC,
           SUM(prev.DURATION_MINUTES) / 4 rolling_avg
    FROM   fixeddate curr,
           fixeddate prev
    WHERE  curr.RC = 'FOO'
      AND  curr.rc = prev.rc
      AND  prev.cy_date > curr.cy_date - (7 * 4)
      AND  prev.cy_date <= curr.cy_date
      AND  curr.CY_WEEK IN ('2022_08', '2022_09', '2022_10', '2022_11')
    GROUP BY curr.CY_WEEK,
             curr.RC
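
The self-join can be checked with the minimal sketch below, assuming Python's sqlite3 as a stand-in for Oracle. SQLite's julianday() plays the role of the TO_DATE arithmetic, using the same approximation (1 January plus week * 7 days), and the column is REAL to avoid integer division. Note that only weeks which have a row of their own appear in the output (here just 2022_11), because curr is drawn from the data; if 2022_08 to 2022_10 must also be reported, a calendar as in the previous answer is still needed.

```python
import sqlite3

SAMPLE = [
    ("2021_09", 75), ("2021_11", 83), ("2022_01", 69), ("2022_03", 99),
    ("2022_06", 78), ("2022_07", 91), ("2022_11", 97), ("2022_12", 85),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mi_report (cy_week TEXT, rc TEXT, duration_minutes REAL)")
conn.executemany("INSERT INTO mi_report VALUES (?, 'FOO', ?)", SAMPLE)

QUERY = """
WITH fixeddate AS (
    -- Approximate date for 'YYYY_WW': 1 January of YYYY plus WW * 7 days (Julian day number).
    SELECT julianday(substr(cy_week, 1, 4) || '-01-01')
               + CAST(substr(cy_week, 6, 2) AS INTEGER) * 7 AS cy_date,
           cy_week, rc, duration_minutes
    FROM mi_report
)
SELECT curr.cy_week, curr.rc, SUM(prev.duration_minutes) / 4 AS rolling_avg
FROM fixeddate curr
JOIN fixeddate prev
  ON prev.rc = curr.rc
 AND prev.cy_date >  curr.cy_date - 7 * 4
 AND prev.cy_date <= curr.cy_date
WHERE curr.rc = 'FOO'
  AND curr.cy_week IN ('2022_08', '2022_09', '2022_10', '2022_11')
GROUP BY curr.cy_week, curr.rc
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('2022_11', 'FOO', 24.25)]
```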

Answer 5

Score: 1

For a correct answer, your base table should be a calendar. All transactions for the report should be joined to it, and only then should the values be calculated.

Here is a simple calendar factory that easily produces days or months, so you don't have to bother with maintaining or calculating the right number of periods:

    create or replace type list_date is table of date;
    /
    create or replace function days_factory(p_date_from date,
                                            p_date_to   date,
                                            p_mode      in varchar2 default 'DD')
      return list_date
      pipelined deterministic
    as
    begin
      if p_mode = 'DD' then
        -- one row per day from p_date_from to p_date_to
        for i in (select trunc(p_date_from, p_mode) + (level - 1) calterm
                    from dual
                 connect by level <= (1 + trunc(p_date_to) - trunc(p_date_from))) loop
          pipe row(i.calterm);
        end loop;
      else
        -- any other mode (e.g. 'MM') steps by whole months
        for i in (select add_months(trunc(p_date_from, p_mode), level - 1) calterm
                    from dual
                 connect by level <= (1 + months_between(trunc(p_date_to, p_mode),
                                                        trunc(p_date_from, p_mode)))) loop
          pipe row(i.calterm);
        end loop;
      end if;
      return;  -- a pipelined function ends with a RETURN that carries no value
    end;
    /

After creating these objects, use select * from table(days_factory(date '2022-08-01', date '2022-12-31', 'MM')) to get a table of months, then left join your sales (or whatever you are reporting on), and only after that calculate the averages.
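
SQLite has no CONNECT BY, but a recursive CTE generates the same kind of month list. The minimal sketch below, using Python's sqlite3 as a stand-in for Oracle, builds the months from 2022-08-01 to 2022-12-31 (mirroring the days_factory call above) and left joins a hypothetical sales table, so that months without sales count as 0 in the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE playing the role of days_factory(..., 'MM'): one row per month start.
CALENDAR = """
WITH RECURSIVE months (month_start) AS (
    SELECT date(:date_from, 'start of month')
    UNION ALL
    SELECT date(month_start, '+1 month')
    FROM months
    WHERE month_start < date(:date_to, 'start of month')
)
"""
params = {"date_from": "2022-08-01", "date_to": "2022-12-31"}

months = [m for (m,) in conn.execute(CALENDAR + "SELECT month_start FROM months", params)]

# Hypothetical fact table with data for a single month only.
conn.execute("CREATE TABLE sales (sale_month TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('2022-11-01', 97)")

# Calendar first, then LEFT JOIN: months without sales become 0 via COALESCE.
avg_amount = conn.execute(
    CALENDAR + """
    SELECT AVG(COALESCE(s.amount, 0))
    FROM months m LEFT JOIN sales s ON s.sale_month = m.month_start
    """,
    params,
).fetchone()[0]

print(months)      # ['2022-08-01', '2022-09-01', '2022-10-01', '2022-11-01', '2022-12-01']
print(avg_amount)  # 19.4  (97 / 5 months)
```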

huangapple
  • Published on March 1, 2023 at 08:34:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75598595.html