April 7, 2023 02:17:02

Counts by date and status with intermittent data

Question

I would like to count the number of objects (lightbulbs, scheduled jobs, customer billing status) that are in a specific status by date. However, the data is intermittent. Objects may or may not have a row on a specific date.

This is not a simple GROUP BY problem. The number of objects "in a specific status" on a given date depends on previous rows.

In the minimal example below, a GROUP BY on 03-01 would result in "1 on and 1 off". On 03-02 only a single object has a row, so a GROUP BY for 03-02 would give "1 on and 0 off". The correct answer for total objects by status on 03-02 is "2 on and 1 off", because it must include the objects whose most recent rows are on 03-01.

A minimal example of the problem is here:

SET NOCOUNT ON
-->>-- minimum problem to solve  -- count by status for a specific day
IF OBJECT_ID('tempdb..#d') IS NOT NULL DROP TABLE #d
CREATE TABLE #d (ndx SMALLINT IDENTITY(1,1), id TINYINT, dt DATE, status CHAR(10) )

INSERT INTO #d (id, dt, status)
VALUES
  ( 1, '20230301' , 'on'  )
, ( 3, '20230301' , 'off' )
, ( 2, '20230302' , 'on'  )
, ( 3, '20230303' , 'off' )
, ( 3, '20230305' , 'on'  )
, ( 1, '20230308' , 'off' )
, ( 2, '20230308' , 'off' )
, ( 1, '20230310' , 'off' )
, ( 2, '20230311' , 'off' )
, ( 1, '20230312' , 'off' )
, ( 3, '20230312' , 'off' )
, ( 2, '20230313' , 'on'  )
, ( 1, '20230314' , 'on'  )
, ( 3, '20230314' , 'off' )
, ( 3, '20230316' , 'off' )
, ( 2, '20230320' , 'on'  )
, ( 1, '20230321' , 'off' )

SELECT * FROM #d d ORDER BY id, dt

IF OBJECT_ID('tempdb..#c') IS NOT NULL DROP TABLE #c
CREATE TABLE #c ( calendardt DATE )
INSERT INTO #c(calendardt)
VALUES
  ('2023-03-01 '), ('2023-03-02 '), ('2023-03-03 '), ('2023-03-04 '), ('2023-03-05 ')
, ('2023-03-06 '), ('2023-03-07 '), ('2023-03-08 '), ('2023-03-09 '), ('2023-03-10 ')
, ('2023-03-11 '), ('2023-03-12 '), ('2023-03-13 '), ('2023-03-14 '), ('2023-03-15 ')
, ('2023-03-16 '), ('2023-03-17 '), ('2023-03-18 '), ('2023-03-19 '), ('2023-03-20 ')
, ('2023-03-21 '), ('2023-03-22 '), ('2023-03-23 '), ('2023-03-24 '), ('2023-03-25 ')

SELECT * FROM #c UNION ALL SELECT * FROM #c ORDER BY calendardt

SELECT *
FROM #c c
LEFT JOIN #d d ON d.dt = c.calendardt
ORDER BY c.calendardt, d.id

The expected result should look like this:

-->>-- expected result
calendardt	[status]	[count]
2023-03-01	on			1
2023-03-01	off			1
2023-03-02	on			2
2023-03-02	off			1
2023-03-03	on			2
2023-03-03	off			1
2023-03-04	on			2
2023-03-04	off			1
2023-03-05	on			3
2023-03-05	off			0
2023-03-06	on			3
2023-03-06	off			0
2023-03-07	on			3
2023-03-07	off			0
2023-03-08	on			1
2023-03-08	off			2
2023-03-09	on			1
2023-03-09	off			2
2023-03-10	on			1
2023-03-10	off			2
2023-03-11	on			1
2023-03-11	off			2
2023-03-12	on			1
2023-03-12	off			2
2023-03-13	on			0
2023-03-13	off			3
2023-03-14	on			0
2023-03-14	off			3
2023-03-15	on			0
2023-03-15	off			3
2023-03-16	on			0
2023-03-16	off			3

I do not have an attempt to show. I am able to track changes to the data on a per-day basis: I can capture the first date an object entered a new status and calculate how long it has been in that status. But I am not able to calculate the totals by date and status.

Any help counting by date and status would be appreciated.

Answer 1

Score: 1

-- keep only the rows where an object's status actually changes
with changes as (
  select id, dt, status 
  from (select id, dt, status, 
               case when lag(status) over (partition by id order by dt)  = status 
               then 0 else 1 end chg 
        from #d) t
  where chg = 1),
-- one row per calendar day and object id
cal as (
  select calendardt dt, id from #c cross join (select distinct id from #d) t),
-- carry the last known status forward over days without a row
joined as (
  select cal.dt, cal.id,
         last_value(status) ignore nulls 
         over (partition by cal.id order by cal.dt) status 
  from cal left join changes d on cal.dt = d.dt and d.id = cal.id)
select * --dt, stat_on, stat_off 
from joined pivot (count(id) for status in ([on], [off])) piv
order by dt

SQL Server dbfiddle

I initially wrote the solution in Oracle (dbfiddle is here). My knowledge of SQL Server is very limited and I can only test it in dbfiddle, but it returns the expected values (up to 13 March, where I think it should be 1 ON, because id=2 is ON).

The idea is to first eliminate rows which do not change anything (status off->off for example), then join with calendar and cross join with all ids. This way we have rows for each day and id. Then last_value fills missing data. And we can aggregate.
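
The same carry-forward idea can be sketched outside SQL. The following is only an illustrative Python sketch, not a translation of the query above: the function name counts_by_day and the variable names are made up for this example, it assumes the date range starts on or before the first data row, and it skips the change-detection step because re-assigning an unchanged status is harmless here.

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows from the question as (id, day of March 2023, status).
raw = [(1, 1, "on"), (3, 1, "off"), (2, 2, "on"), (3, 3, "off"),
       (3, 5, "on"), (1, 8, "off"), (2, 8, "off"), (1, 10, "off"),
       (2, 11, "off"), (1, 12, "off"), (3, 12, "off"), (2, 13, "on"),
       (1, 14, "on"), (3, 14, "off"), (3, 16, "off"), (2, 20, "on"),
       (1, 21, "off")]
rows = [(obj_id, date(2023, 3, d), status) for obj_id, d, status in raw]


def counts_by_day(rows, start, end):
    """Return {day: Counter(status -> number of objects)} for start..end.

    Each object's last known status is carried forward over days
    on which it has no row (hypothetical helper, for illustration only).
    """
    by_day = {}
    for obj_id, dt, status in rows:
        by_day.setdefault(dt, []).append((obj_id, status))

    current = {}  # id -> last known status so far
    result = {}
    day = start
    while day <= end:
        # Apply any rows recorded on this day, then snapshot the totals.
        for obj_id, status in by_day.get(day, []):
            current[obj_id] = status
        result[day] = Counter(current.values())
        day += timedelta(days=1)
    return result


counts = counts_by_day(rows, date(2023, 3, 1), date(2023, 3, 16))
print(counts[date(2023, 3, 2)])
```

For 03-02 this gives 2 on and 1 off, matching the question's example. A status with no objects on a given day is simply absent from that day's Counter, which reads back as 0.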

Answer 2

Score: 0

On/off table is missing

..
CREATE TABLE #on_off (status varchar(3));
INSERT INTO #on_off (status)
VALUES
  ('on'), ('off');

SELECT c.calendardt, c.status, count(d.id) n
FROM (
  select c.*, f.*
  from #c c
  cross join #on_off f
) c
LEFT JOIN #d d ON d.dt = c.calendardt and d.status = c.status
GROUP BY c.calendardt, c.status
ORDER BY c.calendardt, c.status

db<>fiddle
