构建具有重叠记录的时间轴 – SQL-DB2-IBM

huangapple go评论199阅读模式
英文:

Build Timeline with overlapping records - SQL-DB2-IBM

问题

Userid 总时间
User1 04:45:00
User2 03:45:00
英文:

I have a table in which I keep information related to working periods and tasks and I want to get the total amount of working time per user. But every user can work on multiple taks at the same time.
So, the result is that I have overlapped tasks.
This is the table I have

Userid Task_id Start_datetime End_datetime Total_Time
User1 Task1 2023-08-09 08:00:00 2023-08-09 09:00:00 01:00:00
User1 Task2 2023-08-09 08:15:00 2023-08-09 10:00:00 01:45:00
User2 Task1 2023-08-09 08:30:00 2023-08-09 10:00:00 01:30:00
User2 Task2 2023-08-09 09:00:00 2023-08-09 11:30:00 02:30:00
User1 Task3 2023-08-09 11:15:00 2023-08-09 13:00:00 02:45:00
User2 Task3 2023-08-09 15:15:00 2023-08-09 16:00:00 00:45:00
User2 Task1 2023-08-09 15:20:00 2023-08-09 16:00:00 00:40:00

If I try to get the total amount of time worked per user I get this, but this is not real because users are multitasks

Userid Total_worked_Time per day
User1 05:30:00
User2 05:25:00

What I would like to get is the total amount of time that every user has been busy.
So the result would be:

Userid Total Time
User1 04:45:00
User2 03:45:00

I've trying some queries and reading about CTE and Overlapping but I can't get the correct result.

答案1

得分: 1

表格中的用户1任务3的总时间是错误的,应该是01:45:00。

这里有一个解决方案:

  1. with table1 (Userid, Task_id, Start, End, Total_Time) as (
  2. VALUES
  3. ('User1', 'Task1', timestamp '2023-08-09 08:00:00', timestamp '2023-08-09 09:00:00', time '01:00:00'),
  4. ('User1', 'Task2', '2023-08-09 08:15:00', '2023-08-09 10:00:00', '01:45:00'),
  5. ('User2', 'Task1', '2023-08-09 08:30:00', '2023-08-09 10:00:00', '01:30:00'),
  6. ('User2', 'Task2', '2023-08-09 09:00:00', '2023-08-09 11:30:00', '02:30:00'),
  7. ('User1', 'Task3', '2023-08-09 11:15:00', '2023-08-09 13:00:00', '02:45:00'),
  8. ('User2', 'Task3', '2023-08-09 15:15:00', '2023-08-09 16:00:00', '00:45:00'),
  9. ('User2', 'Task1', '2023-08-09 15:20:00', '2023-08-09 16:00:00', '00:40:00')
  10. ),
  11. -- 在两个任务共享完全相同的时间段的情况下
  12. distinct_periods as (
  13. select distinct userid, start, end from table1
  14. ),
  15. -- 扩展时间段
  16. periods (n, userid, start, end) as (
  17. -- 以不重叠的时间段开始
  18. select 1, userid, start, end from distinct_periods t1
  19. where not exists (
  20. select * from distinct_periods t2
  21. where t2.userid = t1.userid
  22. and (t1.start > t2.start and t1.start <= t2.end
  23. or t1.start = t2.start and t1.end > t2.end)
  24. )
  25. -- 扩展与之重叠的时间段
  26. union all
  27. select n+1, periods.userid, periods.start, distinct_periods.end
  28. from periods, distinct_periods
  29. where
  30. distinct_periods.userid = periods.userid
  31. and distinct_periods.start between periods.start and periods.end
  32. and distinct_periods.end > periods.end
  33. ),
  34. -- 添加一个按结束日期降序排列的排名,以便最宽的时间段具有排名1
  35. with_rank as (
  36. select periods.*, rank() over(partition by userid, start order by end desc) rank from periods order by userid, start
  37. )
  38. -- 对排名为1的时间段长度求和
  39. select
  40. userid,
  41. -- 以下行与示例链接中的不同,由于IBM iDB2 LUW之间的差异
  42. time '00:00:00' + sum(timestampdiff(4, char(end - start))) minutes
  43. as time_busy
  44. from with_rank where rank = 1 group by userid
USERID TIME_BUSY
User1 03:45:00
User2 03:45:00

fiddle

英文:

Total time for User 1 task 3 is wrong in your table, should be 01:45:00

Here is a solution

  1. with table1 (Userid, Task_id, Start, End, Total_Time) as (
  2. VALUES
  3. (&#39;User1&#39;, &#39;Task1&#39;, timestamp &#39;2023-08-09 08:00:00&#39;, timestamp &#39;2023-08-09 09:00:00&#39;, time &#39;01:00:00&#39;),
  4. (&#39;User1&#39;, &#39;Task2&#39;, &#39;2023-08-09 08:15:00&#39;, &#39;2023-08-09 10:00:00&#39;, &#39;01:45:00&#39;),
  5. (&#39;User2&#39;, &#39;Task1&#39;, &#39;2023-08-09 08:30:00&#39;, &#39;2023-08-09 10:00:00&#39;, &#39;01:30:00&#39;),
  6. (&#39;User2&#39;, &#39;Task2&#39;, &#39;2023-08-09 09:00:00&#39;, &#39;2023-08-09 11:30:00&#39;, &#39;02:30:00&#39;),
  7. (&#39;User1&#39;, &#39;Task3&#39;, &#39;2023-08-09 11:15:00&#39;, &#39;2023-08-09 13:00:00&#39;, &#39;02:45:00&#39;),
  8. (&#39;User2&#39;, &#39;Task3&#39;, &#39;2023-08-09 15:15:00&#39;, &#39;2023-08-09 16:00:00&#39;, &#39;00:45:00&#39;),
  9. (&#39;User2&#39;, &#39;Task1&#39;, &#39;2023-08-09 15:20:00&#39;, &#39;2023-08-09 16:00:00&#39;, &#39;00:40:00&#39;)
  10. ),
  11. -- is case when two tasks share the exact same period
  12. distinct_periods as (
  13. select distinct userid, start, end from table1
  14. ),
  15. -- extend periods
  16. periods (n, userid, start, end) as (
  17. -- start with periods that are not preceded by an overlaping period
  18. select 1, userid, start, end from distinct_periods t1
  19. where not exists (
  20. select * from distinct_periods t2
  21. where t2.userid = t1.userid
  22. and (t1.start &gt; t2.start and t1.start &lt;= t2.end
  23. or t1.start = t2.start and t1.end &gt; t2.end)
  24. )
  25. -- extend with those that overlap
  26. union all
  27. select n+1, periods.userid, periods.start, distinct_periods.end
  28. from periods, distinct_periods
  29. where
  30. distinct_periods.userid = periods.userid
  31. and distinct_periods.start between periods.start and periods.end
  32. and distinct_periods.end &gt; periods.end
  33. -- and n &lt; 10
  34. ),
  35. -- add a rank by end date descending so that the widest period has rank 1
  36. with_rank as (
  37. select periods.*, rank() over(partition by userid, start order by end desc) rank from periods order by userid, start
  38. )
  39. -- sum the lengths of periods of rank 1
  40. select
  41. userid,
  42. -- the following line is not the same in the fiddle due to differences between DB2 for IBMi and DB2 LUW
  43. time &#39;00:00:00&#39; + sum(timestampdiff(4, char(end - start))) minutes
  44. as time_busy
  45. from with_rank where rank = 1 group by userid
USERID TIME_BUSY
User1 03:45:00
User2 03:45:00

fiddle

答案2

得分: 0

这是翻译好的部分:

"That's awesome. Thank you!
I have added an additional column in order to group results by date too.
So, solution would be like this:
with table1 (Userid, Task_id, Start, End, Total_Time) as (
VALUES
('User1', 'Task1', timestamp '2023-08-09 08:00:00', timestamp '2023-08-09 09:00:00', time '01:00:00'),
('User1', 'Task2', '2023-08-09 08:15:00', '2023-08-09 10:00:00', '01:45:00'),
('User2', 'Task1', '2023-08-09 08:30:00', '2023-08-09 10:00:00', '01:30:00'),
('User2', 'Task2', '2023-08-09 09:00:00', '2023-08-09 11:30:00', '02:30:00'),
('User1', 'Task3', '2023-08-09 11:15:00', '2023-08-09 13:00:00', '02:45:00'),
('User2', 'Task3', '2023-08-09 15:15:00', '2023-08-09 16:00:00', '00:45:00'),
('User2', 'Task1', '2023-08-09 15:20:00', '2023-08-09 16:00:00', '00:40:00')
),
-- is case when two tasks share the exact same period
distinct_periods as (
select distinct userid, start, end, cast(start as date) as workday from table1
),
-- extend periods
periods (n, userid, start, end,workday) as (
-- start with periods that are not preceded by an overlapping period
select 1, userid, start, end, workday from distinct_periods t1
where not exists (
select * from distinct_periods t2
where t2.userid = t1.userid
and (t1.start > t2.start and t1.start <= t2.end
or t1.start = t2.start and t1.end > t2.end)
and (t1.workday=t2.workday)
)
-- extend with those that overlap
union all
select n+1, periods.userid, periods.start, distinct_periods.end, periods.workday
from periods, distinct_periods
where
distinct_periods.userid = periods.userid
and distinct_periods.start between periods.start and periods.end
and distinct_periods.end > periods.end
-- and n < 10
),
-- add a rank by end date descending so that the widest period has rank 1
with_rank as (
select periods.*, rank() over(partition by userid, workday, start order by end desc) rank from periods order by userid, start
)
-- sum the lengths of periods of rank 1
select
workday, userid,
-- the following line is not the same in the fiddle due to differences between DB2 for IBMi and DB2 LUW
time '00:00:00' + sum(timestampdiff(4, char(end - start))) minutes
as time_busy
from with_rank where rank = 1 group by workday, userid
order by workday, userid"

英文:

That's awsesome. Thank you!
I have added an additional column in order to group results by date too.
So, solution would be like this:

  1. with table1 (Userid, Task_id, Start, End, Total_Time) as (
  2. VALUES
  3. (&#39;User1&#39;, &#39;Task1&#39;, timestamp &#39;2023-08-09 08:00:00&#39;, timestamp &#39;2023-08-09 09:00:00&#39;, time &#39;01:00:00&#39;),
  4. (&#39;User1&#39;, &#39;Task2&#39;, &#39;2023-08-09 08:15:00&#39;, &#39;2023-08-09 10:00:00&#39;, &#39;01:45:00&#39;),
  5. (&#39;User2&#39;, &#39;Task1&#39;, &#39;2023-08-09 08:30:00&#39;, &#39;2023-08-09 10:00:00&#39;, &#39;01:30:00&#39;),
  6. (&#39;User2&#39;, &#39;Task2&#39;, &#39;2023-08-09 09:00:00&#39;, &#39;2023-08-09 11:30:00&#39;, &#39;02:30:00&#39;),
  7. (&#39;User1&#39;, &#39;Task3&#39;, &#39;2023-08-09 11:15:00&#39;, &#39;2023-08-09 13:00:00&#39;, &#39;02:45:00&#39;),
  8. (&#39;User2&#39;, &#39;Task3&#39;, &#39;2023-08-09 15:15:00&#39;, &#39;2023-08-09 16:00:00&#39;, &#39;00:45:00&#39;),
  9. (&#39;User2&#39;, &#39;Task1&#39;, &#39;2023-08-09 15:20:00&#39;, &#39;2023-08-09 16:00:00&#39;, &#39;00:40:00&#39;)
  10. ),
  11. -- is case when two tasks share the exact same period
  12. distinct_periods as (
  13. select distinct userid, start, end, cast(start as date) as workday from table1
  14. ),
  15. -- extend periods
  16. periods (n, userid, start, end,workday) as (
  17. -- start with periods that are not preceded by an overlaping period
  18. select 1, userid, start, end, workday from distinct_periods t1
  19. where not exists (
  20. select * from distinct_periods t2
  21. where t2.userid = t1.userid
  22. and (t1.start &gt; t2.start and t1.start &lt;= t2.end
  23. or t1.start = t2.start and t1.end &gt; t2.end)
  24. and (t1.workday=t2.workday)
  25. )
  26. -- extend with those that overlap
  27. union all
  28. select n+1, periods.userid, periods.start, distinct_periods.end, periods.workday
  29. from periods, distinct_periods
  30. where
  31. distinct_periods.userid = periods.userid
  32. and distinct_periods.start between periods.start and periods.end
  33. and distinct_periods.end &gt; periods.end
  34. -- and n &lt; 10
  35. ),
  36. -- add a rank by end date descending so that the widest period has rank 1
  37. with_rank as (
  38. select periods.*, rank() over(partition by userid, workday, start order by end desc) rank from periods order by userid, start
  39. )
  40. -- sum the lengths of periods of rank 1
  41. select
  42. workday, userid,
  43. -- the following line is not the same in the fiddle due to differences between DB2 for IBMi and DB2 LUW
  44. time &#39;00:00:00&#39; + sum(timestampdiff(4, char(end - start))) minutes
  45. as time_busy
  46. from with_rank where rank = 1 group by workday, userid
  47. order by workday, userid

huangapple
  • 本文由 发表于 2023年8月10日 22:41:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/76876811.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定