如何合并任意链接的重叠时间范围。

huangapple go评论70阅读模式
英文:

How to merge arbitrarily chained overlapping time ranges

问题

我有一组带有开始和结束时间戳的事件。有一些约束条件:

  • 事件1可能与事件2、3、5、99等重叠。
  • 事件1可能与2重叠,2可能与3重叠,依此类推。
  • 一个事件可能完全包含在另一个事件中。
  • 一个事件可能与任何其他事件都不重叠。

以下是一个关于一组重叠事件和所需输出的快速可视化示例(s表示开始,e表示结束)。数据可能包含多个这种"链接"重叠时间范围,以及没有重叠的时间范围。

                                
                                 _______
         __________________     |       |   ______
        |       ___________|____|___    |  |      |
        |      |    __     |    |   |   |  |      |
        |      |   |  |    |    |   |   |  |      |
        |      |   |  |    |    |   |   |  |      |
        |      |   |  |    |    |   |   |  |      |
        |______|___|__|____|____|___|___|  |______|
       s1     s2  s3 e3   e1   s4  e2  e4 s5     e5                  

辅助函数

以下是一个使用plotly制作交互式图形以可视化事件的快速函数。它假设输入是一个带有Start列和End列的pd.DataFrame(通过对下面的输入进行简单的pd.read_csv获得)。

import pandas as pd
import numpy as np
import plotly.graph_objects as go

def plot_events(events):
    fig = go.Figure()

    # 添加虚拟数据以生成日期时间轴
    fig.add_trace(go.Scatter(
        x=pd.date_range(start=events.Start.min(), end=events.End.max()),
        y=np.zeros(events.shape[0])
    ))

    # 添加时间范围
    for i, event in events.iterrows():
        fig.add_vrect(
            x0=event.Start,
            x1=event.End
        )
    return fig

原始数据(CSV)

,Start,End,Duration
0,2020-08-03 21:05:00+00:00,2020-08-03 21:58:00+00:00,0 days 00:53:00
1,2020-08-03 22:53:00+00:00,2020-08-03 23:42:00+00:00,0 days 00:49:00
2,2020-08-03 23:29:00+00:00,2020-08-04 00:18:00+00:00,0 days 00:49:00
...

(注意:这部分内容主要是数据和代码,不需要翻译。如果您有其他需要,请告诉我。)

英文:

I have a set of events with a start and end timestamp. There are few constraints

  • event 1 may overlap with events 2, 3, 5, 99, ...
  • event 1 may overlap with 2 which may overlap with 3, ...
  • an event may be completely contained by another event
  • an event may not overlap with any other event

Here is a quick visualization of one group over overlapping events and the desired output (s indicates a start, and e indicates an end). The data may contain multiple such "chained" overlapping time ranges, as well as time ranges with no overlaps.

                             _______
     __________________     |       |   ______
    |       ___________|____|___    |  |      |
    |      |    __     |    |   |   |  |      |
    |      |   |  |    |    |   |   |  |      |
    |      |   |  |    |    |   |   |  |      |
    |      |   |  |    |    |   |   |  |      |
    |______|___|__|____|____|___|___|  |______|
   s1     s2  s3 e3   e1   s4  e2  e4 s5     e5                  
     _______________________________    ______
    |                               |  |      |
    |                               |  |      |
    |                               |  |      |
    |                               |  |      |
    |                               |  |      |
    |_______________________________|  |______|
   s1                              e1 s2     e2

Helper function

Here's a quick function for making an interactive plot with plotly in order to visualize the events. It assumes that the input is a pd.DataFrame with a Start column and an End column (as obtained by a simple pd.read_csv based on the input below).

import pandas as pd
import numpy as np
import plotly.graph_objects as go

def plot_events(events):
    fig = go.Figure()

    # Add dummy to generate the datetime axis
    fig.add_trace(go.Scatter(
        x=pd.date_range(start=events.Start.min(), end=events.End.max()),
        y=np.zeros(events.shape[0])
    ))

    # Add time capsules
    for i, event in events.iterrows():
        fig.add_vrect(
            x0=event .Start,
            x1=event .End
        )
    return fig

Raw data (CSV)

,Start,End,Duration
0,2020-08-03 21:05:00+00:00,2020-08-03 21:58:00+00:00,0 days 00:53:00
1,2020-08-03 22:53:00+00:00,2020-08-03 23:42:00+00:00,0 days 00:49:00
2,2020-08-03 23:29:00+00:00,2020-08-04 00:18:00+00:00,0 days 00:49:00
3,2020-08-04 17:01:00+00:00,2020-08-04 17:50:00+00:00,0 days 00:49:00
4,2020-08-12 02:09:00+00:00,2020-08-12 03:06:00+00:00,0 days 00:57:00
5,2020-08-12 02:13:00+00:00,2020-08-12 03:10:00+00:00,0 days 00:57:00
6,2020-08-12 02:33:00+00:00,2020-08-12 03:38:00+00:00,0 days 01:05:00
7,2020-08-12 04:45:00+00:00,2020-08-12 05:38:00+00:00,0 days 00:53:00
8,2020-08-12 05:09:00+00:00,2020-08-12 06:14:00+00:00,0 days 01:05:00
9,2020-08-12 05:37:00+00:00,2020-08-12 06:26:00+00:00,0 days 00:49:00
10,2020-08-12 05:57:00+00:00,2020-08-12 06:46:00+00:00,0 days 00:49:00
11,2020-08-12 06:09:00+00:00,2020-08-12 07:02:00+00:00,0 days 00:53:00
12,2020-08-12 08:21:00+00:00,2020-08-12 09:26:00+00:00,0 days 01:05:00
13,2020-08-12 08:53:00+00:00,2020-08-12 10:14:00+00:00,0 days 01:21:00
14,2020-08-12 09:49:00+00:00,2020-08-12 10:42:00+00:00,0 days 00:53:00
15,2020-08-12 10:09:00+00:00,2020-08-12 10:58:00+00:00,0 days 00:49:00
16,2020-08-12 10:37:00+00:00,2020-08-12 11:46:00+00:00,0 days 01:09:00
17,2020-08-12 13:17:00+00:00,2020-08-12 14:06:00+00:00,0 days 00:49:00
18,2020-08-12 13:41:00+00:00,2020-08-12 14:30:00+00:00,0 days 00:49:00
19,2020-08-13 07:21:00+00:00,2020-08-13 08:18:00+00:00,0 days 00:57:00
20,2020-08-13 07:45:00+00:00,2020-08-13 08:46:00+00:00,0 days 01:01:00
21,2020-08-14 12:33:00+00:00,2020-08-14 13:22:00+00:00,0 days 00:49:00
22,2020-08-14 12:33:00+00:00,2020-08-14 13:22:00+00:00,0 days 00:49:00
23,2020-08-21 06:49:00+00:00,2020-08-21 07:46:00+00:00,0 days 00:57:00
24,2020-08-21 06:49:00+00:00,2020-08-21 07:42:00+00:00,0 days 00:53:00
25,2020-08-21 07:05:00+00:00,2020-08-21 07:54:00+00:00,0 days 00:49:00
26,2020-08-21 08:53:00+00:00,2020-08-21 09:46:00+00:00,0 days 00:53:00
27,2020-08-21 08:53:00+00:00,2020-08-21 10:14:00+00:00,0 days 01:21:00
28,2020-08-21 10:05:00+00:00,2020-08-21 10:54:00+00:00,0 days 00:49:00
29,2020-08-27 16:53:00+00:00,2020-08-27 18:02:00+00:00,0 days 01:09:00
30,2020-08-27 17:05:00+00:00,2020-08-27 18:02:00+00:00,0 days 00:57:00
31,2020-08-27 19:57:00+00:00,2020-08-27 20:46:00+00:00,0 days 00:49:00
32,2020-08-27 19:57:00+00:00,2020-08-27 20:46:00+00:00,0 days 00:49:00
33,2020-08-27 21:37:00+00:00,2020-08-27 22:30:00+00:00,0 days 00:53:00
34,2020-08-28 13:45:00+00:00,2020-08-28 14:30:00+00:00,0 days 00:45:00
35,2020-08-28 13:53:00+00:00,2020-08-28 14:38:00+00:00,0 days 00:45:00
36,2020-08-28 21:33:00+00:00,2020-08-29 04:26:00+00:00,0 days 06:53:00
37,2020-09-01 23:53:00+00:00,2020-09-02 00:46:00+00:00,0 days 00:53:00
38,2020-09-02 00:17:00+00:00,2020-09-02 01:50:00+00:00,0 days 01:33:00
39,2020-09-02 00:17:00+00:00,2020-09-02 01:06:00+00:00,0 days 00:49:00
40,2020-09-02 02:33:00+00:00,2020-09-02 04:22:00+00:00,0 days 01:49:00
41,2020-09-02 02:33:00+00:00,2020-09-02 03:34:00+00:00,0 days 01:01:00
42,2020-09-02 10:09:00+00:00,2020-09-02 11:02:00+00:00,0 days 00:53:00
43,2020-09-02 10:57:00+00:00,2020-09-02 11:46:00+00:00,0 days 00:49:00
44,2020-09-14 06:13:00+00:00,2020-09-14 07:02:00+00:00,0 days 00:49:00
45,2020-10-12 07:29:00+00:00,2020-10-12 08:38:00+00:00,0 days 01:09:00
46,2020-10-12 07:33:00+00:00,2020-10-12 08:26:00+00:00,0 days 00:53:00
47,2020-10-12 09:41:00+00:00,2020-10-12 10:30:00+00:00,0 days 00:49:00
48,2020-10-12 09:41:00+00:00,2020-10-12 10:58:00+00:00,0 days 01:17:00
49,2020-10-16 08:05:00+00:00,2020-10-16 08:50:00+00:00,0 days 00:45:00
50,2020-10-20 13:05:00+00:00,2020-10-20 13:50:00+00:00,0 days 00:45:00
51,2020-10-25 12:45:00+00:00,2020-10-25 13:38:00+00:00,0 days 00:53:00
52,2020-10-25 13:21:00+00:00,2020-10-25 14:18:00+00:00,0 days 00:57:00
53,2020-11-14 13:13:00+00:00,2020-11-14 14:26:00+00:00,0 days 01:13:00
54,2020-11-14 13:17:00+00:00,2020-11-14 14:26:00+00:00,0 days 01:09:00
55,2020-11-14 13:49:00+00:00,2020-11-14 14:42:00+00:00,0 days 00:53:00
56,2020-11-14 14:21:00+00:00,2020-11-14 16:06:00+00:00,0 days 01:45:00
57,2020-12-12 09:41:00+00:00,2020-12-12 10:46:00+00:00,0 days 01:05:00
58,2020-12-12 10:09:00+00:00,2020-12-12 10:58:00+00:00,0 days 00:49:00
59,2020-12-12 10:37:00+00:00,2020-12-12 11:26:00+00:00,0 days 00:49:00
60,2020-12-12 12:33:00+00:00,2020-12-12 13:30:00+00:00,0 days 00:57:00
61,2020-12-12 13:25:00+00:00,2020-12-12 14:14:00+00:00,0 days 00:49:00
62,2020-12-18 08:13:00+00:00,2020-12-18 09:02:00+00:00,0 days 00:49:00
63,2020-12-19 11:17:00+00:00,2020-12-19 12:06:00+00:00,0 days 00:49:00
64,2020-12-19 13:01:00+00:00,2020-12-19 13:54:00+00:00,0 days 00:53:00
65,2020-12-23 13:53:00+00:00,2020-12-23 14:42:00+00:00,0 days 00:49:00
66,2020-12-24 05:33:00+00:00,2020-12-24 06:22:00+00:00,0 days 00:49:00
67,2020-12-25 05:49:00+00:00,2020-12-25 06:38:00+00:00,0 days 00:49:00
68,2020-12-30 19:57:00+00:00,2020-12-30 20:50:00+00:00,0 days 00:53:00
69,2020-12-31 13:13:00+00:00,2020-12-31 14:10:00+00:00,0 days 00:57:00
70,2021-01-02 04:05:00+00:00,2021-01-02 04:50:00+00:00,0 days 00:45:00
71,2021-01-18 04:57:00+00:00,2021-01-18 05:42:00+00:00,0 days 00:45:00
72,2021-01-27 14:01:00+00:00,2021-01-27 15:14:00+00:00,0 days 01:13:00
73,2021-01-27 14:53:00+00:00,2021-01-27 15:42:00+00:00,0 days 00:49:00
74,2021-03-05 06:53:00+00:00,2021-03-05 07:50:00+00:00,0 days 00:57:00
75,2021-03-05 06:53:00+00:00,2021-03-05 08:06:00+00:00,0 days 01:13:00
76,2021-03-05 07:29:00+00:00,2021-03-05 08:18:00+00:00,0 days 00:49:00
77,2021-03-05 11:09:00+00:00,2021-03-05 11:58:00+00:00,0 days 00:49:00
78,2021-03-05 11:09:00+00:00,2021-03-05 12:34:00+00:00,0 days 01:25:00
79,2021-03-05 13:01:00+00:00,2021-03-05 13:50:00+00:00,0 days 00:49:00
80,2021-03-05 13:01:00+00:00,2021-03-05 13:50:00+00:00,0 days 00:49:00
81,2021-03-08 07:37:00+00:00,2021-03-08 08:26:00+00:00,0 days 00:49:00
82,2021-03-08 07:37:00+00:00,2021-03-08 08:50:00+00:00,0 days 01:13:00
83,2021-03-08 08:13:00+00:00,2021-03-08 09:02:00+00:00,0 days 00:49:00
84,2021-03-08 08:29:00+00:00,2021-03-08 09:18:00+00:00,0 days 00:49:00
85,2021-03-08 08:37:00+00:00,2021-03-08 09:42:00+00:00,0 days 01:05:00
86,2021-03-08 09:05:00+00:00,2021-03-08 09:58:00+00:00,0 days 00:53:00
87,2021-03-08 09:21:00+00:00,2021-03-08 10:10:00+00:00,0 days 00:49:00
88,2021-03-08 09:41:00+00:00,2021-03-08 10:30:00+00:00,0 days 00:49:00
89,2021-03-08 10:53:00+00:00,2021-03-08 11:42:00+00:00,0 days 00:49:00
90,2021-03-08 12:05:00+00:00,2021-03-08 12:54:00+00:00,0 days 00:49:00
91,2021-03-08 12:05:00+00:00,2021-03-08 12:54:00+00:00,0 days 00:49:00
92,2021-03-08 12:53:00+00:00,2021-03-08 13:42:00+00:00,0 days 00:49:00
93,2021-03-08 13:33:00+00:00,2021-03-08 14:22:00+00:00,0 days 00:49:00
94,2021-03-08 14:09:00+00:00,2021-03-08 14:58:00+00:00,0 days 00:49:00
95,2021-03-08 14:21:00+00:00,2021-03-08 15:10:00+00:00,0 days 00:49:00
96,2021-03-08 14:41:00+00:00,2021-03-08 15:34:00+00:00,0 days 00:53:00
97,2021-03-08 20:13:00+00:00,2021-03-08 21:02:00+00:00,0 days 00:49:00
98,2021-03-08 20:13:00+00:00,2021-03-08 21:10:00+00:00,0 days 00:57:00
99,2021-03-08 20:29:00+00:00,2021-03-08 21:54:00+00:00,0 days 01:25:00
100,2021-03-08 21:57:00+00:00,2021-03-08 23:34:00+00:00,0 days 01:37:00
101,2021-03-08 22:09:00+00:00,2021-03-08 22:58:00+00:00,0 days 00:49:00
102,2021-03-08 23:01:00+00:00,2021-03-08 23:50:00+00:00,0 days 00:49:00
103,2021-03-08 23:13:00+00:00,2021-03-08 23:58:00+00:00,0 days 00:45:00
104,2021-03-08 23:21:00+00:00,2021-03-09 00:06:00+00:00,0 days 00:45:00
105,2021-03-08 23:41:00+00:00,2021-03-09 00:30:00+00:00,0 days 00:49:00
106,2021-03-09 00:17:00+00:00,2021-03-09 01:06:00+00:00,0 days 00:49:00
107,2021-03-09 00:49:00+00:00,2021-03-09 01:38:00+00:00,0 days 00:49:00
108,2021-03-09 01:01:00+00:00,2021-03-09 01:58:00+00:00,0 days 00:57:00
109,2021-03-09 01:21:00+00:00,2021-03-09 02:10:00+00:00,0 days 00:49:00
110,2021-03-09 01:41:00+00:00,2021-03-09 02:42:00+00:00,0 days 01:01:00
111,2021-03-09 02:09:00+00:00,2021-03-09 03:06:00+00:00,0 days 00:57:00
112,2021-03-09 03:01:00+00:00,2021-03-09 03:50:00+00:00,0 days 00:49:00
113,2021-03-09 03:49:00+00:00,2021-03-09 04:38:00+00:00,0 days 00:49:00
114,2021-03-09 04:13:00+00:00,2021-03-09 05:02:00+00:00,0 days 00:49:00
115,2021-03-09 04:25:00+00:00,2021-03-09 05:14:00+00:00,0 days 00:49:00
116,2021-03-09 04:25:00+00:00,2021-03-09 05:14:00+00:00,0 days 00:49:00
117,2021-03-09 05:01:00+00:00,2021-03-09 05:50:00+00:00,0 days 00:49:00
118,2021-03-09 10:05:00+00:00,2021-03-09 10:58:00+00:00,0 days 00:53:00
119,2021-03-09 10:09:00+00:00,2021-03-09 11:26:00+00:00,0 days 01:17:00
120,2021-03-09 10:37:00+00:00,2021-03-09 11:42:00+00:00,0 days 01:05:00
121,2021-03-09 11:41:00+00:00,2021-03-09 12:30:00+00:00,0 days 00:49:00
122,2021-03-09 11:53:00+00:00,2021-03-09 12:42:00+00:00,0 days 00:49:00
123,2021-03-09 12:25:00+00:00,2021-03-09 13:42:00+00:00,0 days 01:17:00
124,2021-03-09 13:05:00+00:00,2021-03-09 13:58:00+00:00,0 days 00:53:00
125,2021-03-09 13:21:00+00:00,2021-03-09 14:10:00+00:00,0 days 00:49:00
126,2021-03-09 13:41:00+00:00,2021-03-09 14:38:00+00:00,0 days 00:57:00
127,2021-03-09 15:13:00+00:00,2021-03-09 16:30:00+00:00,0 days 01:17:00
128,2021-03-09 16:01:00+00:00,2021-03-09 17:30:00+00:00,0 days 01:29:00
129,2021-03-09 16:45:00+00:00,2021-03-09 18:30:00+00:00,0 days 01:45:00
130,2021-03-09 16:53:00+00:00,2021-03-09 17:46:00+00:00,0 days 00:53:00
131,2021-03-09 19:21:00+00:00,2021-03-09 20:38:00+00:00,0 days 01:17:00
132,2021-03-09 19:21:00+00:00,2021-03-09 20:18:00+00:00,0 days 00:57:00
133,2021-03-09 20:01:00+00:00,2021-03-09 20:54:00+00:00,0 days 00:53:00
134,2021-03-09 22:09:00+00:00,2021-03-09 23:18:00+00:00,0 days 01:09:00
135,2021-03-10 20:37:00+00:00,2021-03-10 21:30:00+00:00,0 days 00:53:00
136,2021-03-10 20:53:00+00:00,2021-03-10 21:50:00+00:00,0 days 00:57:00
137,2021-03-11 00:01:00+00:00,2021-03-11 00:54:00+00:00,0 days 00:53:00
138,2021-03-11 00:25:00+00:00,2021-03-11 01:22:00+00:00,0 days 00:57:00
139,2021-03-11 00:49:00+00:00,2021-03-11 01:38:00+00:00,0 days 00:49:00
140,2021-03-11 03:25:00+00:00,2021-03-11 04:30:00+00:00,0 days 01:05:00
141,2021-03-11 04:05:00+00:00,2021-03-11 04:58:00+00:00,0 days 00:53:00
142,2021-03-11 04:21:00+00:00,2021-03-11 05:06:00+00:00,0 days 00:45:00
143,2021-03-11 04:29:00+00:00,2021-03-11 05:14:00+00:00,0 days 00:45:00
144,2021-03-11 07:13:00+00:00,2021-03-11 08:10:00+00:00,0 days 00:57:00
145,2021-03-11 07:13:00+00:00,2021-03-11 08:14:00+00:00,0 days 01:01:00
146,2021-03-11 08:09:00+00:00,2021-03-11 08:58:00+00:00,0 days 00:49:00
147,2021-03-11 08:13:00+00:00,2021-03-11 09:30:00+00:00,0 days 01:17:00
148,2021-03-11 09:25:00+00:00,2021-03-11 10:14:00+00:00,0 days 00:49:00
149,2021-03-11 10:29:00+00:00,2021-03-11 11:34:00+00:00,0 days 01:05:00
150,2021-03-11 12:49:00+00:00,2021-03-11 13:38:00+00:00,0 days 00:49:00
151,2021-03-11 13:37:00+00:00,2021-03-11 14:42:00+00:00,0 days 01:05:00
152,2021-03-11 14:01:00+00:00,2021-03-11 14:50:00+00:00,0 days 00:49:00
153,2021-03-11 14:13:00+00:00,2021-03-11 15:06:00+00:00,0 days 00:53:00
154,2021-03-11 14:13:00+00:00,2021-03-11 15:02:00+00:00,0 days 00:49:00
155,2021-03-11 15:33:00+00:00,2021-03-11 16:30:00+00:00,0 days 00:57:00
156,2021-03-11 16:01:00+00:00,2021-03-11 16:54:00+00:00,0 days 00:53:00
157,2021-03-11 16:05:00+00:00,2021-03-11 17:06:00+00:00,0 days 01:01:00
158,2021-03-12 12:21:00+00:00,2021-03-12 13:14:00+00:00,0 days 00:53:00
159,2021-03-12 13:13:00+00:00,2021-03-12 14:02:00+00:00,0 days 00:49:00
160,2021-03-12 13:13:00+00:00,2021-03-12 14:06:00+00:00,0 days 00:53:00
161,2021-03-12 14:13:00+00:00,2021-03-12 15:02:00+00:00,0 days 00:49:00
162,2021-03-12 14:33:00+00:00,2021-03-12 15:22:00+00:00,0 days 00:49:00
163,2021-03-12 14:49:00+00:00,2021-03-12 15:38:00+00:00,0 days 00:49:00
164,2021-03-12 14:57:00+00:00,2021-03-12 15:46:00+00:00,0 days 00:49:00
165,2021-03-12 15:09:00+00:00,2021-03-12 15:58:00+00:00,0 days 00:49:00
166,2021-03-12 15:09:00+00:00,2021-03-12 15:58:00+00:00,0 days 00:49:00
167,2021-03-12 15:25:00+00:00,2021-03-12 16:14:00+00:00,0 days 00:49:00
168,2021-03-12 15:57:00+00:00,2021-03-12 16:46:00+00:00,0 days 00:49:00
169,2021-03-12 15:57:00+00:00,2021-03-12 16:46:00+00:00,0 days 00:49:00
170,2021-03-12 17:33:00+00:00,2021-03-12 18:22:00+00:00,0 days 00:49:00
171,2021-03-16 11:57:00+00:00,2021-03-16 12:46:00+00:00,0 days 00:49:00
172,2021-03-16 12:01:00+00:00,2021-03-16 12:58:00+00:00,0 days 00:57:00
173,2021-03-16 12:37:00+00:00,2021-03-16 13:26:00+00:00,0 days 00:49:00
174,2021-03-16 12:57:00+00:00,2021-03-16 13:46:00+00:00,0 days 00:49:00
175,2021-03-16 13:13:00+00:00,2021-03-16 14:02:00+00:00,0 days 00:49:00
176,2021-03-16 13:45:00+00:00,2021-03-16 14:34:00+00:00,0 days 00:49:00
177,2021-03-16 14:01:00+00:00,2021-03-16 14:54:00+00:00,0 days 00:53:00
178,2021-03-16 14:05:00+00:00,2021-03-16 14:58:00+00:00,0 days 00:53:00
179,2021-03-16 15:09:00+00:00,2021-03-16 15:58:00+00:00,0 days 00:49:00
180,2021-03-16 17:41:00+00:00,2021-03-16 18:38:00+00:00,0 days 00:57:00
181,2021-03-16 17:49:00+00:00,2021-03-16 18:54:00+00:00,0 days 01:05:00
182,2021-03-16 18:17:00+00:00,2021-03-16 19:14:00+00:00,0 days 00:57:00
183,2021-03-16 18:41:00+00:00,2021-03-16 19:30:00+00:00,0 days 00:49:00
184,2021-03-16 18:53:00+00:00,2021-03-16 19:58:00+00:00,0 days 01:05:00
185,2021-03-16 20:01:00+00:00,2021-03-16 20:58:00+00:00,0 days 00:57:00
186,2021-03-16 22:05:00+00:00,2021-03-16 22:54:00+00:00,0 days 00:49:00
187,2021-03-16 22:17:00+00:00,2021-03-16 23:14:00+00:00,0 days 00:57:00
188,2021-03-16 22:41:00+00:00,2021-03-16 23:38:00+00:00,0 days 00:57:00
189,2021-03-16 23:41:00+00:00,2021-03-17 00:30:00+00:00,0 days 00:49:00
190,2021-03-17 02:57:00+00:00,2021-03-17 03:54:00+00:00,0 days 00:57:00
191,2021-03-17 09:37:00+00:00,2021-03-17 10:26:00+00:00,0 days 00:49:00
192,2021-06-28 23:05:00+00:00,2021-06-29 00:10:00+00:00,0 days 01:05:00
193,2021-06-29 00:05:00+00:00,2021-06-29 00:50:00+00:00,0 days 00:45:00
194,2021-06-29 00:13:00+00:00,2021-06-29 00:58:00+00:00,0 days 00:45:00
195,2021-09-19 23:21:00+00:00,2021-09-20 00:10:00+00:00,0 days 00:49:00
196,2021-09-20 03:33:00+00:00,2021-09-20 04:22:00+00:00,0 days 00:49:00
197,2021-09-30 09:29:00+00:00,2021-09-30 10:18:00+00:00,0 days 00:49:00
198,2021-09-30 09:41:00+00:00,2021-09-30 10:30:00+00:00,0 days 00:49:00
199,2021-09-30 09:53:00+00:00,2021-09-30 10:46:00+00:00,0 days 00:53:00
200,2021-11-01 11:33:00+00:00,2021-11-01 12:22:00+00:00,0 days 00:49:00
201,2021-11-01 12:37:00+00:00,2021-11-01 13:50:00+00:00,0 days 01:13:00
202,2021-11-01 12:45:00+00:00,2021-11-01 13:50:00+00:00,0 days 01:05:00
203,2021-11-01 13:21:00+00:00,2021-11-01 14:38:00+00:00,0 days 01:17:00
204,2021-11-01 13:21:00+00:00,2021-11-01 14:14:00+00:00,0 days 00:53:00
205,2021-11-01 17:49:00+00:00,2021-11-01 18:42:00+00:00,0 days 00:53:00
206,2021-11-01 17:49:00+00:00,2021-11-01 18:42:00+00:00,0 days 00:53:00
207,2021-11-01 18:33:00+00:00,2021-11-01 19:46:00+00:00,0 days 01:13:00
208,2021-11-01 18:33:00+00:00,2021-11-01 19:22:00+00:00,0 days 00:49:00
209,2021-11-01 20:25:00+00:00,2021-11-01 21:26:00+00:00,0 days 01:01:00
210,2021-11-01 20:29:00+00:00,2021-11-01 21:22:00+00:00,0 days 00:53:00
211,2021-11-01 20:49:00+00:00,2021-11-01 21:42:00+00:00,0 days 00:53:00
212,2021-11-01 21:41:00+00:00,2021-11-01 22:34:00+00:00,0 days 00:53:00
213,2021-11-01 21:57:00+00:00,2021-11-01 22:50:00+00:00,0 days 00:53:00
214,2021-11-01 22:17:00+00:00,2021-11-01 23:02:00+00:00,0 days 00:45:00
215,2021-11-01 22:25:00+00:00,2021-11-01 23:10:00+00:00,0 days 00:45:00
216,2021-11-02 10:29:00+00:00,2021-11-02 11:26:00+00:00,0 days 00:57:00
217,2021-11-02 18:09:00+00:00,2021-11-02 19:06:00+00:00,0 days 00:57:00
218,2021-11-03 17:01:00+00:00,2021-11-03 17:46:00+00:00,0 days 00:45:00
219,2021-11-05 09:21:00+00:00,2021-11-05 10:26:00+00:00,0 days 01:05:00
220,2021-11-06 13:21:00+00:00,2021-11-06 14:06:00+00:00,0 days 00:45:00
221,2021-11-06 13:29:00+00:00,2021-11-06 14:14:00+00:00,0 days 00:45:00
222,2021-11-13 17:21:00+00:00,2021-11-13 18:10:00+00:00,0 days 00:49:00
223,2021-11-13 17:33:00+00:00,2021-11-13 18:50:00+00:00,0 days 01:17:00
224,2021-11-13 18:29:00+00:00,2021-11-13 19:26:00+00:00,0 days 00:57:00
225,2021-11-13 19:45:00+00:00,2021-11-13 20:54:00+00:00,0 days 01:09:00
226,2021-11-13 21:09:00+00:00,2021-11-13 22:10:00+00:00,0 days 01:01:00
227,2021-11-13 21:57:00+00:00,2021-11-13 22:58:00+00:00,0 days 01:01:00
228,2021-11-13 23:25:00+00:00,2021-11-14 00:22:00+00:00,0 days 00:57:00
229,2021-11-14 00:01:00+00:00,2021-11-14 00:46:00+00:00,0 days 00:45:00
230,2021-11-14 00:09:00+00:00,2021-11-14 00:54:00+00:00,0 days 00:45:00
231,2021-11-14 00:45:00+00:00,2021-11-14 01:42:00+00:00,0 days 00:57:00
232,2021-11-14 14:45:00+00:00,2021-11-14 15:38:00+00:00,0 days 00:53:00
233,2021-11-14 15:17:00+00:00,2021-11-14 16:06:00+00:00,0 days 00:49:00
234,2021-11-14 17:05:00+00:00,2021-11-14 17:50:00+00:00,0 days 00:45:00
235,2021-11-14 17:13:00+00:00,2021-11-14 17:58:00+00:00,0 days 00:45:00
236,2021-11-14 18:09:00+00:00,2021-11-14 19:06:00+00:00,0 days 00:57:00
237,2021-11-14 20:21:00+00:00,2021-11-14 21:06:00+00:00,0 days 00:45:00
238,2021-11-14 20:29:00+00:00,2021-11-14 21:14:00+00:00,0 days 00:45:00
239,2021-11-14 20:53:00+00:00,2021-11-14 21:42:00+00:00,0 days 00:49:00
240,2021-11-14 21:05:00+00:00,2021-11-14 21:54:00+00:00,0 days 00:49:00
241,2021-11-14 21:09:00+00:00,2021-11-14 22:02:00+00:00,0 days 00:53:00
242,2021-11-14 21:17:00+00:00,2021-11-14 22:14:00+00:00,0 days 00:57:00
243,2021-11-14 21:33:00+00:00,2021-11-14 22:30:00+00:00,0 days 00:57:00
244,2021-11-14 21:53:00+00:00,2021-11-14 22:42:00+00:00,0 days 00:49:00
245,2021-11-14 22:01:00+00:00,2021-11-14 22:58:00+00:00,0 days 00:57:00
246,2021-11-14 22:05:00+00:00,2021-11-14 22:54:00+00:00,0 days 00:49:00
247,2021-11-14 22:17:00+00:00,2021-11-14 23:30:00+00:00,0 days 01:13:00
248,2021-11-14 22:29:00+00:00,2021-11-14 23:26:00+00:00,0 days 00:57:00
249,2021-11-14 23:21:00+00:00,2021-11-15 00:14:00+00:00,0 days 00:53:00
250,2021-11-14 23:37:00+00:00,2021-11-15 00:30:00+00:00,0 days 00:53:00
251,2021-11-14 23:49:00+00:00,2021-11-15 00:46:00+00:00,0 days 00:57:00
252,2021-11-15 00:21:00+00:00,2021-11-15 01:14:00+00:00,0 days 00:53:00
253,2021-11-15 00:33:00+00:00,2021-11-15 01:34:00+00:00,0 days 01:01:00
254,2021-11-15 01:01:00+00:00,2021-11-15 01:54:00+00:00,0 days 00:53:00
255,2021-11-15 01:17:00+00:00,2021-11-15 02:06:00+00:00,0 days 00:49:00
256,2021-11-15 01:33:00+00:00,2021-11-15 02:22:00+00:00,0 days 00:49:00
257,2021-11-15 02:01:00+00:00,2021-11-15 02:54:00+00:00,0 days 00:53:00
258,2021-11-15 04:33:00+00:00,2021-11-15 05:22:00+00:00,0 days 00:49:00
259,2021-11-15 04:49:00+00:00,2021-11-15 05:38:00+00:00,0 days 00:49:00
260,2021-11-15 11:17:00+00:00,2021-11-15 12:10:00+00:00,0 days 00:53:00
261,2021-11-15 11:37:00+00:00,2021-11-15 12:30:00+00:00,0 days 00:53:00
262,2021-11-15 15:57:00+00:00,2021-11-15 16:50:00+00:00,0 days 00:53:00
263,2021-11-15 17:57:00+00:00,2021-11-15 18:50:00+00:00,0 days 00:53:00
264,2021-11-15 18:21:00+00:00,2021-11-15 19:10:00+00:00,0 days 00:49:00
265,2021-11-15 23:29:00+00:00,2021-11-16 00:18:00+00:00,0 days 00:49:00
266,2021-11-15 23:53:00+00:00,2021-11-16 00:42:00+00:00,0 days 00:49:00
267,2021-11-16 00:13:00+00:00,2021-11-16 01:14:00+00:00,0 days 01:01:00
268,2021-11-16 00:37:00+00:00,2021-11-16 01:34:00+00:00,0 days 00:57:00
269,2021-11-16 00:53:00+00:00,2021-11-16 02:22:00+00:00,0 days 01:29:00
270,2021-11-16 01:49:00+00:00,2021-11-16 02:50:00+00:00,0 days 01:01:00
271,2021-11-16 02:13:00+00:00,2021-11-16 03:02:00+00:00,0 days 00:49:00
272,2021-11-16 02:25:00+00:00,2021-11-16 03:14:00+00:00,0 days 00:49:00
273,2021-11-16 02:37:00+00:00,2021-11-16 03:30:00+00:00,0 days 00:53:00
274,2021-11-16 05:17:00+00:00,2021-11-16 06:14:00+00:00,0 days 00:57:00
275,2021-11-16 19:17:00+00:00,2021-11-16 20:22:00+00:00,0 days 01:05:00
276,2021-11-16 19:53:00+00:00,2021-11-16 20:46:00+00:00,0 days 00:53:00
277,2021-11-16 20:13:00+00:00,2021-11-16 21:02:00+00:00,0 days 00:49:00
278,2021-11-16 20:25:00+00:00,2021-11-16 21:14:00+00:00,0 days 00:49:00
279,2021-11-16 20:53:00+00:00,2021-11-16 21:42:00+00:00,0 days 00:49:00
280,2021-11-16 21:05:00+00:00,2021-11-16 21:54:00+00:00,0 days 00:49:00
281,2021-11-16 21:29:00+00:00,2021-11-16 22:18:00+00:00,0 days 00:49:00
282,2021-11-16 21:41:00+00:00,2021-11-16 22:34:00+00:00,0 days 00:53:00
283,2021-11-16 21:45:00+00:00,2021-11-16 22:38:00+00:00,0 days 00:53:00
284,2021-11-16 22:01:00+00:00,2021-11-16 23:26:00+00:00,0 days 01:25:00
285,2021-11-16 22:21:00+00:00,2021-11-16 23:14:00+00:00,0 days 00:53:00
286,2021-11-16 22:49:00+00:00,2021-11-16 23:46:00+00:00,0 days 00:57:00
287,2021-11-16 23:29:00+00:00,2021-11-17 00:18:00+00:00,0 days 00:49:00
288,2021-11-17 00:13:00+00:00,2021-11-17 01:06:00+00:00,0 days 00:53:00
289,2021-11-17 00:49:00+00:00,2021-11-17 02:06:00+00:00,0 days 01:17:00
290,2021-11-17 00:49:00+00:00,2021-11-17 01:58:00+00:00,0 days 01:09:00
291,2021-11-17 03:29:00+00:00,2021-11-17 04:22:00+00:00,0 days 00:53:00
292,2021-11-17 03:29:00+00:00,2021-11-17 04:30:00+00:00,0 days 01:01:00
293,2021-11-17 04:01:00+00:00,2021-11-17 04:50:00+00:00,0 days 00:49:00
294,2021-11-17 04:05:00+00:00,2021-11-17 04:54:00+00:00,0 days 00:49:00
295,2021-11-17 07:25:00+00:00,2021-11-17 08:14:00+00:00,0 days 00:49:00
296,2021-11-17 07:25:00+00:00,2021-11-17 08:14:00+00:00,0 days 00:49:00
297,2021-11-17 07:37:00+00:00,2021-11-17 08:34:00+00:00,0 days 00:57:00
298,2021-11-17 07:37:00+00:00,2021-11-17 08:26:00+00:00,0 days 00:49:00
299,2021-11-17 07:57:00+00:00,2021-11-17 08:58:00+00:00,0 days 01:01:00
300,2021-11-17 10:49:00+00:00,2021-11-17 11:50:00+00:00,0 days 01:01:00
301,2021-11-17 10:57:00+00:00,2021-11-17 11:46:00+00:00,0 days 00:49:00
302,2021-11-17 11:13:00+00:00,2021-11-17 12:18:00+00:00,0 days 01:05:00
303,2021-11-17 11:41:00+00:00,2021-11-17 12:30:00+00:00,0 days 00:49:00
304,2021-11-17 14:01:00+00:00,2021-11-17 14:50:00+00:00,0 days 00:49:00
305,2021-11-17 14:13:00+00:00,2021-11-17 14:58:00+00:00,0 days 00:45:00
306,2021-11-17 14:25:00+00:00,2021-11-17 15:18:00+00:00,0 days 00:53:00
307,2021-11-17 17:17:00+00:00,2021-11-17 18:38:00+00:00,0 days 01:21:00
308,2021-11-17 18:53:00+00:00,2021-11-17 19:42:00+00:00,0 days 00:49:00
309,2021-11-17 18:57:00+00:00,2021-11-17 19:54:00+00:00,0 days 00:57:00
310,2021-11-17 19:17:00+00:00,2021-11-17 20:10:00+00:00,0 days 00:53:00
311,2021-11-17 20:05:00+00:00,2021-11-17 20:54:00+00:00,0 days 00:49:00
312,2021-11-17 20:05:00+00:00,2021-11-17 21:10:00+00:00,0 days 01:05:00
313,2021-11-17 20:21:00+00:00,2021-11-17 21:14:00+00:00,0 days 00:53:00
314,2021-11-17 21:01:00+00:00,2021-11-17 21:54:00+00:00,0 days 00:53:00
315,2021-11-17 21:21:00+00:00,2021-11-17 22:10:00+00:00,0 days 00:49:00
316,2021-11-17 23:29:00+00:00,2021-11-18 01:06:00+00:00,0 days 01:37:00
317,2021-11-17 23:33:00+00:00,2021-11-18 00:22:00+00:00,0 days 00:49:00
318,2021-11-18 00:29:00+00:00,2021-11-18 01:18:00+00:00,0 days 00:49:00
319,2021-11-18 00:45:00+00:00,2021-11-18 01:34:00+00:00,0 days 00:49:00
320,2021-11-18 01:57:00+00:00,2021-11-18 03:54:00+00:00,0 days 01:57:00
321,2021-11-18 01:57:00+00:00,2021-11-18 02:50:00+00:00,0 days 00:53:00
322,2021-11-18 04:53:00+00:00,2021-11-18 05:50:00+00:00,0 days 00:57:00
323,2021-11-18 05:57:00+00:00,2021-11-18 06:46:00+00:00,0 days 00:49:00
324,2021-11-18 09:45:00+00:00,2021-11-18 10:38:00+00:00,0 days 00:53:00
325,2021-11-18 13:33:00+00:00,2021-11-18 14:22:00+00:00,0 days 00:49:00
326,2021-11-18 14:25:00+00:00,2021-11-18 15:14:00+00:00,0 days 00:49:00
327,2021-11-18 15:33:00+00:00,2021-11-18 16:26:00+00:00,0 days 00:53:00
328,2021-11-18 16:13:00+00:00,2021-11-18 17:02:00+00:00,0 days 00:49:00
329,2021-11-18 18:17:00+00:00,2021-11-18 19:06:00+00:00,0 days 00:49:00
330,2021-11-18 19:01:00+00:00,2021-11-18 19:50:00+00:00,0 days 00:49:00
331,2021-11-18 19:57:00+00:00,2021-11-18 20:42:00+00:00,0 days 00:45:00
332,2021-11-18 20:05:00+00:00,2021-11-18 20:50:00+00:00,0 days 00:45:00
333,2021-11-20 14:33:00+00:00,2021-11-20 15:26:00+00:00,0 days 00:53:00
334,2021-11-20 14:53:00+00:00,2021-11-20 15:46:00+00:00,0 days 00:53:00
335,2021-11-20 15:29:00+00:00,2021-11-20 16:26:00+00:00,0 days 00:57:00
336,2021-11-20 15:57:00+00:00,2021-11-20 16:58:00+00:00,0 days 01:01:00
337,2021-11-20 16:05:00+00:00,2021-11-20 16:54:00+00:00,0 days 00:49:00
338,2021-11-20 16:17:00+00:00,2021-11-20 17:06:00+00:00,0 days 00:49:00
339,2021-11-20 16:29:00+00:00,2021-11-20 17:18:00+00:00,0 days 00:49:00
340,2021-11-20 16:53:00+00:00,2021-11-20 17:42:00+00:00,0 days 00:49:00
341,2021-11-20 17:21:00+00:00,2021-11-20 18:14:00+00:00,0 days 00:53:00
342,2021-11-20 18:45:00+00:00,2021-11-20 20:30:00+00:00,0 days 01:45:00
343,2021-11-20 19:01:00+00:00,2021-11-20 19:50:00+00:00,0 days 00:49:00
344,2021-11-20 19:53:00+00:00,2021-11-20 20:42:00+00:00,0 days 00:49:00
345,2021-11-20 20:09:00+00:00,2021-11-20 20:58:00+00:00,0 days 00:49:00
346,2021-11-20 20:25:00+00:00,2021-11-20 21:14:00+00:00,0 days 00:49:00
347,2021-11-20 20:49:00+00:00,2021-11-20 21:38:00+00:00,0 days 00:49:00
348,2021-11-20 21:01:00+00:00,2021-11-20 21:50:00+00:00,0 days 00:49:00
349,2021-11-20 22:13:00+00:00,2021-11-20 23:02:00+00:00,0 days 00:49:00
350,2021-11-20 23:01:00+00:00,2021-11-20 23:50:00+00:00,0 days 00:49:00
351,2021-11-21 03:09:00+00:00,2021-11-21 03:58:00+00:00,0 days 00:49:00
352,2021-11-21 03:21:00+00:00,2021-11-21 04:14:00+00:00,0 days 00:53:00
353,2021-11-21 17:09:00+00:00,2021-11-21 17:58:00+00:00,0 days 00:49:00
354,2021-11-21 20:33:00+00:00,2021-11-21 21:22:00+00:00,0 days 00:49:00
355,2021-11-22 08:33:00+00:00,2021-11-22 09:22:00+00:00,0 days 00:49:00
356,2021-11-22 09:17:00+00:00,2021-11-22 10:06:00+00:00,0 days 00:49:00
357,2021-11-22 09:57:00+00:00,2021-11-22 10:46:00+00:00,0 days 00:49:00

答案1

得分: 2

将两列转换为numpy数组并传递给merge函数,最后转换为日期时间并创建Duration列:

#https://stackoverflow.com/a/5679899/2901002
def merge(times):
    saved = list(times[0])
    for st, en in sorted([sorted(t) for t in times]):
        if st <= saved[1]:
            saved[1] = max(saved[1], en)
        else:
            yield tuple(saved)
            saved[0] = st
            saved[1] = en

out = pd.DataFrame(merge(df[['Start','End']].to_numpy()), 
                   columns=['Start','End']).apply(pd.to_datetime)

out['Duration'] = out['End'].sub(out['Start'])
print(out)

输出如下:

                            Start                       End        Duration
0   2020-08-03 21:05:00+00:00 2020-08-03 21:58:00+00:00 0 days 00:53:00
1   2020-08-03 22:53:00+00:00 2020-08-04 00:18:00+00:00 0 days 01:25:00
2   2020-08-04 17:01:00+00:00 2020-08-04 17:50:00+00:00 0 days 00:49:00
3   2020-08-12 02:09:00+00:00 2020-08-12 03:38:00+00:00 0 days 01:29:00
4   2020-08-12 04:45:00+00:00 2020-08-12 07:02:00+00:00 0 days 02:17:00
...
115 2021-11-20 22:13:00+00:00 2021-11-20 23:50:00+00:00 0 days 01:37:00
116 2021-11-21 03:09:00+00:00 2021-11-21 04:14:00+00:00 0 days 01:05:00
117 2021-11-21 17:09:00+00:00 2021-11-21 17:58:00+00:00 0 days 00:49:00
118 2021-11-21 20:33:00+00:00 2021-11-21 21:22:00+00:00 0 days 00:49:00
119 2021-11-22 08:33:00+00:00 2021-11-22 10:46:00+00:00 0 days 02:13:00
[120 rows x 3 columns]
英文:

Convert both columns to numpy array and pass to merge function, last convert to datetimes and create Duration column:

#https://stackoverflow.com/a/5679899/2901002
def merge(times):
saved = list(times[0])
for st, en in sorted([sorted(t) for t in times]):
if st &lt;= saved[1]:
saved[1] = max(saved[1], en)
else:
yield tuple(saved)
saved[0] = st
saved[1] = en
yield tuple(saved)
out = pd.DataFrame(merge(df[[&#39;Start&#39;,&#39;End&#39;]].to_numpy()), 
columns=[&#39;Start&#39;,&#39;End&#39;]).apply(pd.to_datetime)
out[&#39;Duration&#39;] = out[&#39;End&#39;].sub(out[&#39;Start&#39;])
print (out)
Start                       End        Duration
0   2020-08-03 21:05:00+00:00 2020-08-03 21:58:00+00:00 0 days 00:53:00
1   2020-08-03 22:53:00+00:00 2020-08-04 00:18:00+00:00 0 days 01:25:00
2   2020-08-04 17:01:00+00:00 2020-08-04 17:50:00+00:00 0 days 00:49:00
3   2020-08-12 02:09:00+00:00 2020-08-12 03:38:00+00:00 0 days 01:29:00
4   2020-08-12 04:45:00+00:00 2020-08-12 07:02:00+00:00 0 days 02:17:00
..                        ...                       ...             ...
115 2021-11-20 22:13:00+00:00 2021-11-20 23:50:00+00:00 0 days 01:37:00
116 2021-11-21 03:09:00+00:00 2021-11-21 04:14:00+00:00 0 days 01:05:00
117 2021-11-21 17:09:00+00:00 2021-11-21 17:58:00+00:00 0 days 00:49:00
118 2021-11-21 20:33:00+00:00 2021-11-21 21:22:00+00:00 0 days 00:49:00
119 2021-11-22 08:33:00+00:00 2021-11-22 10:46:00+00:00 0 days 02:13:00
[120 rows x 3 columns]

答案2

得分: 2

你可以使用我在这里描述的方法的变种。

总结一下,按照开始和结束对行进行排序,计算一个 cummax 来传播结束时间。然后定义独立的区间为那些其开始时间在前一个传播结束时间之后的区间。

out = (df
   # 这部分是用来识别重叠区间的
   .sort_values(by=['Start', 'End'])
   .assign(max_End=lambda d: d['End'].cummax(),
           group=lambda d: d['Start'].ge(d['max_End'].shift()).cumsum())
   # 这部分是用来聚合组的
   .groupby('group').agg({'Start': 'min', 'End': 'max'})
   .assign(Duration=lambda g: g['End'] - g['Start'])
)

输出:

                          Start                       End        Duration
group                                                                    
0     2020-08-03 21:05:00+00:00 2020-08-03 21:58:00+00:00 0 days 00:53:00
1     2020-08-03 22:53:00+00:00 2020-08-04 00:18:00+00:00 0 days 01:25:00
2     2020-08-04 17:01:00+00:00 2020-08-04 17:50:00+00:00 0 days 00:49:00
3     2020-08-12 02:09:00+00:00 2020-08-12 03:38:00+00:00 0 days 01:29:00
4     2020-08-12 04:45:00+00:00 2020-08-12 07:02:00+00:00 0 days 02:17:00
...                         ...                       ...             ...
115   2021-11-20 22:13:00+00:00 2021-11-20 23:50:00+00:00 0 days 01:37:00
116   2021-11-21 03:09:00+00:00 2021-11-21 04:14:00+00:00 0 days 01:05:00
117   2021-11-21 17:09:00+00:00 2021-11-21 17:58:00+00:00 0 days 00:49:00
118   2021-11-21 20:33:00+00:00 2021-11-21 21:22:00+00:00 0 days 00:49:00
119   2021-11-22 08:33:00+00:00 2021-11-22 10:46:00+00:00 0 days 02:13:00
[120 rows x 3 columns]

合并区间的图形示例(数字是索引,颜色代表组):

如何合并任意链接的重叠时间范围。

在这里,你可以看到完全包含的区间示例(例如,246 包含在 245 中),以及结束时间的 cummax 如何工作的示例:

如何合并任意链接的重叠时间范围。

英文:

You can use of variant of the method which I described here.

In summary, sort the rows by Start, then End, compute a cummax to propagate the End. Then define the standalone intervals as those that have a start that is after the previous propagated End.

out = (df
# this part is to identify the overlapping intervals
.sort_values(by=[&#39;Start&#39;, &#39;End&#39;])
.assign(max_End=lambda d: d[&#39;End&#39;].cummax(),
group=lambda d: d[&#39;Start&#39;].ge(d[&#39;max_End&#39;].shift()).cumsum())
# this part is to aggregate the groups
.groupby(&#39;group&#39;).agg({&#39;Start&#39;: &#39;min&#39;, &#39;End&#39;: &#39;max&#39;})
.assign(Duration=lambda g: g[&#39;End&#39;]-g[&#39;Start&#39;])
)

Output:

                          Start                       End        Duration
group                                                                    
0     2020-08-03 21:05:00+00:00 2020-08-03 21:58:00+00:00 0 days 00:53:00
1     2020-08-03 22:53:00+00:00 2020-08-04 00:18:00+00:00 0 days 01:25:00
2     2020-08-04 17:01:00+00:00 2020-08-04 17:50:00+00:00 0 days 00:49:00
3     2020-08-12 02:09:00+00:00 2020-08-12 03:38:00+00:00 0 days 01:29:00
4     2020-08-12 04:45:00+00:00 2020-08-12 07:02:00+00:00 0 days 02:17:00
...                         ...                       ...             ...
115   2021-11-20 22:13:00+00:00 2021-11-20 23:50:00+00:00 0 days 01:37:00
116   2021-11-21 03:09:00+00:00 2021-11-21 04:14:00+00:00 0 days 01:05:00
117   2021-11-21 17:09:00+00:00 2021-11-21 17:58:00+00:00 0 days 00:49:00
118   2021-11-21 20:33:00+00:00 2021-11-21 21:22:00+00:00 0 days 00:49:00
119   2021-11-22 08:33:00+00:00 2021-11-22 10:46:00+00:00 0 days 02:13:00
[120 rows x 3 columns]

Graphical example of merged intervals (the number is the index, the color is by group):

如何合并任意链接的重叠时间范围。

Here you see examples of fully included intervals (e.g. 246 is contained in 245), and of how the end cummax is working:

如何合并任意链接的重叠时间范围。

huangapple
  • 本文由 发表于 2023年6月19日 19:14:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/76506090.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定