Pairs of initial and final elements in a pandas datetime column

Question

I have a time column like this:

Time
7:00:00
7:15:00
7:30:00
8:00:00
8:15:00
8:30:00
8:45:00

and I need the first and last element of each run of consecutive times that are exactly 15 minutes apart. For the column above, the first run gives {"start_time": 7:00:00, "end_time": 7:30:00} and the second gives {"start_time": 8:00:00, "end_time": 8:45:00}, so I need to return a list with those two dictionaries:

[{"start_time": 7:00:00, "end_time":7:30:00}, {"start_time":8:00:00, "end_time":8:45:00}]

Another example:

Time
7:00:00
7:15:00
7:30:00

Returns: [{"start_time": 7:00:00, "end_time":7:30:00}]

The last one:

Time
7:00:00
7:15:00
7:30:00
8:00:00
9:00:00
10:00:00
10:15:00
10:30:00
11:00:00

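Returns: [{"start_time": 7:00:00, "end_time": 7:30:00}, {"start_time": 10:00:00, "end_time": 10:30:00}]

Times that are not part of any 15-minute run (8:00:00, 9:00:00 and 11:00:00 here) are not reported.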

Answer 1

Score: 2

Convert the column to timedelta, then take the difference between each row and the previous one in seconds and compare it with 15 minutes (900 s) to flag where a new block starts. Then group the DataFrame by the continuous blocks (i.e. mask.cumsum()) and aggregate Time with first and last:
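To make the mask and cumsum() step concrete, here is a small sketch that prints the intermediate values for the question's first example. The DataFrame construction and the assumption that Time holds strings such as "7:00:00" are mine, chosen to match the question's layout:

```python
import pandas as pd

# Assumed input: the question's first example, stored as strings in a column named 'Time'
df = pd.DataFrame({'Time': ['7:00:00', '7:15:00', '7:30:00',
                            '8:00:00', '8:15:00', '8:30:00', '8:45:00']})

# Gap to the previous row in seconds; the first row has no predecessor, so it is NaN
gaps = pd.to_timedelta(df['Time']).diff().dt.total_seconds()

# True wherever a new block starts (NaN != 900 is True, so row 0 always starts a block)
mask = gaps != 900

# Running count of block starts gives every row the id of the block it belongs to
block_id = mask.cumsum()

print(pd.DataFrame({'Time': df['Time'], 'gap_s': gaps,
                    'new_block': mask, 'block': block_id}))
```

The 30-minute jump from 7:30:00 to 8:00:00 is the only gap other than 900 s after the first row, so rows 7:00:00 to 7:30:00 get block id 1 and rows 8:00:00 to 8:45:00 get block id 2; grouping by those ids is what lets first and last pick out each run's endpoints.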

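import pandas as pd
# A new block starts wherever the gap to the previous row is not exactly 15 minutes (900 s)
mask = pd.to_timedelta(df['Time']).diff().dt.total_seconds() != 900
df.groupby(mask.cumsum())['Time'].agg(['first', 'last']).to_dict(orient='records')

Result for the first example:
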
[{'first': '7:00:00', 'last': '7:30:00'},
 {'first': '8:00:00', 'last': '8:45:00'}]
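As written, the snippet returns first/last keys and also reports one-element blocks (for the third example it would list 8:00:00, 9:00:00 and 11:00:00 as blocks of their own). A minimal sketch that keeps the same mask and cumsum() idea but matches the output the question asks for might look like this; the function name continuous_runs and the step_minutes parameter are hypothetical, and the input is assumed to be sorted strings in H:MM:SS form:

```python
import pandas as pd


def continuous_runs(times, step_minutes=15):
    """Hypothetical helper: start/end of each run of times spaced exactly step_minutes apart.

    Runs made of a single time are dropped, matching the question's third example.
    """
    s = pd.Series(times)
    # Same idea as Answer 1: flag rows whose gap to the previous row is not one step
    gaps = pd.to_timedelta(s).diff().dt.total_seconds()
    mask = gaps != step_minutes * 60
    blocks = s.groupby(mask.cumsum()).agg(['first', 'last', 'count'])
    blocks = blocks[blocks['count'] > 1]  # keep only runs of two or more times
    return (blocks[['first', 'last']]
            .rename(columns={'first': 'start_time', 'last': 'end_time'})
            .to_dict(orient='records'))


third = ['7:00:00', '7:15:00', '7:30:00', '8:00:00', '9:00:00',
         '10:00:00', '10:15:00', '10:30:00', '11:00:00']
print(continuous_runs(third))
# [{'start_time': '7:00:00', 'end_time': '7:30:00'}, {'start_time': '10:00:00', 'end_time': '10:30:00'}]
```

Filtering on the group size is a simple way to drop isolated times without changing how the blocks themselves are found.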

Answer 2

Score: 1

Something like this works:

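import pandas as pd

times = [
    "7:00:00",
    "7:15:00",
    "7:30:00",
    "8:00:00",
    "9:00:00",
    "10:00:00",
    "10:15:00",
    "10:30:00",
    "11:00:00"]
times = [pd.Timedelta(t) for t in times]

df = pd.DataFrame(times, columns=['Times'])

fifteen = pd.Timedelta(minutes=15)
curr = prev = None
for t in df['Times']:
    # Compare with None explicitly: Timedelta(0) (midnight) is falsy
    if prev is not None:
        if t - prev == fifteen:
            prev = t  # still inside a 15-minute run, extend it
            continue
        if curr != prev:  # the run that just ended has more than one element
            print({'start': curr, 'end': prev})
    curr = prev = t  # t starts a new run
# The loop only reports a run when a gap ends it, so report the final run here
if prev is not None and curr != prev:
    print({'start': curr, 'end': prev})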

Output:

{'start': Timedelta('0 days 07:00:00'), 'end': Timedelta('0 days 07:30:00')}
{'start': Timedelta('0 days 10:00:00'), 'end': Timedelta('0 days 10:30:00')}
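The loop above prints each run. A variant that returns the list of dictionaries the question asks for, and that also reports a run reaching the end of the column (as in the first example, which ends at 8:45:00), could look like the sketch below. The name runs_by_loop and the step parameter are hypothetical, the times are assumed to be sorted ascending, and the values stay pd.Timedelta objects, so convert them with str() or similar if strings are needed:

```python
import pandas as pd


def runs_by_loop(times, step=pd.Timedelta(minutes=15)):
    """Hypothetical helper: single pass over sorted times, collecting runs of 2+ elements."""
    runs = []
    start = prev = None
    for t in pd.to_timedelta(pd.Series(times)):
        if prev is not None and t - prev == step:
            prev = t  # still inside the current run
            continue
        if prev is not None and start != prev:
            runs.append({'start_time': start, 'end_time': prev})  # close a multi-element run
        start = prev = t  # t opens a new run
    if prev is not None and start != prev:
        runs.append({'start_time': start, 'end_time': prev})  # close the final run
    return runs


print(runs_by_loop(['7:00:00', '7:15:00', '7:30:00', '8:00:00',
                    '8:15:00', '8:30:00', '8:45:00']))
```

Compared with the groupby approach, this avoids building intermediate Series and stops caring about a run as soon as a gap breaks it, at the cost of an explicit Python loop.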

huangapple
  • Posted on 4 August 2023 at 01:31:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76830373.html