Pairs of first and last elements of 15-minute runs in a pandas time column
Question
I have a time column like this:
Time |
---|
7:00:00 |
7:15:00 |
7:30:00 |
8:00:00 |
8:15:00 |
8:30:00 |
8:45:00 |
and I need to get the first and last element of each subset of consecutive times that are exactly 15 minutes apart. That is, the first subset would be {"start_time": 7:00:00, "end_time": 7:30:00} and the second would be {"start_time": 8:00:00, "end_time": 8:45:00}, so I need to return a list with those two dictionaries:
[{"start_time": 7:00:00, "end_time":7:30:00}, {"start_time":8:00:00, "end_time":8:45:00}]
Another example:
Time |
---|
7:00:00 |
7:15:00 |
7:30:00 |
Returns: [{"start_time": 7:00:00, "end_time":7:30:00}]
The last one:
Time |
---|
7:00:00 |
7:15:00 |
7:30:00 |
8:00:00 |
9:00:00 |
10:00:00 |
10:15:00 |
10:30:00 |
11:00:00 |
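Returns: [{"start_time": 7:00:00, "end_time": 7:30:00}, {"start_time": 10:00:00, "end_time": 10:30:00}]
Isolated times such as 8:00:00, 9:00:00 and 11:00:00, which have no neighbour exactly 15 minutes away, do not form a subset and are left out of the result.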
Answer 1
Score: 2
Convert the column to timedelta, then take the difference between each row and the previous one in seconds and compare it with 15 minutes (900 seconds) to flag where a new block starts. Then group the DataFrame by the continuous blocks (i.e. mask.cumsum()) and aggregate Time with first and last:
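```python
import pandas as pd
# True wherever the gap to the previous row is not 900 s (15 min), i.e. where a new block starts
mask = pd.to_timedelta(df['Time']).diff().dt.total_seconds() != 900
df.groupby(mask.cumsum())['Time'].agg(['first', 'last']).to_dict(orient='records')
```
Result for the first example:
[{'first': '7:00:00', 'last': '7:30:00'},
{'first': '8:00:00', 'last': '8:45:00'}]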
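Applied to the third example, the snippet above would also report each isolated time as its own block (for instance {'first': '8:00:00', 'last': '8:00:00'}), and it uses the keys first/last rather than the question's start_time/end_time. A minimal sketch of the same groupby idea that drops one-element blocks and renames the keys, assuming the times arrive as 'H:MM:SS' strings (the function name fifteen_minute_runs is hypothetical):

```python
import pandas as pd


def fifteen_minute_runs(times):
    """Return [{'start_time', 'end_time'}] for each run of 2+ times spaced exactly 15 minutes apart."""
    s = pd.Series(times, name='Time')
    # True where the gap to the previous row is not 15 minutes; the first row's gap is NaT, so it is True
    new_block = pd.to_timedelta(s).diff().ne(pd.Timedelta(minutes=15))
    # cumsum turns the start-of-block flags into one integer label per continuous block
    out = s.groupby(new_block.cumsum()).agg(['first', 'last', 'count'])
    # keep only blocks with at least two times, as the question's third example requires
    out = out[out['count'] > 1]
    return (out[['first', 'last']]
            .rename(columns={'first': 'start_time', 'last': 'end_time'})
            .to_dict(orient='records'))


print(fifteen_minute_runs(["7:00:00", "7:15:00", "7:30:00", "8:00:00", "9:00:00"]))
# [{'start_time': '7:00:00', 'end_time': '7:30:00'}]
```

Because the values come from the original string column, the result keeps the times exactly as they were written in the input.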
Answer 2
Score: 1
Something like this works:
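```python
import pandas as pd

times = [
    "7:00:00",
    "7:15:00",
    "7:30:00",
    "8:00:00",
    "9:00:00",
    "10:00:00",
    "10:15:00",
    "10:30:00",
    "11:00:00"]
times = [pd.Timedelta(t) for t in times]
df = pd.DataFrame(times, columns=['Times'])

fifteen = pd.Timedelta(minutes=15)
curr = prev = None  # first and latest element of the current run
for t in df['Times']:
    # use "is not None": a Timedelta of 0 (midnight) is falsy
    if prev is not None:
        if t - prev == fifteen:
            prev = t  # still in a 15-minute run, extend it
            continue
        if curr != prev:
            print({'start': curr, 'end': prev})  # a run of 2+ times just ended
    curr = prev = t  # t starts a new run
# the loop never closes the last run, so report it here if it has 2+ times
if prev is not None and curr != prev:
    print({'start': curr, 'end': prev})
```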
Output:
{'start': Timedelta('0 days 07:00:00'), 'end': Timedelta('0 days 07:30:00')}
{'start': Timedelta('0 days 10:00:00'), 'end': Timedelta('0 days 10:30:00')}
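The question asks for a list of dictionaries rather than printed lines. A minimal sketch that wraps the same loop in a function, collects the runs under the question's start_time/end_time keys, and closes the final run, assuming the input is a list of 'H:MM:SS' strings (the function name fifteen_minute_runs_loop and its step parameter are hypothetical):

```python
import pandas as pd


def fifteen_minute_runs_loop(times, step=pd.Timedelta(minutes=15)):
    """Loop version: return runs of 2+ times spaced exactly `step` apart as a list of dicts."""
    runs = []
    start = prev = None  # first and latest element of the current run
    for t in pd.to_timedelta(pd.Series(times)):
        if prev is not None and t - prev == step:
            prev = t  # still inside the current run
            continue
        if prev is not None and start != prev:
            runs.append({'start_time': start, 'end_time': prev})  # close a run of 2+ times
        start = prev = t  # t opens a new run
    if prev is not None and start != prev:
        runs.append({'start_time': start, 'end_time': prev})  # the last run has no successor
    return runs


print(fifteen_minute_runs_loop(["7:00:00", "7:15:00", "7:30:00"]))
```

The values come back as Timedelta objects; checking prev against None explicitly keeps a run that starts at 0:00:00 from being skipped.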