How to get the earliest and the latest hour of each day in a pandas DataFrame?
Question
I have a DataFrame that looks like this:
data['date']
date
0 2018-01-01 03:00:00
1 2018-01-01 04:00:00
2 2018-01-01 08:00:00
3 2018-01-02 04:00:00
4 2018-01-02 06:00:00
5 2018-01-02 12:00:00
6 2018-01-03 05:00:00
7 2018-01-03 07:00:00
8 2018-01-03 17:00:00
I would like to create two new columns with the earliest and latest hour of each day, something like this:
date_daily earlier_hour latest_hour
0 2018-01-01 03:00:00 08:00:00
1 2018-01-02 04:00:00 12:00:00
2 2018-01-03 05:00:00 17:00:00
Thanks for your help.
Answer 1
Score: 4
You can use a custom groupby.agg, with dt.normalize to keep only the date as the grouper, then dt.time to extract the time component:
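import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# group by calendar day, take the min/max timestamp, then keep only the time part
(df.groupby(df['date'].dt.normalize())
   ['date'].agg(['min', 'max'])
   .apply(lambda s: s.dt.time)
   .reset_index()
)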
Output:
date min max
0 2018-01-01 03:00:00 08:00:00
1 2018-01-02 04:00:00 12:00:00
2 2018-01-03 05:00:00 17:00:00
With custom column names:
df['date'] = pd.to_datetime(df['date'])
(df.groupby(df['date'].dt.normalize().rename('date_daily'))
   .agg(**{'earlier_hour': ('date', 'min'),
           'later_hour': ('date', 'max')})
   .apply(lambda s: s.dt.time)
   .reset_index()
)
Output:
date_daily earlier_hour later_hour
0 2018-01-01 03:00:00 08:00:00
1 2018-01-02 04:00:00 12:00:00
2 2018-01-03 05:00:00 17:00:00
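To run the approach above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the data in the question, the intermediate name `day` is only illustrative, and the output columns use the question's `latest_hour` spelling rather than `later_hour`.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": [
        "2018-01-01 03:00:00", "2018-01-01 04:00:00", "2018-01-01 08:00:00",
        "2018-01-02 04:00:00", "2018-01-02 06:00:00", "2018-01-02 12:00:00",
        "2018-01-03 05:00:00", "2018-01-03 07:00:00", "2018-01-03 17:00:00",
    ]
})
df["date"] = pd.to_datetime(df["date"])

# Grouper: each timestamp truncated to midnight of its day (illustrative name).
day = df["date"].dt.normalize().rename("date_daily")

out = (
    df.groupby(day)
      .agg(earlier_hour=("date", "min"), latest_hour=("date", "max"))
      .apply(lambda s: s.dt.time)  # keep only the time-of-day part of each column
      .reset_index()
)
print(out)
```

Here `date_daily` holds Timestamps at midnight, while the two hour columns hold `datetime.time` objects.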
Answer 2
Score: 1
Here is one option with GroupBy.agg and strftime (note that strftime returns the times as strings):
out = (
    data.groupby(data["date"].dt.date)["date"]
        .agg(["min", "max"]).apply(lambda s: s.dt.strftime("%H:%M:%S"))
        .reset_index().set_axis(["date_daily", "earlier_hour", "latest_hour"], axis=1)
)
Output:
print(out)
date_daily earlier_hour latest_hour
0 2018-01-01 03:00:00 08:00:00
1 2018-01-02 04:00:00 12:00:00
2 2018-01-03 05:00:00 17:00:00
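The printed result looks the same as with dt.time, but the element types differ, which matters if you compare or sort the values later. A small sketch of the difference, using a throwaway Series named `stamps` (not part of the original answer):

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2018-01-01 03:00:00", "2018-01-01 08:00:00"]))

as_text = stamps.dt.strftime("%H:%M:%S")  # formatted strings
as_time = stamps.dt.time                  # datetime.time objects

print(type(as_text.iloc[0]).__name__)  # str
print(type(as_time.iloc[0]).__name__)  # time
```

Similarly, grouping by dt.date gives `datetime.date` objects in `date_daily`, whereas dt.normalize keeps them as Timestamps.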
Answer 3
Score: 0
You can sort by date before grouping, then pick the desired time by its position within each day:
df['date'] = pd.to_datetime(df['date'])
# Use 0 for the earliest hour
# Use 1 for the second earliest hour
# Use -1 for the latest hour
out = (df.sort_values('date').drop_duplicates('date')
         .groupby(pd.Grouper(freq='D', key='date'))
         .agg(earlier_hour=('date', lambda x: x.iloc[0].time()),
              second_earlier_hour=('date', lambda x: x.iloc[1].time()),
              latest_hour=('date', lambda x: x.iloc[-1].time()))
         .reset_index())
Output:
>>> out
date earlier_hour second_earlier_hour latest_hour
0 2018-01-01 03:00:00 04:00:00 08:00:00
1 2018-01-02 04:00:00 06:00:00 12:00:00
2 2018-01-03 05:00:00 07:00:00 17:00:00