How to get the earliest hour and the latest hour of each day in a pandas DataFrame?

Question

I have a DataFrame that looks like this:

data['date']

           date
0   2018-01-01 03:00:00
1   2018-01-01 04:00:00
2   2018-01-01 08:00:00
3   2018-01-02 04:00:00
4   2018-01-02 06:00:00
5   2018-01-02 12:00:00
6   2018-01-03 05:00:00
7   2018-01-03 07:00:00
8   2018-01-03 17:00:00

I would like to create two new columns with the earliest and latest hour of each day, something like this:

     date_daily     earlier_hour   latest_hour
0   2018-01-01      03:00:00       08:00:00
1   2018-01-02      04:00:00       12:00:00
2   2018-01-03      05:00:00       17:00:00

Thanks for your help.
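To run the answers against the question's data, here is one way to rebuild the sample frame. It assumes the timestamps live in a column named `date` of a frame called `data` (as in the question); Answers 1 and 3 call the frame `df`, so a copy under that name is created too.

```python
import pandas as pd

# The question's sample timestamps, parsed from strings into datetime64
data = pd.DataFrame({"date": pd.to_datetime([
    "2018-01-01 03:00:00", "2018-01-01 04:00:00", "2018-01-01 08:00:00",
    "2018-01-02 04:00:00", "2018-01-02 06:00:00", "2018-01-02 12:00:00",
    "2018-01-03 05:00:00", "2018-01-03 07:00:00", "2018-01-03 17:00:00",
])})

# Answers 1 and 3 call the frame `df`; an independent copy keeps `data` untouched
df = data.copy()

print(data)
```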

Answer 1

Score: 4

你可以使用自定义的 groupby.aggdt.normalize 来获取日期部分,然后使用 dt.time 来获取时间部分:

df['date'] = pd.to_datetime(df['date'])

(df.groupby(df['date'].dt.normalize())
   ['date'].agg(['min', 'max'])
   .apply(lambda s: s.dt.time)
   .reset_index()
)

输出:

        date       min       max
0 2018-01-01  03:00:00  08:00:00
1 2018-01-02  04:00:00  12:00:00
2 2018-01-03  05:00:00  17:00:00

自定义列名

df['date'] = pd.to_datetime(df['date'])

(df.groupby(df['date'].dt.normalize().rename('date_daily'))
   .agg(**{'earlier_hour': ('date', 'min'),
           'later_hour': ('date', 'max')})
   .apply(lambda s: s.dt.time)
   .reset_index()
)

输出:

  date_daily earlier_hour later_hour
0 2018-01-01     03:00:00   08:00:00
1 2018-01-02     04:00:00   12:00:00
2 2018-01-03     05:00:00   17:00:00
英文:

You can use groupby.agg with dt.normalize as the grouper, which keeps only the date part, then dt.time to keep only the time component:

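import pandas as pd

df['date'] = pd.to_datetime(df['date'])

# group on the date part (timestamp floored to midnight),
# take the first/last timestamp per day, then keep only the time of day
(df.groupby(df['date'].dt.normalize())
   ['date'].agg(['min', 'max'])
   .apply(lambda s: s.dt.time)
   .reset_index()
)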

Output:

        date       min       max
0 2018-01-01  03:00:00  08:00:00
1 2018-01-02  04:00:00  12:00:00
2 2018-01-03  05:00:00  17:00:00
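A self-contained sketch of the first snippet, assuming the question's sample timestamps are stored as strings in a column named `date` of a frame called `df`. Grouping on `dt.normalize()` keeps each day key as a midnight `Timestamp`, which is why the `date` column of the result still prints as a plain date:

```python
import pandas as pd

# Sample from the question, stored as strings in column `date` (assumption)
df = pd.DataFrame({"date": [
    "2018-01-01 03:00:00", "2018-01-01 04:00:00", "2018-01-01 08:00:00",
    "2018-01-02 04:00:00", "2018-01-02 06:00:00", "2018-01-02 12:00:00",
    "2018-01-03 05:00:00", "2018-01-03 07:00:00", "2018-01-03 17:00:00",
]})
df["date"] = pd.to_datetime(df["date"])

out = (df.groupby(df["date"].dt.normalize())["date"]  # key: midnight Timestamp per day
         .agg(["min", "max"])                          # first/last timestamp of each day
         .apply(lambda s: s.dt.time)                   # keep only the time of day
         .reset_index())

print(out)
```

Note that dt.time turns the min/max columns into Python `datetime.time` objects (object dtype), so the `.dt` accessor is no longer available on them afterwards.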

With custom column names:

df['date'] = pd.to_datetime(df['date'])

(df.groupby(df['date'].dt.normalize().rename('date_daily'))
   .agg(**{'earlier_hour': ('date', 'min'),
           'later_hour': ('date', 'max')})
   .apply(lambda s: s.dt.time)
   .reset_index()
)

Output:

  date_daily earlier_hour later_hour
0 2018-01-01     03:00:00   08:00:00
1 2018-01-02     04:00:00   12:00:00
2 2018-01-03     05:00:00   17:00:00

Answer 2

Score: 1

Here is one option using groupby.agg followed by apply with strftime (it assumes data["date"] already has a datetime dtype):

out = (
    data.groupby(data["date"].dt.date)["date"]
        .agg(["min", "max"]).apply(lambda s: s.dt.strftime("%H:%M:%S"))
        .reset_index().set_axis(["date_daily", "earlier_hour", "latest_hour"], axis=1)
)

Output:

print(out)

   date_daily earlier_hour latest_hour
0  2018-01-01     03:00:00    08:00:00
1  2018-01-02     04:00:00    12:00:00
2  2018-01-03     05:00:00    17:00:00
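A small sketch, on a reduced copy of the sample and assuming `data["date"]` is already `datetime64`, showing what this variant returns: grouping on `dt.date` gives plain `datetime.date` keys, and `strftime` turns the hours into strings rather than `datetime.time` objects.

```python
import pandas as pd

# Reduced sample (two days); the column is already datetime64
data = pd.DataFrame({"date": pd.to_datetime([
    "2018-01-01 03:00:00", "2018-01-01 08:00:00",
    "2018-01-02 04:00:00", "2018-01-02 12:00:00",
])})

out = (
    data.groupby(data["date"].dt.date)["date"]
        .agg(["min", "max"])
        .apply(lambda s: s.dt.strftime("%H:%M:%S"))  # formatted strings, not times
        .reset_index()
        .set_axis(["date_daily", "earlier_hour", "latest_hour"], axis=1)
)

# date_daily holds datetime.date objects (from .dt.date), not Timestamps,
# and the hour columns hold plain strings such as "03:00:00"
print(type(out.loc[0, "date_daily"]).__name__)    # date
print(type(out.loc[0, "earlier_hour"]).__name__)  # str
```

Because the format is zero-padded %H:%M:%S, the strings still sort chronologically, but they are text, so any further time arithmetic is easier to do before formatting.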

Answer 3

Score: 0

You can sort the dates before grouping, then pick the desired time by its position within each day:

import pandas as pd

df['date'] = pd.to_datetime(df['date'])

# Position within each sorted day:
# 0 for the earliest hour
# 1 for the second earliest hour
# -1 for the latest hour
# (x.iloc[1] raises IndexError on a day with only one distinct timestamp)
out = (df.sort_values('date').drop_duplicates('date')
         .groupby(pd.Grouper(freq='D', key='date'))
         .agg(earlier_hour=('date', lambda x: x.iloc[0].time()),
              second_earlier_hour=('date', lambda x: x.iloc[1].time()),
              latest_hour=('date', lambda x: x.iloc[-1].time()))
         .reset_index())

Output:

>>> out
        date earlier_hour second_earlier_hour latest_hour
0 2018-01-01     03:00:00            04:00:00    08:00:00
1 2018-01-02     04:00:00            06:00:00    12:00:00
2 2018-01-03     05:00:00            07:00:00    17:00:00
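The lambdas above fail on any day that has fewer distinct timestamps than the requested position. A minimal sketch of a more forgiving variant, using a hypothetical helper `time_at` (not part of pandas) that returns None for such days instead of raising; the sample deliberately gives 2018-01-02 a single timestamp:

```python
import pandas as pd

def time_at(x, n):
    """Hypothetical helper: time of day of the n-th row of a sorted group,
    or None when the group has too few rows (including an empty group)."""
    if -len(x) <= n < len(x):
        return x.iloc[n].time()
    return None

df = pd.DataFrame({"date": pd.to_datetime([
    "2018-01-01 03:00:00", "2018-01-01 04:00:00", "2018-01-01 08:00:00",
    "2018-01-02 04:00:00",  # only one timestamp on this day
])})

out = (df.sort_values("date").drop_duplicates("date")
         .groupby(pd.Grouper(freq="D", key="date"))
         .agg(earlier_hour=("date", lambda x: time_at(x, 0)),
              second_earlier_hour=("date", lambda x: time_at(x, 1)),
              latest_hour=("date", lambda x: time_at(x, -1)))
         .reset_index())

print(out)
```

Keeping the lambdas (rather than passing the helper directly) mirrors the answer and lets pandas give each aggregation a distinct internal name.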

huangapple
  • Posted on 2023-05-11 04:14:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76222247.html