How to get the earliest and latest hour of each day in a pandas DataFrame?

Question

I have a DataFrame that looks like this:

data['date']

                     date
    0 2018-01-01 03:00:00
    1 2018-01-01 04:00:00
    2 2018-01-01 08:00:00
    3 2018-01-02 04:00:00
    4 2018-01-02 06:00:00
    5 2018-01-02 12:00:00
    6 2018-01-03 05:00:00
    7 2018-01-03 07:00:00
    8 2018-01-03 17:00:00

I would like to create two new columns with the earliest and latest hour of each day, something like this:

       date_daily earlier_hour latest_hour
    0  2018-01-01     03:00:00    08:00:00
    1  2018-01-02     04:00:00    12:00:00
    2  2018-01-03     05:00:00    17:00:00

Thanks for your help.
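
For anyone reproducing this, here is a minimal sketch that rebuilds the sample frame. The name `data` follows the question; the answers below that use `df` refer to the same frame. Parsing with `pd.to_datetime` up front is an assumption, since the question does not state the column's dtype.

```python
import pandas as pd

# Sample data copied from the question; parsed to datetime64 on construction.
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01 03:00:00", "2018-01-01 04:00:00", "2018-01-01 08:00:00",
        "2018-01-02 04:00:00", "2018-01-02 06:00:00", "2018-01-02 12:00:00",
        "2018-01-03 05:00:00", "2018-01-03 07:00:00", "2018-01-03 17:00:00",
    ])
})

df = data  # alias used by some of the answers
print(data)
```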

Answer 1

Score: 4

You can group by the calendar day (obtained with dt.normalize), use groupby.agg to get the min and max timestamps of each day, then keep only the time component with dt.time:

    import pandas as pd

    df['date'] = pd.to_datetime(df['date'])

    # group by calendar day, then take the min/max timestamp per day
    (df.groupby(df['date'].dt.normalize())
       ['date'].agg(['min', 'max'])
       .apply(lambda s: s.dt.time)   # keep only the time of day
       .reset_index()
    )

Output:

            date       min       max
    0 2018-01-01  03:00:00  08:00:00
    1 2018-01-02  04:00:00  12:00:00
    2 2018-01-03  05:00:00  17:00:00

With custom column names:

    df['date'] = pd.to_datetime(df['date'])

    # rename the grouper so the day column comes out as 'date_daily'
    (df.groupby(df['date'].dt.normalize().rename('date_daily'))
       .agg(**{'earlier_hour': ('date', 'min'),
               'later_hour': ('date', 'max')})
       .apply(lambda s: s.dt.time)
       .reset_index()
    )

Output:

      date_daily earlier_hour later_hour
    0 2018-01-01     03:00:00   08:00:00
    1 2018-01-02     04:00:00   12:00:00
    2 2018-01-03     05:00:00   17:00:00
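
As a small sketch of what the two accessors used above contribute, here they are on a standalone Series (the two timestamps are illustrative values taken from the question's first day): `dt.normalize()` keeps the datetime64 dtype and resets the time to midnight, which makes it usable as a per-day grouping key, while `dt.time` yields plain `datetime.time` objects.

```python
from datetime import time

import pandas as pd

s = pd.Series(pd.to_datetime(["2018-01-01 03:00:00", "2018-01-01 08:00:00"]))

# Same day for both rows once the time is reset to midnight.
days = s.dt.normalize()

# Object-dtype Series holding datetime.time values.
times = s.dt.time

print(days.tolist())
print(times.tolist())
```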

Answer 2

Score: 1

Here is one option with GroupBy.agg and strftime:

    # assumes data["date"] already has a datetime64 dtype
    out = (
        data.groupby(data["date"].dt.date)["date"]
            .agg(["min", "max"]).apply(lambda s: s.dt.strftime("%H:%M:%S"))
            .reset_index().set_axis(["date_daily", "earlier_hour", "latest_hour"], axis=1)
    )

Output:

    print(out)
       date_daily earlier_hour latest_hour
    0  2018-01-01     03:00:00    08:00:00
    1  2018-01-02     04:00:00    12:00:00
    2  2018-01-03     05:00:00    17:00:00
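
One difference worth keeping in mind: `strftime` produces text, whereas `dt.time` produces `datetime.time` objects, even though both print the same way. A minimal sketch on illustrative values:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2018-01-01 03:00:00", "2018-01-01 17:00:00"]))

as_time = s.dt.time                  # datetime.time objects
as_text = s.dt.strftime("%H:%M:%S")  # formatted strings

print(type(as_time.iloc[0]).__name__)  # time
print(type(as_text.iloc[0]).__name__)  # str
```

If the values are needed for later comparisons or arithmetic, the `datetime.time` form is usually the more convenient one; the string form is handy for display or export.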

Answer 3

Score: 0

You can sort by date before grouping, then pick the time at the desired position within each day:

    import pandas as pd

    df['date'] = pd.to_datetime(df['date'])

    # Position within each sorted daily group:
    #    0 -> earliest hour
    #    1 -> second-earliest hour
    #   -1 -> latest hour
    out = (df.sort_values('date').drop_duplicates('date')
             .groupby(pd.Grouper(freq='D', key='date'))
             .agg(earlier_hour=('date', lambda x: x.iloc[0].time()),
                  second_earlier_hour=('date', lambda x: x.iloc[1].time()),
                  latest_hour=('date', lambda x: x.iloc[-1].time()))
             .reset_index())

Output:

    >>> out
            date earlier_hour second_earlier_hour latest_hour
    0 2018-01-01     03:00:00            04:00:00    08:00:00
    1 2018-01-02     04:00:00            06:00:00    12:00:00
    2 2018-01-03     05:00:00            07:00:00    17:00:00
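
Note that `x.iloc[1]` is out of bounds for a day that has only one row. As a hedged alternative (not part of the answer above), the position within each day can be computed with `cumcount` and used as a filter, so days without a second row are simply left out. The sample data and the names `rank` and `second` are illustrative.

```python
from datetime import time

import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime([
    "2018-01-01 04:00:00", "2018-01-01 03:00:00",
    "2018-01-02 06:00:00",  # this day has a single row
])})

s = df["date"].sort_values().drop_duplicates()
day = s.dt.normalize()

# 0 for the earliest row of each day, 1 for the second earliest, and so on.
rank = s.groupby(day).cumcount()

# Keep only the second-earliest timestamp of each day that has one.
second = pd.DataFrame({
    "date_daily": day[rank == 1],
    "second_earlier_hour": s[rank == 1].dt.time,
}).reset_index(drop=True)

print(second)
```

Changing `rank == 1` to another position selects a different rank in the same way.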

huangapple
  • Posted on May 11, 2023 at 04:14:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76222247.html