May 11, 2023, 04:14:57

How to get the earliest and latest hour of each day in a pandas DataFrame?

Question

I have a DataFrame that looks like this:

data['date']

           date
0   2018-01-01 03:00:00
1   2018-01-01 04:00:00
2   2018-01-01 08:00:00
3   2018-01-02 04:00:00
4   2018-01-02 06:00:00
5   2018-01-02 12:00:00
6   2018-01-03 05:00:00
7   2018-01-03 07:00:00
8   2018-01-03 17:00:00

I would like to create two new columns with the earliest and latest hour of each day, something like this:

     date_daily     earlier_hour   latest_hour
0   2018-01-01      03:00:00       08:00:00
1   2018-01-02      04:00:00       12:00:00
2   2018-01-03      05:00:00       17:00:00

Thanks for your help.

Answer 1

Score: 4

You can use a custom groupby.agg, with dt.normalize to keep only the date as the grouper, then dt.time to extract the time component:

df['date'] = pd.to_datetime(df['date'])
(df.groupby(df['date'].dt.normalize())
   ['date'].agg(['min', 'max'])
   .apply(lambda s: s.dt.time)
   .reset_index()
)

Output:

        date       min       max
0 2018-01-01  03:00:00  08:00:00
1 2018-01-02  04:00:00  12:00:00
2 2018-01-03  05:00:00  17:00:00

With custom column names:

df['date'] = pd.to_datetime(df['date'])
(df.groupby(df['date'].dt.normalize().rename('date_daily'))
   .agg(**{'earlier_hour': ('date', 'min'),
           'later_hour': ('date', 'max')})
   .apply(lambda s: s.dt.time)
   .reset_index()
)

Output:

  date_daily earlier_hour later_hour
0 2018-01-01     03:00:00   08:00:00
1 2018-01-02     04:00:00   12:00:00
2 2018-01-03     05:00:00   17:00:00
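
For readers who want to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the sample data from the question; the variable names `day` and `daily` are illustrative, and the second column is named `latest_hour` to match the question rather than `later_hour` as in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": [
        "2018-01-01 03:00:00", "2018-01-01 04:00:00", "2018-01-01 08:00:00",
        "2018-01-02 04:00:00", "2018-01-02 06:00:00", "2018-01-02 12:00:00",
        "2018-01-03 05:00:00", "2018-01-03 07:00:00", "2018-01-03 17:00:00",
    ]
})
df["date"] = pd.to_datetime(df["date"])

# Midnight of each timestamp's day, used as the grouping key.
day = df["date"].dt.normalize().rename("date_daily")

daily = (
    df.groupby(day)
      .agg(earlier_hour=("date", "min"), latest_hour=("date", "max"))
      .apply(lambda s: s.dt.time)  # keep only the time-of-day part
      .reset_index()
)
print(daily)
```

Because min and max operate on full timestamps within a single day, they pick the same rows as the earliest and latest time of day would.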

Answer 2

Score: 1

Here is one option with GroupBy.agg and strftime:

out = (
    data.groupby(data["date"].dt.date)["date"]
        .agg(["min", "max"]).apply(lambda s: s.dt.strftime("%H:%M:%S"))
        .reset_index().set_axis(["date_daily", "earlier_hour", "latest_hour"], axis=1)
)

Output:

print(out)
   date_daily earlier_hour latest_hour
0  2018-01-01     03:00:00    08:00:00
1  2018-01-02     04:00:00    12:00:00
2  2018-01-03     05:00:00    17:00:00
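
One difference from the dt.time approach may be worth keeping in mind: strftime produces strings, while dt.time produces datetime.time objects. The short sketch below illustrates this on two timestamps from the question; the names `stamps`, `as_text` and `as_time` are illustrative only.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2018-01-01 03:00:00", "2018-01-01 08:00:00"]))

as_text = stamps.dt.strftime("%H:%M:%S")  # formatted strings
as_time = stamps.dt.time                  # datetime.time objects

print(type(as_text.iloc[0]).__name__)
print(type(as_time.iloc[0]).__name__)

# datetime.time objects expose their components directly.
print(as_time.iloc[1].hour)
```

Strings are convenient for display or export; time objects are likely the better fit if the values will be compared or used in further calculations.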

Answer 3

Score: 0

You can sort by date before grouping, then pick each time by its position within the day:

df['date'] = pd.to_datetime(df['date'])
# Use 0 for the earliest hour
# Use 1 for the second-earliest hour
# Use -1 for the latest hour
out = (df.sort_values('date').drop_duplicates('date')
         .groupby(pd.Grouper(freq='D', key='date'))
         .agg(earlier_hour=('date', lambda x: x.iloc[0].time()),
              second_earlier_hour=('date', lambda x: x.iloc[1].time()),
              latest_hour=('date', lambda x: x.iloc[-1].time()))
         .reset_index())

Output:

>>> out
        date earlier_hour second_earlier_hour latest_hour
0 2018-01-01     03:00:00            04:00:00    08:00:00
1 2018-01-02     04:00:00            06:00:00    12:00:00
2 2018-01-03     05:00:00            07:00:00    17:00:00
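
The positional lookups above assume every day has at least two distinct timestamps; x.iloc[1] raises an IndexError otherwise, and pd.Grouper(freq='D') also creates empty groups for calendar days with no rows. The following is a hedged sketch of one way to guard against that, not part of the original answer: `nth_time` is a hypothetical helper, the sample data is shortened so that one day has a single reading, and grouping uses dt.normalize so that absent days produce no group.

```python
import pandas as pd

def nth_time(times: pd.Series, position: int):
    """Return the time at `position` in a sorted group, or None if out of range."""
    if -len(times) <= position < len(times):
        return times.iloc[position].time()
    return None

# Assumed sample: 2018-01-02 is absent and 2018-01-03 has a single reading.
df = pd.DataFrame({"date": pd.to_datetime([
    "2018-01-01 03:00:00", "2018-01-01 04:00:00", "2018-01-01 08:00:00",
    "2018-01-03 05:00:00",
])})

ordered = df.sort_values("date").drop_duplicates("date")
day = ordered["date"].dt.normalize().rename("date_daily")

out = (
    ordered.groupby(day)
           .agg(earlier_hour=("date", lambda x: nth_time(x, 0)),
                second_earlier_hour=("date", lambda x: nth_time(x, 1)),
                latest_hour=("date", lambda x: nth_time(x, -1)))
           .reset_index()
)
print(out)
```

With this guard, a day that has only one timestamp gets a missing value in second_earlier_hour instead of raising an error.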
