Question

I have a DataFrame with the following columns: 'Id', 'Date', and 'Number'. I need to group the data by Id and aggregate Number hourly. I also need every hour between an Id's minimum and maximum datetimes to be present, even when there is no data for that hour.

So for the toy dataset:

Id    Date                    Number
1     01-01-2022 00:00:00     1
1     01-01-2022 00:25:00     3
1     01-01-2022 01:00:10     1
2     01-01-2022 00:00:01     4
2     01-01-2022 03:01:01     2

I would expect to get:

Id    Date                    Number
1     01-01-2022 00:00:00     4
1     01-01-2022 01:00:00     1
2     01-01-2022 00:00:00     4
2     01-01-2022 01:00:00     NaN
2     01-01-2022 02:00:00     NaN
2     01-01-2022 03:00:00     2

I tried it with groupby and Grouper (as shown below), but it results in missing hours.

agg = {'Number': 'sum'}  # example
data = data.groupby(['Id', pd.Grouper(key='Date', freq='1H')]).agg(agg)

It seems simple, but I cannot get it to work. What am I missing?

Answer 1

Score: 1

You can use .groupby().resample() if you set the Date column as the index:

df.set_index('Date').groupby('Id').resample('1h')['Number'].sum()

Id  Date               
1   2022-01-01 00:00:00    4
    2022-01-01 01:00:00    1
2   2022-01-01 00:00:00    4
    2022-01-01 01:00:00    0
    2022-01-01 02:00:00    0
    2022-01-01 03:00:00    2
Name: Number, dtype: int64
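Note that the output above fills the empty hours with 0, while the question asks for NaN. A minimal sketch of one way to get NaN instead, assuming the Date strings are in day-month-year order (the sample data does not make this clear) and that passing min_count=1 to sum() is acceptable for your aggregation:

```python
import pandas as pd

# Toy data from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "Date": [
        "01-01-2022 00:00:00",
        "01-01-2022 00:25:00",
        "01-01-2022 01:00:10",
        "01-01-2022 00:00:01",
        "01-01-2022 03:01:01",
    ],
    "Number": [1, 3, 1, 4, 2],
})

# Assumption: dates are day-month-year; adjust the format if they are not.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y %H:%M:%S")

# Resample each Id separately so every hour between that Id's first and
# last timestamp gets a row. min_count=1 makes sum() return NaN (not 0)
# for hours that contain no rows.
hourly = (
    df.set_index("Date")
      .groupby("Id")
      .resample("1h")["Number"]
      .sum(min_count=1)
)

# Back to a flat DataFrame with columns Id, Date, Number.
result = hourly.reset_index()
print(result)
```

Because of the NaN values, the Number column comes back as float rather than int. Drop min_count=1 if zeros are fine for the empty hours.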
