Pandas Group by id column and hourly datetime with missing hours
Question
I have a DataFrame with the following columns: 'Id', 'Date', and 'Number'. I need to group the data by Id and aggregate Number hourly. I also need the hours with no data between each Id's minimum and maximum datetimes to be present.
So for the toy dataset:
Id  Date                 Number
1   01-01-2022 00:00:00  1
1   01-01-2022 00:25:00  3
1   01-01-2022 01:00:10  1
2   01-01-2022 00:00:01  4
2   01-01-2022 03:01:01  2
I would get:
Id  Date                 Number
1   01-01-2022 00:00:00  4
1   01-01-2022 01:00:00  1
2   01-01-2022 00:00:00  4
2   01-01-2022 01:00:00  NaN
2   01-01-2022 02:00:00  NaN
2   01-01-2022 03:00:00  2
I tried it with groupby and Grouper (as shown below), but it results in missing hours.
agg = {'Number': 'sum'}  # example
data = data.groupby(['Id', pd.Grouper(key='Date', freq='1H')]).agg(agg)
It seems simple, but I cannot get it to work. What am I missing?
Answer 1
Score: 1
You can use .groupby().resample() if you set the date as the index:
df.set_index('Date').groupby('Id').resample('1h')['Number'].sum()
Id  Date
1   2022-01-01 00:00:00    4
    2022-01-01 01:00:00    1
2   2022-01-01 00:00:00    4
    2022-01-01 01:00:00    0
    2022-01-01 02:00:00    0
    2022-01-01 03:00:00    2
Name: Number, dtype: int64
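Note that the output above fills the empty hours with 0, whereas the question asks for NaN. A minimal sketch of one way to get NaN instead, assuming 'Date' still has to be parsed from strings and that the dates are day-month-year (the sample data does not disambiguate this): passing min_count=1 to sum() makes any hourly bin with no rows come out as NaN. The variable name hourly is made up for illustration.

```python
import pandas as pd

# Toy data from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "Date": [
        "01-01-2022 00:00:00",
        "01-01-2022 00:25:00",
        "01-01-2022 01:00:10",
        "01-01-2022 00:00:01",
        "01-01-2022 03:01:01",
    ],
    "Number": [1, 3, 1, 4, 2],
})

# resample() needs a real datetime index.
# Assumption: the strings are day-month-year.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y %H:%M:%S")

hourly = (
    df.set_index("Date")
      .groupby("Id")
      .resample("1h")["Number"]
      .sum(min_count=1)   # a bin with fewer than 1 value becomes NaN, not 0
      .reset_index()      # back to flat Id / Date / Number columns
)

print(hourly)
```

Because NaN is a float, the Number column comes back as float64 rather than int64.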