Python: grouping a DataFrame and summing a datetime64[ns] column returns NaN
Question
I have a DataFrame with different ids and corresponding datetime values. I would like to group by id and sum all the datetime values, but I run into several problems: either the groupby returns NaN, or pandas does not allow me to sum datetime64[ns] values.
This is my dataframe:
id | date |
---|---|
1 | 2023-04-13 01:18:11 |
2 | 2023-04-10 10:25:03 |
1 | 2023-03-31 10:30:44 |
3 | 2023-04-04 05:51:33 |
4 | 2023-05-01 23:43:34 |
5 | 2023-04-19 09:05:39 |
2 | 2023-04-12 09:56:50 |
6 | 2023-04-28 12:49:23 |
3 | 2023-04-17 08:10:34 |
```python
import pandas as pd

data = {'id': [1, 2, 1, 3, 4, 5, 2, 6, 3],
        'date': ['2023-04-13T01:18:11.000000000', '2023-04-10T10:25:03.000000000',
                 '2023-03-31T10:30:44.000000000', '2023-04-04T05:51:33.000000000',
                 '2023-05-01T23:43:34.000000000', '2023-04-19T09:05:39.000000000',
                 '2023-04-12T09:56:50.000000000', '2023-04-28T12:49:23.000000000',
                 '2023-04-17T08:10:34.000000000']}
df = pd.DataFrame(data)
df['date'] = df['date'].astype('datetime64[ns]')  # parse the strings as datetimes
```
I tried grouping by id, but the sum comes out as NaN:
```python
grouped_df = df.groupby("id", as_index=False)["date"].sum()
```
id | date |
---|---|
1 | NaN |
2 | NaN |
3 | NaN |
4 | NaN |
5 | NaN |
6 | NaN |
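The root cause can be seen with two timestamps taken from the table above. A datetime is a point in time, and pandas will not add two of them together; subtracting them, however, gives a `Timedelta` (a duration), and durations can be summed. A minimal sketch:

```python
import pandas as pd

a = pd.Timestamp('2023-04-13 01:18:11')
b = pd.Timestamp('2023-03-31 10:30:44')

# Adding two points in time is undefined, so pandas raises TypeError.
try:
    a + b
    adding_works = True
except TypeError:
    adding_works = False
print(adding_works)  # False

# Subtracting them yields a duration (Timedelta)...
delta = a - b
print(delta)  # 12 days 14:47:27

# ...and durations can be summed like ordinary numbers.
total = pd.Series([delta, delta]).sum()
print(total)  # 25 days 05:34:54
```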
Answer 1
Score: 1
A datetime is a point in time, so adding datetimes together is not meaningful; what can be summed are durations. If you want to sum the differences between datetimes, calculate those differences first:
```python
df.date.diff().sum()
```
This computes the difference between the second and first rows, the third and second rows, and so on, and then sums those differences.
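Applied to the sample data, this looks like the sketch below (the frame is rebuilt with `pd.to_datetime` so the block runs on its own). Consecutive differences telescope, so `diff().sum()` over the whole column always equals the last timestamp minus the first, in the rows' current order; here the rows are not sorted by date.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 1, 3, 4, 5, 2, 6, 3],
    'date': pd.to_datetime([
        '2023-04-13 01:18:11', '2023-04-10 10:25:03', '2023-03-31 10:30:44',
        '2023-04-04 05:51:33', '2023-05-01 23:43:34', '2023-04-19 09:05:39',
        '2023-04-12 09:56:50', '2023-04-28 12:49:23', '2023-04-17 08:10:34',
    ]),
})

# Row-to-row differences: NaT for the first row, Timedelta values afterwards.
deltas = df['date'].diff()
total = deltas.sum()  # sum() skips the leading NaT

print(total)  # 4 days 06:52:23
# The intermediate terms cancel, leaving last - first.
print(total == df['date'].iloc[-1] - df['date'].iloc[0])  # True
```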
Since you want to group by id, take the differences within each id group instead:
```python
df.groupby('id', as_index=False)['date'].diff().sum()
```
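This returns a single value. Note that the final `.sum()` runs over the whole Series of differences, so it is one total across all ids, not one total per id: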
> Timedelta('2 days 11:03:21')
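If one total per id is what you need, one option (a sketch, not the answer's own code) is to group the within-group differences by the same `id` column and sum each group. Within a group the differences again telescope to that group's last date minus its first, so id 1 comes out negative because its second row is earlier than its first. Adding up the per-id totals reproduces the single number above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 1, 3, 4, 5, 2, 6, 3],
    'date': pd.to_datetime([
        '2023-04-13 01:18:11', '2023-04-10 10:25:03', '2023-03-31 10:30:44',
        '2023-04-04 05:51:33', '2023-05-01 23:43:34', '2023-04-19 09:05:39',
        '2023-04-12 09:56:50', '2023-04-28 12:49:23', '2023-04-17 08:10:34',
    ]),
})

# Differences are taken within each id group; each group's first row is NaT.
deltas = df.groupby('id')['date'].diff()

# Group those differences by the same ids and sum them per group.
per_id = deltas.groupby(df['id']).sum()
print(per_id)

# Summing the per-id totals gives back the overall figure.
print(per_id.sum())  # 2 days 11:03:21
```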
If you need the total as a plain number, convert the `Timedelta`, for example to seconds with `.total_seconds()`.
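A short sketch of such conversions, using the value from the answer (rebuilt from its string form rather than recomputed) and standard `pandas.Timedelta` attributes and methods:

```python
import pandas as pd

total = pd.Timedelta('2 days 11:03:21')

seconds = total.total_seconds()        # float number of seconds
hours = total / pd.Timedelta(hours=1)  # float number of hours
days = total.days                      # whole days only
parts = total.components               # named fields: days, hours, minutes, ...

print(seconds)  # 212601.0
print(days)     # 2
print(hours)
print(parts.hours, parts.minutes, parts.seconds)  # 11 3 21
```

Dividing by a unit `Timedelta` works for any unit, e.g. `pd.Timedelta(minutes=1)` for minutes.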