Python groupby dataframe and sum datetime[ns] returns NaN

Question

I have a DataFrame with different ids and corresponding datetime values. I would like to group by id and sum all the datetime values, but I run into problems: either the groupby returns NaN, or pandas does not allow me to sum datetime64[ns] values.

This is my dataframe:

id date
1 2023-04-13 01:18:11
2 2023-04-10 10:25:03
1 2023-03-31 10:30:44
3 2023-04-04 05:51:33
4 2023-05-01 23:43:34
5 2023-04-19 09:05:39
2 2023-04-12 09:56:50
6 2023-04-28 12:49:23
3 2023-04-17 08:10:34

import pandas as pd

data = {'id': [1, 2, 1, 3, 4, 5, 2, 6, 3],
        'date': ['2023-04-13T01:18:11.000000000', '2023-04-10T10:25:03.000000000',
                 '2023-03-31T10:30:44.000000000', '2023-04-04T05:51:33.000000000',
                 '2023-05-01T23:43:34.000000000', '2023-04-19T09:05:39.000000000',
                 '2023-04-12T09:56:50.000000000', '2023-04-28T12:49:23.000000000',
                 '2023-04-17T08:10:34.000000000']}
df = pd.DataFrame(data)
# Parse the ISO-formatted strings into datetime64[ns] values
df['date'] = df['date'].astype('datetime64[ns]')

I tried grouping by id, but the sum comes out as NaN:
grouped_df = df.groupby("id", as_index=False)["date"].sum()

id date
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN

Answer 1

Score: 1

If you want to sum the differences between the datetimes, you first need to calculate those differences:

df.date.diff().sum()

This calculates the difference between the second value and the first, the third and the second, and so on. You can then sum the results.

Pandas diff()
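
To make the row-by-row behaviour of diff() concrete, here is a minimal sketch on a hypothetical three-row Series (made-up sample values, not the data from the question). It only illustrates what diff() followed by sum() returns on datetime values.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
dates = pd.Series(pd.to_datetime([
    "2023-04-01 00:00:00",
    "2023-04-02 12:00:00",
    "2023-04-04 00:00:00",
]))

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so its result is NaT
steps = dates.diff()
print(steps)

# sum() skips NaT by default and adds the remaining Timedelta values
total = steps.sum()
print(total)  # 3 days 00:00:00
```

Note that the consecutive differences telescope, so the total equals the last value minus the first.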

Since you want to group the DataFrame by id, compute the differences within each group:

df.groupby('id', as_index=False)['date'].diff().sum()

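Because sum() is applied to the combined result, this returns a single value, the total of the within-group differences across all ids: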

> Timedelta('2 days 11:03:21')
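
If one total per id is wanted rather than a single overall value, a possible variant (an assumption about the goal, not part of the original answer) is to group the differences by id again before summing. The names steps and per_id are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 1, 3, 4, 5, 2, 6, 3],
    "date": pd.to_datetime([
        "2023-04-13 01:18:11", "2023-04-10 10:25:03", "2023-03-31 10:30:44",
        "2023-04-04 05:51:33", "2023-05-01 23:43:34", "2023-04-19 09:05:39",
        "2023-04-12 09:56:50", "2023-04-28 12:49:23", "2023-04-17 08:10:34",
    ]),
})

# Differences between consecutive rows inside each id group;
# the first row of every group becomes NaT
steps = df.groupby("id")["date"].diff()

# Sum the differences separately for each id; ids with a single row
# have only NaT and therefore sum to a zero Timedelta
per_id = steps.groupby(df["id"]).sum()
print(per_id)

# Adding the per-id values reproduces the overall total
print(per_id.sum())  # 2 days 11:03:21
```

The value for id 1 is negative because its rows are not in chronological order; sorting by date first would change the sign. The single Timedelta shown above is the sum of these per-id values.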

If you want to use the resulting Timedelta in further calculations, you need to convert it, for example to a number of seconds.
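
As a rough sketch of what working with the result can look like, the snippet below takes the Timedelta printed above and converts it to plain numbers. The variable names are illustrative assumptions, and which unit is appropriate depends on what the value is needed for.

```python
import pandas as pd

# The value returned by the grouped diff().sum() call above
total = pd.Timedelta("2 days 11:03:21")

# Whole duration as a float number of seconds
seconds = total.total_seconds()
print(seconds)  # 212601.0

# Whole duration expressed in hours, by dividing by a one-hour Timedelta
hours = total / pd.Timedelta(hours=1)
print(hours)

# Individual fields (days, hours, minutes, seconds, ...) as a named tuple
parts = total.components
print(parts.days, parts.hours, parts.minutes, parts.seconds)  # 2 11 3 21
```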

huangapple
  • Posted on May 26, 2023 at 15:05:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76338377.html