Why does pandas `mean` on datetimes work on a Series but not on a GroupBy object?


Question

I am trying to take the mean of dates by group.

    import pandas as pd

    df = pd.DataFrame({'Id': ['A', 'A', 'B', 'B'],
                       'Date': [pd.Timestamp(2000, 12, 31), pd.Timestamp(2002, 12, 31),
                                pd.Timestamp(2000, 6, 30), pd.Timestamp(2002, 6, 30)]})

This has always been a pain to do, so I was pleased to learn that it had apparently been fixed in pandas 0.25: https://stackoverflow.com/questions/27907902/datetime-objects-with-pandas-mean-function

    df['Date'].mean()
    Out[45]: Timestamp('2001-09-30 00:00:00')  # This works

However, this can't be done using `groupby`.

    df.groupby('Id')['Date'].mean()
    Traceback (most recent call last):
      File "<ipython-input-46-5fae5ffac6c6>", line 1, in <module>
        df.groupby('Id')['Date'].mean()
      File "C:\Users\xxx\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 1205, in mean
        "mean", alt=lambda x, axis: Series(x).mean(**kwargs), **kwargs
      File "C:\Users\xxx\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 888, in _cython_agg_general
        raise DataError("No numeric types to aggregate")
    DataError: No numeric types to aggregate

What is going on here, and is there an easy workaround?
# Answer 1

**Score**: 2

Use a lambda function with [`GroupBy.agg`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.agg.html) or [`GroupBy.apply`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.apply.html):

```python
print(df.groupby('Id')['Date'].agg(lambda x: x.mean()))
print(df.groupby('Id')['Date'].agg(pd.Series.mean))
print(df.groupby('Id')['Date'].apply(lambda x: x.mean()))
print(df.groupby('Id')['Date'].apply(pd.Series.mean))
# Output of each call:
# Id
# A   2001-12-31
# B   2001-06-30
# Name: Date, dtype: datetime64[ns]
```

The difference between `agg` and `apply` shows up when there are multiple columns:

    df = pd.DataFrame({'Id': ['A', 'A', 'B', 'B'],
                       'Date': [pd.Timestamp(2000, 12, 31), pd.Timestamp(2002, 12, 31),
                                pd.Timestamp(2000, 6, 30), pd.Timestamp(2002, 6, 30)]})
    df['Date1'] = df['Date']

    print(df.groupby('Id').agg(lambda x: x.mean()))
             Date      Date1
    Id
    A  2001-12-31 2001-12-31
    B  2001-06-30 2001-06-30

    print(df.groupby('Id').agg(pd.Series.mean))
             Date      Date1
    Id
    A  2001-12-31 2001-12-31
    B  2001-06-30 2001-06-30

    print(df.groupby('Id').apply(lambda x: x.mean()))
    Empty DataFrame
    Columns: []
    Index: []

    print(df.groupby('Id').apply(pd.Series.mean))
    Empty DataFrame
    Columns: []
    Index: []
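
For convenience, the `agg` workaround can be wrapped up as a self-contained script. This is only a minimal sketch: the helper name `group_mean_dates` is made up for illustration and is not part of pandas, and the sample data is the same as in the question.

```python
import pandas as pd

# Same sample data as in the question, plus a second datetime column.
df = pd.DataFrame({
    'Id': ['A', 'A', 'B', 'B'],
    'Date': [pd.Timestamp(2000, 12, 31), pd.Timestamp(2002, 12, 31),
             pd.Timestamp(2000, 6, 30), pd.Timestamp(2002, 6, 30)],
})
df['Date1'] = df['Date']


def group_mean_dates(frame, key, columns):
    """Hypothetical helper: per-group mean of one or more datetime columns."""
    # agg hands each group's column to the lambda as a Series,
    # so Series.mean (which handles datetimes) does the actual work.
    return frame.groupby(key)[columns].agg(lambda s: s.mean())


single = group_mean_dates(df, 'Id', 'Date')            # Series indexed by Id
multi = group_mean_dates(df, 'Id', ['Date', 'Date1'])  # DataFrame, one column per input column
print(single)
print(multi)
```

Selecting the datetime columns explicitly before calling `agg` keeps any non-datetime columns out of the aggregation.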

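> Why does pandas mean, on datetime, work on a series but not on a groupby object

Some time ago there was a similar problem with `mean` on a Series of datetimes, so it is possible that a future version of pandas will make this work for groupby as well.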
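
Another way to sidestep the "No numeric types" error, not taken from the answer above, is to give groupby a numeric column to work with. The sketch below assumes the dates can be treated as nanoseconds since the epoch: it converts them to `int64`, averages per group, and converts back. The variable names are illustrative. Because the group mean is computed in floating point, arbitrary timestamps may lose sub-microsecond precision.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['A', 'A', 'B', 'B'],
    'Date': [pd.Timestamp(2000, 12, 31), pd.Timestamp(2002, 12, 31),
             pd.Timestamp(2000, 6, 30), pd.Timestamp(2002, 6, 30)],
})

# Assumption: force nanosecond resolution so the integers are ns since the epoch.
as_ns = df['Date'].astype('datetime64[ns]').astype('int64')

# An int64 Series is numeric, so the plain groupby mean is available.
mean_ns = as_ns.groupby(df['Id']).mean()

# The mean comes back as float64; round and cast before converting to datetimes.
group_means = pd.to_datetime(mean_ns.round().astype('int64'), unit='ns')
print(group_means)
```

This also shows what a "mean of dates" amounts to here: the average of the underlying epoch offsets, mapped back to a timestamp.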


Posted by huangapple on January 6, 2020, 19:05:37. Source: https://go.coder-hub.com/59610966.html