How do I prevent 'NotImplementedError' and 'TypeError' when using numeric aggregate functions in Pandas pivot tables with string columns?


Question


I have tried several times to apply numeric aggregation methods to numeric data with pandas. Each time, I get a NotImplementedError, which in turn raises a TypeError. I suspect that pandas is refusing to ignore the string columns when performing these numeric operations. How do I prevent this?

Given a DataFrame named matrix_data, with pandas imported as pan:

  Account Number  Company      Contact Account Manager     Product  Licenses   
0         2123398   Google  Larry Pager    Edward Thorp   Analytics       150  
1         2123398   Google  Larry Pager    Edward Thorp  Prediction       150   
2         2123398   Google  Larry Pager    Edward Thorp    Tracking       300   
3         2192650     BOBO  Larry Pager    Edward Thorp   Analytics       150   
4          420496     IKEA    Elon Tusk    Edward Thorp   Analytics       300   

   Sale Price        Status  
0     2100000     Presented  
1      700000     Presented  
2      350000  Under Review  
3     2450000          Lost  
4     4550000           Won  

Trying to aggregate all numerical values by company:

pan.pivot_table(matrix_data, index="Company", aggfunc="mean")

raises an exception like this:

NotImplementedError                       Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1489 try:
-> 1490     result = self.grouper._cython_operation(
   1491         "aggregate",
   1492         values,
   1493         how,
   1494         axis=data.ndim - 1,
   1495         min_count=min_count,
   1496         **kwargs,
   1497     )
   1498 except NotImplementedError:
   1499     # generally if we have numeric_only=False
   1500     # and non-applicable functions
...
   1698             # e.g. "foo"
-> 1699             raise TypeError(f"Could not convert {x} to numeric") from err
   1700 return x

TypeError: Could not convert Larry PagerLarry PagerLarry Pager to numeric

dataframe.groupby(["col_name1"]).mean() raises the same error.

I'm on Windows 10 with Python 3.11 and pandas 2.0.1. All of this was run in a Jupyter Notebook inside VS Code.

Answer 1

Score: 0

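Silently dropping columns that cannot be aggregated was deprecated in pandas 1.x and removed in pandas 2.0, where the same call now raises an error instead. This is the warning pandas 1.5.3 gives: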

> FutureWarning: pivot_table dropped a column because it failed to
> aggregate. This behavior is deprecated and will raise in a future
> version of pandas. Select only the columns that can be aggregated.

You now have to select the specific columns you want to aggregate.

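import pandas as pd  # the question imports pandas as pan; use that alias if you prefer
# Aggregate only the columns that hold numeric values
cols = ['Licenses', 'Sale Price']
pd.pivot_table(matrix_data, values=cols, index="Company", aggfunc="mean")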
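
If you would rather not list the numeric columns by hand, the sketch below shows one possible way to pick them up automatically with select_dtypes, along with the groupby equivalent using numeric_only=True. The DataFrame is rebuilt here from the values shown in the question so that the snippet runs on its own; the names numeric_cols, by_pivot and by_groupby are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
matrix_data = pd.DataFrame({
    "Account Number": [2123398, 2123398, 2123398, 2192650, 420496],
    "Company": ["Google", "Google", "Google", "BOBO", "IKEA"],
    "Contact": ["Larry Pager", "Larry Pager", "Larry Pager", "Larry Pager", "Elon Tusk"],
    "Account Manager": ["Edward Thorp"] * 5,
    "Product": ["Analytics", "Prediction", "Tracking", "Analytics", "Analytics"],
    "Licenses": [150, 150, 300, 150, 300],
    "Sale Price": [2100000, 700000, 350000, 2450000, 4550000],
    "Status": ["Presented", "Presented", "Under Review", "Lost", "Won"],
})

# Collect every numeric column instead of naming them one by one.
numeric_cols = matrix_data.select_dtypes(include="number").columns.tolist()

# pivot_table restricted to the numeric columns.
by_pivot = pd.pivot_table(
    matrix_data, values=numeric_cols, index="Company", aggfunc="mean"
)

# groupby equivalent: numeric_only=True skips the string columns.
by_groupby = matrix_data.groupby("Company").mean(numeric_only=True)

print(by_pivot)
print(by_groupby)
```

Note that select_dtypes also picks up Account Number, because it is stored as an integer. Averaging an identifier is rarely meaningful, so you may still want to drop it from numeric_cols or fall back to an explicit list as above.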

huangapple
  • Posted on June 1, 2023 at 22:15:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382864.html