英文:
How do I prevent 'NotImplementedError' and 'TypeError' when using numeric aggregate functions in Pandas pivot tables with string columns?
问题
我已经多次尝试在数值数据上使用 pandas
执行一些数值聚合方法。然而,我收到了一个 NotImplementedError
,随后会引发 TypeError
。每当我这样做时,我猜测 pandas
在执行这些数值任务时拒绝忽略字符串列。我该如何防止这种情况发生?
给定一个名为 matrix_data
的透视表,且已将 pandas
导入为 pan
:
Account Number Company Contact Account Manager Product Licenses
0 2123398 Google Larry Pager Edward Thorp Analytics 150
1 2123398 Google Larry Pager Edward Thorp Prediction 150
2 2123398 Google Larry Pager Edward Thorp Tracking 300
3 2192650 BOBO Larry Pager Edward Thorp Analytics 150
4 420496 IKEA Elon Tusk Edward Thorp Analytics 300
Sale Price Status
0 2100000 Presented
1 700000 Presented
2 350000 Under Review
3 2450000 Lost
4 4550000 Won
尝试通过公司对所有数值值进行聚合:
pan.pivot_table(matrix_data, index="Company", aggfunc="mean")
会引发如下异常:
NotImplementedError Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general..array_func(values)
1489 try:
-> 1490 result = self.grouper._cython_operation(
1491 "aggregate",
1492 values,
1493 how,
1494 axis=data.ndim - 1,
1495 min_count=min_count,
1496 **kwargs,
1497 )
1498 except NotImplementedError:
1499 # generally if we have numeric_only=False
1500 # and non-applicable functions
...
1698 # e.g. "foo"
-> 1699 raise TypeError(f"Could not convert {x} to numeric") from err
1700 return x
TypeError: Could not convert Larry PagerLarry PagerLarry Pager to numeric
dataframe.groupby("col_name1").mean()
会引发相同的错误。
我在 Windows 10 上,使用 Python 3.11 和 pandas 版本 2.0.1 进行操作。所有这些操作都是在 Jupyter Notebook 上与 VScode 配合完成的。
英文:
I have tried severally to perform some numeric aggregation methods on numeric data with pandas
. However, I have received a NotImplementedError
, which then throws a TypeError
, whenever I do so. I hypothesize that pandas
is refusing to ignore the string columns when performing said numerical tasks. How do I prevent this?
Given a pivot table named matrix_data
, and with pandas
imported as pan
:
Account Number Company Contact Account Manager Product Licenses
0 2123398 Google Larry Pager Edward Thorp Analytics 150
1 2123398 Google Larry Pager Edward Thorp Prediction 150
2 2123398 Google Larry Pager Edward Thorp Tracking 300
3 2192650 BOBO Larry Pager Edward Thorp Analytics 150
4 420496 IKEA Elon Tusk Edward Thorp Analytics 300
Sale Price Status
0 2100000 Presented
1 700000 Presented
2 350000 Under Review
3 2450000 Lost
4 4550000 Won
Trying to aggregate all numerical values by company:
pan.pivot_table(matrix_data, index = "Company", aggfunc="mean");
throws an exception like so:
NotImplementedError Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general..array_func(values)
1489 try:
-> 1490 result = self.grouper._cython_operation(
1491 "aggregate",
1492 values,
1493 how,
1494 axis=data.ndim - 1,
1495 min_count=min_count,
1496 **kwargs,
1497 )
1498 except NotImplementedError:
1499 # generally if we have numeric_only=False
1500 # and non-applicable functions
...
1698 # e.g. "foo"
-> 1699 raise TypeError(f"Could not convert {x} to numeric") from err
1700 return x
TypeError: Could not convert Larry PagerLarry PagerLarry Pager to numeric
dataframe.groupby(["col_name1"]).mean()
will throw an identical error
I'm on windows 10, python 3.11, with pandas version 2.0.1. All this was performed on Jupyter Notebook with VScode
答案1
得分: 0
已在Pandas 2.0中弃用。这是pandas 1.5.3给出的警告:
> FutureWarning: pivot_table删除了一个列,因为它无法进行聚合。此行为已被弃用,并将在将来的
> pandas版本中引发。只选择可以聚合的列。
现在,您需要选择要进行聚合的具体列。
cols = ['Licenses', 'Sale Price']
pd.pivot_table(matrix_data, values=cols, index="Company", aggfunc="mean")
英文:
This has been deprecated in Pandas 2.0. This is the warning pandas 1.5.3 gives:
> FutureWarning: pivot_table dropped a column because it failed to
> aggregate. This behavior is deprecated and will raise in a future
> version of pandas. Select only the columns that can be aggregated.
You now have to select the specific columns you want to aggregate.
cols = ['Licenses', 'Sale Price']
pd.pivot_table(matrix_data, values=cols, index="Company", aggfunc="mean")
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论