June 1, 2023, 22:15:49

How do I prevent 'NotImplementedError' and 'TypeError' when using numeric aggregate functions in Pandas pivot tables with string columns?

Question

I have tried several times to run numeric aggregations on numeric data with pandas. Each time, I get a NotImplementedError, which then raises a TypeError. My guess is that pandas refuses to ignore the string columns when aggregating. How do I prevent this?

Given a DataFrame named matrix_data, with pandas imported as pan:

  Account Number  Company      Contact Account Manager     Product  Licenses   
0         2123398   Google  Larry Pager    Edward Thorp   Analytics       150  
1         2123398   Google  Larry Pager    Edward Thorp  Prediction       150   
2         2123398   Google  Larry Pager    Edward Thorp    Tracking       300   
3         2192650     BOBO  Larry Pager    Edward Thorp   Analytics       150   
4          420496     IKEA    Elon Tusk    Edward Thorp   Analytics       300   

   Sale Price        Status  
0     2100000     Presented  
1      700000     Presented  
2      350000  Under Review  
3     2450000          Lost  
4     4550000           Won

Trying to aggregate all numerical values by company:

pan.pivot_table(matrix_data, index="Company", aggfunc="mean")

throws an exception like so:

NotImplementedError                       Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1489 try:
-> 1490     result = self.grouper._cython_operation(
   1491         "aggregate",
   1492         values,
   1493         how,
   1494         axis=data.ndim - 1,
   1495         min_count=min_count,
   1496         **kwargs,
   1497     )
   1498 except NotImplementedError:
   1499     # generally if we have numeric_only=False
   1500     # and non-applicable functions
...
   1698             # e.g. "foo"
-> 1699             raise TypeError(f"Could not convert {x} to numeric") from err
   1700 return x

TypeError: Could not convert Larry PagerLarry PagerLarry Pager to numeric

dataframe.groupby(["col_name1"]).mean() raises an identical error.

I'm on Windows 10 with Python 3.11 and pandas 2.0.1. All of this was run in a Jupyter Notebook in VS Code.

Answer 1

Score: 0

Silently dropping columns that cannot be aggregated was deprecated in pandas 1.x and removed in pandas 2.0, where it now raises instead. This is the warning pandas 1.5.3 gives:

> FutureWarning: pivot_table dropped a column because it failed to
> aggregate. This behavior is deprecated and will raise in a future
> version of pandas. Select only the columns that can be aggregated.

You now have to select the specific columns you want to aggregate.

import pandas as pd  # the question uses the alias pan instead

# Aggregate only the numeric columns of interest
cols = ['Licenses', 'Sale Price']
pd.pivot_table(matrix_data, values=cols, index="Company", aggfunc="mean")
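
If you would rather not list the columns by hand, the sketch below shows two alternatives on data rebuilt from the question's table. It assumes that the numeric columns are stored with numeric dtypes and that averaging "Account Number" is not wanted; the variable names numeric_cols, by_pivot and by_groupby are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question
matrix_data = pd.DataFrame({
    "Account Number": [2123398, 2123398, 2123398, 2192650, 420496],
    "Company": ["Google", "Google", "Google", "BOBO", "IKEA"],
    "Contact": ["Larry Pager", "Larry Pager", "Larry Pager", "Larry Pager", "Elon Tusk"],
    "Account Manager": ["Edward Thorp"] * 5,
    "Product": ["Analytics", "Prediction", "Tracking", "Analytics", "Analytics"],
    "Licenses": [150, 150, 300, 150, 300],
    "Sale Price": [2100000, 700000, 350000, 2450000, 4550000],
    "Status": ["Presented", "Presented", "Under Review", "Lost", "Won"],
})

# Option 1: let pandas pick out the numeric columns, then pass them as values.
# "Account Number" is numeric as well, so it is dropped explicitly here.
numeric_cols = (
    matrix_data.select_dtypes(include="number")
    .columns.drop("Account Number")
    .tolist()
)
by_pivot = pd.pivot_table(
    matrix_data, values=numeric_cols, index="Company", aggfunc="mean"
)

# Option 2: for groupby, ask mean() to skip non-numeric columns.
# This keeps every numeric column, including "Account Number".
by_groupby = matrix_data.groupby("Company").mean(numeric_only=True)

print(by_pivot)
print(by_groupby)
```

Option 1 keeps the explicit column selection that pandas 2.0 expects, while Option 2 is shorter but averages identifier-like columns too, so check the result for columns whose mean has no meaning.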
