March 4, 2023 01:39:38


Pandas groupby and sum are dropping numeric columns

Question


I have the following Python/Pandas code:

standardized_df = get_somehow()
standardized_df['TermDaysAmountProduct'] = standardized_df['TermDays'] * standardized_df['Amount']
standardized_df['DaysToCollectAmountProduct'] = standardized_df['DaysToCollect'] * standardized_df['Amount']
logger.info("standardized_df cols are {}".format(standardized_df.head()))
grouped_df = standardized_df.groupby(["Customer ID"], as_index=False).sum()
logger.info("grouped_df cols are {}".format(grouped_df.head()))

When this runs it produces the following logs:

standardized_df cols are   Customer ID      Customer Name  ... TermDaysAmountProduct DaysToCollectAmountProduct
grouped_df cols are   Customer ID  Amount

So apparently during the groupby, the TermDaysAmountProduct and DaysToCollectAmountProduct columns (which are both numeric and should be summed) are getting removed for some reason. How can I keep these columns in the dataframe after the sum?
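One plausible cause, offered as an assumption because the question does not show the column dtypes: in pandas releases before 2.0, `groupby(...).sum()` silently skipped columns it did not treat as numeric, and a column with `object` dtype counts as non-numeric even when it holds numbers (pandas 2.0 changed the default to `numeric_only=False`). If `TermDays` or `DaysToCollect` were loaded as `object`, the product columns would be `object` too. The sketch below uses invented sample values, simulates the `object` dtype with `.astype(object)`, and passes `numeric_only=True` explicitly so it behaves the same on any recent pandas version. `pd.to_numeric` then restores a numeric dtype so the column survives the sum.

```python
import pandas as pd

# Invented sample data standing in for get_somehow().
standardized_df = pd.DataFrame({
    "Customer ID": [1, 1, 2],
    "Amount": [10.0, 20.0, 5.0],
    "TermDays": [30, 60, 30],
})

# Assumption: simulate a product column that ended up with object dtype.
standardized_df["TermDaysAmountProduct"] = (
    standardized_df["TermDays"] * standardized_df["Amount"]
).astype(object)
print(standardized_df.dtypes)

# numeric_only=True (the effective default before pandas 2.0) drops object columns.
dropped = standardized_df.groupby(["Customer ID"], as_index=False).sum(numeric_only=True)
print(list(dropped.columns))  # TermDaysAmountProduct is not in this list

# Convert the column to a real numeric dtype before grouping.
standardized_df["TermDaysAmountProduct"] = pd.to_numeric(standardized_df["TermDaysAmountProduct"])
kept = standardized_df.groupby(["Customer ID"], as_index=False).sum(numeric_only=True)
print(kept)
```

Checking `standardized_df.dtypes` on the real data is a quick way to confirm or rule out this explanation.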

Answer 1

Score: 1


I hadn't noticed before that pandas drops non-numeric columns when applying sums. Interesting. Anyway, a workaround is to pass the column names explicitly to an aggregate function:

grouped_df = standardized_df.groupby(["Customer ID"], as_index=False).aggregate({"<col_1>": sum, "<col_2>": sum})
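A runnable version of that call with the question's column names filled in, as a sketch with invented sample values. It passes the string `"sum"` instead of the builtin `sum`; that is a substitution on my part, since recent pandas releases may warn that builtin callables will be called directly in a future version, while the string alias always selects pandas' own groupby sum.

```python
import pandas as pd

# Invented sample data using the question's column names.
standardized_df = pd.DataFrame({
    "Customer ID": [1, 1, 2],
    "Amount": [10.0, 20.0, 5.0],
    "TermDaysAmountProduct": [300.0, 1200.0, 150.0],
    "DaysToCollectAmountProduct": [100.0, 400.0, 50.0],
})

# Name every column to aggregate explicitly; "sum" is the pandas string alias.
grouped_df = standardized_df.groupby(["Customer ID"], as_index=False).aggregate(
    {
        "Amount": "sum",
        "TermDaysAmountProduct": "sum",
        "DaysToCollectAmountProduct": "sum",
    }
)
print(grouped_df)
```

With `as_index=False`, `Customer ID` comes back as an ordinary first column rather than as the index.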

In general, you can apply aggregate({foo: bar}) to a pandas.core.groupby.DataFrameGroupBy object, where foo is a column name and bar is a function that takes a pd.Series argument.
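To illustrate the `{foo: bar}` form with a user-defined `bar`, here is a small sketch: `value_range` is a hypothetical helper, called once per group with that group's values for one column as a `pd.Series`, and it can be mixed freely with string aliases such as `"sum"`. The data is invented.

```python
import pandas as pd


def value_range(s: pd.Series) -> float:
    """Hypothetical aggregator: receives one group's column values as a Series."""
    return s.max() - s.min()


df = pd.DataFrame({
    "Customer ID": [1, 1, 2],
    "Amount": [10.0, 20.0, 5.0],
    "TermDays": [30, 60, 30],
})

# Column name -> aggregation: a string alias for Amount, a callable for TermDays.
result = df.groupby("Customer ID", as_index=False).aggregate(
    {"Amount": "sum", "TermDays": value_range}
)
print(result)
```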

Note: if you have a large number of columns and want them all summed without typing out a long dictionary, you can build the aggregation dictionary programmatically, leaving out the grouping column:

aggregates = {col: sum for col in df.columns if col != "Customer ID"}
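A sketch of building that dictionary from the column list, extended on the assumption that some columns (here `Customer Name`) should be carried through rather than summed. `text_cols` is a hypothetical name and the data is invented; for those columns the sketch keeps the first value per group, which is reasonable only if the value is constant within each customer.

```python
import pandas as pd

# Invented sample data; "Customer Name" stands in for the non-numeric columns.
standardized_df = pd.DataFrame({
    "Customer ID": [1, 1, 2],
    "Customer Name": ["Acme", "Acme", "Globex"],
    "Amount": [10.0, 20.0, 5.0],
    "TermDaysAmountProduct": [300.0, 1200.0, 150.0],
})

key = "Customer ID"
text_cols = {"Customer Name"}  # hypothetical: columns to carry through, not sum

# Skip the grouping key, keep the first value of text columns, sum the rest.
aggregates = {
    col: ("first" if col in text_cols else "sum")
    for col in standardized_df.columns
    if col != key
}
grouped_df = standardized_df.groupby([key], as_index=False).aggregate(aggregates)
print(grouped_df)
```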
