Pandas groupby and sum are dropping numeric columns

Question

I have the following Python/Pandas code:

standardized_df = get_somehow()
standardized_df['TermDaysAmountProduct'] = standardized_df['TermDays'] * standardized_df['Amount']
standardized_df['DaysToCollectAmountProduct'] = standardized_df['DaysToCollect'] * standardized_df['Amount']
logger.info("standardized_df cols are {}".format(standardized_df.head()))

grouped_df = standardized_df.groupby(["Customer ID"], as_index=False).sum()
logger.info("grouped_df cols are {}".format(grouped_df.head()))

When this runs it produces the following logs:

standardized_df cols are   Customer ID      Customer Name  ... TermDaysAmountProduct DaysToCollectAmountProduct
grouped_df cols are   Customer ID  Amount

So apparently during the groupby, the TermDaysAmountProduct and DaysToCollectAmountProduct columns (which are both numeric and should be summed) are getting removed for some reason. How can I keep these columns in the dataframe after the sum?

Answer 1

Score: 1

I hadn't noticed before that pandas drops non-numeric columns when summing a groupby. Interesting. Anyway, a workaround is to pass the column names explicitly to an aggregate function.

grouped_df = standardized_df.groupby(["Customer ID"], as_index=False).aggregate({"<col_1>": sum, "<col_2>": sum})

In general you can always apply aggregate({foo: bar}) to a pandas.core.groupby.DataFrameGroupBy object, where foo is a column name and bar is a function that takes a pd.Series argument.
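As a minimal sketch of that pattern: the data below is made up for illustration (only the column names come from the question), and the string names "sum" and "first" are used in place of Python functions, which pandas also accepts.

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the question,
# the values are invented for illustration.
df = pd.DataFrame(
    {
        "Customer ID": ["C1", "C1", "C2"],
        "Customer Name": ["Acme", "Acme", "Globex"],
        "Amount": [100.0, 50.0, 200.0],
        "TermDaysAmountProduct": [3000.0, 1500.0, 9000.0],
    }
)

# Map each column name to the aggregation applied within each group.
# A non-numeric column can be kept by choosing a suitable function for it.
grouped = df.groupby("Customer ID", as_index=False).aggregate(
    {
        "Customer Name": "first",
        "Amount": "sum",
        "TermDaysAmountProduct": "sum",
    }
)

print(grouped)
```

Only the columns named in the dictionary (plus the grouping column) appear in the result, so nothing is dropped silently.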

Note: if you have a large number of columns and want them all summed without typing out a long dictionary, you can build the aggregate dictionary programmatically, leaving out the grouping column:
aggregates = {col: sum for col in standardized_df.columns if col != "Customer ID"}
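A variant of the dictionary comprehension above, sketched under one assumption: that a column which looks numeric is actually stored as strings, which is one possible reason for it to disappear from a numeric-only sum. The sample data is hypothetical; pd.to_numeric and select_dtypes are standard pandas calls.

```python
import pandas as pd

# Hypothetical sample data; "DaysToCollectAmountProduct" is deliberately
# stored as strings to mimic a column that only looks numeric.
df = pd.DataFrame(
    {
        "Customer ID": ["C1", "C1", "C2"],
        "Customer Name": ["Acme", "Acme", "Globex"],
        "Amount": [100.0, 50.0, 200.0],
        "DaysToCollectAmountProduct": ["2500", "1000", "8000"],
    }
)

key = "Customer ID"

# Convert the column that should be numeric; unparseable values become NaN.
df["DaysToCollectAmountProduct"] = pd.to_numeric(
    df["DaysToCollectAmountProduct"], errors="coerce"
)

# Build the aggregation dictionary from numeric columns only, skipping the key.
numeric_cols = df.select_dtypes(include="number").columns
aggregates = {col: "sum" for col in numeric_cols if col != key}

grouped = df.groupby(key, as_index=False).aggregate(aggregates)
print(grouped)
```

Printing standardized_df.dtypes before grouping is a quick way to check whether this assumption applies to the real data.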

huangapple
  • Posted on March 4, 2023, 01:39:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75630249.html