June 8, 2023, 16:08:23

Explode/expand the result of a groupby to the same ordering/index as before the groupby

Question

Say I have the following dataframe and a function that I want to apply within each group:

import pandas as pd
df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})
def is_positive(grp_row):
    return grp_row["amount"].values > 0
df_result = df.groupby("id").apply(is_positive)
#id   
# 1     False
#       True
# 2     True
#       False

(Note that this is not my real problem; it is for illustration purposes only.)

How do I explode/expand the result so that it has the same index/ordering as df, such that df.iloc[i] corresponds to df_result.iloc[i]?

I have purposely removed the Series operation from is_positive (hence the .values > 0) so that the result does not carry the original index, since my real function does not return it either.

Answer 1

Score: 1

To explode/expand the result so that it matches the original ordering of the dataframe, you can use the pd.Series.explode method. It expands list-like entries in a Series into separate rows.
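As a minimal sketch of what explode does here, using the df and is_positive from the question: each group's array is spread over separate rows, but every row is labelled with its group key (the id), not with the row label it had in df.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})

def is_positive(grp_row):
    # Bare NumPy array: no row labels survive this step.
    return grp_row["amount"].values > 0

# One array of flags per group, indexed by the group key.
df_result = df.groupby("id").apply(is_positive)

# explode() gives each flag its own row; the index just repeats the group key.
exploded = df_result.explode()
print(exploded.index.tolist())
```

The printed index contains only id values, so nothing in it says which row of df each flag belongs to.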

However, using explode directly on df_result will not retain the original order, because the groupby operation sorts the group keys and the exploded rows are labelled by group key rather than by original row. So you need to restore the ordering from the original dataframe's index. Here's one way to achieve this:

import pandas as pd
df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})
def is_positive(grp_row):
    # wrap the raw array in a Series that carries the group's original row labels
    return pd.Series(grp_row["amount"].values > 0, index=grp_row.index)
# drop the group-key level, then put the rows back in the order of df
df_result = df.groupby("id").apply(is_positive).reset_index(level=0, drop=True).reindex(df.index)
# rename the series
df_result.name = 'is_positive'
print(df_result)

This script gives you a Series named is_positive in which each value carries the index label of the row it was computed from, so it lines up with the rows of the original dataframe.
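If the function really cannot return the index, as the question describes, the row positions can instead be recovered from the groupby object. The following is only a sketch under one assumption that the question does not state: the function returns exactly one value per row of the group, in the group's row order. The names grouped, per_group and flags are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})

def is_positive(grp_row):
    # Unchanged from the question: returns a bare NumPy array without an index.
    return grp_row["amount"].values > 0

grouped = df.groupby("id")
per_group = grouped.apply(is_positive)  # one array per group key

# grouped.indices maps each group key to the integer positions of its rows in df.
# Assumption: each array has one entry per row of its group, in row order.
flags = np.empty(len(df), dtype=bool)
for key, values in per_group.items():
    flags[grouped.indices[key]] = values

df_result = pd.Series(flags, index=df.index, name="is_positive")
print(df_result)
```

Because the flags are written back by position, df_result.iloc[i] corresponds to df.iloc[i] even when df has a non-default or non-unique index.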
