Explode/expand the result of a groupby so it has the same ordering/index as before the groupby

Question

Say I have the following dataframe and a function that I want to apply within each group:

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})


def is_positive(grp_row):
    return grp_row["amount"].values > 0

df_result = df.groupby("id").apply(is_positive)

# id
# 1    [False, True]
# 2    [True, False]
# dtype: object

(Note that this is not my real problem; it is for illustration purposes only.)

How do I explode/expand the result so that it has the same index/ordering as df, i.e. so that df.iloc[i] corresponds to df_result.iloc[i]?

I have deliberately avoided a Series operation inside is_positive (hence the .values > 0) so that the result does not carry the original index, since my real function does not return it either.

Answer 1

Score: 1

pd.Series.explode can expand the arrays in df_result into separate rows, but it cannot restore the original ordering by itself: groupby sorts by the group keys, so the exploded values come back in group order, indexed by id rather than by the original row labels.

A simpler route is to have the function return a pd.Series that carries each group's original index, then drop the group level and reorder by the original dataframe's index. Here's one way to achieve this:

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})

def is_positive(grp_row):
    # Wrap the bare array in a Series so each value keeps its original row label
    return pd.Series(grp_row["amount"].values > 0, index=grp_row.index)

# Drop the "id" level added by groupby, then restore the original row order
df_result = (
    df.groupby("id")
    .apply(is_positive)
    .reset_index(level=0, drop=True)
    .reindex(df.index)
)

# rename the series
df_result.name = 'is_positive'

print(df_result)

This gives a Series (not a dataframe) in which df_result.iloc[i] is the is_positive value for df.iloc[i]. Without the reindex step the values would stay in group order (index 0, 2, 1, 3).
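
If the real function cannot be changed and only returns a bare array, the same idea can be applied from the outside. The following is a minimal sketch under that assumption: it iterates over the groups, attaches each group's index to the returned array, and reorders to match df. The names pieces and result are illustrative, and it assumes the function returns exactly one value per row of the group and that df's index is unique.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})


def is_positive(grp_row):
    # Unchanged from the question: returns a bare NumPy array with no index
    return grp_row["amount"].values > 0


# Pair each group's output with the row labels of that group
pieces = [
    pd.Series(is_positive(grp), index=grp.index)
    for _, grp in df.groupby("id")
]

# Concatenate in group order, then put the rows back in df's order
result = pd.concat(pieces).reindex(df.index)
result.name = "is_positive"

print(result)
```

With the sample data, result.tolist() gives [False, True, True, False], which lines up with df row by row.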

Posted by huangapple on June 8, 2023 at 16:08:23
Link to this post: https://go.coder-hub.com/76429847.html