Explode/expand the result of a groupby to the same ordering/index as before the groupby
Question
Say I have the following dataframe and a function that I want to apply within each group:
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})

def is_positive(grp_row):
    return grp_row["amount"].values > 0

df_result = df.groupby("id").apply(is_positive)
# id
# 1    False
#      True
# 2    True
#      False
(Note: this is not my real problem; it is for illustration only.)
How do I explode/expand the resulting dataframe so that it has the same index/ordering as df, such that df.iloc[i] corresponds to df_result.iloc[i]?
I have purposely removed the Series operation from is_positive (hence the .values > 0), so that the original index is not carried into the result, since my real function does not return it either.
Answer 1
Score: 1
pd.Series.explode expands list-like values in a Series into separate rows, so it can flatten the per-group arrays. On its own, however, it does not restore the original ordering: groupby sorts by the group keys, and the exploded rows are labelled with the group key rather than the original row index. A more direct approach is to have the function return a Series that carries each group's original index and then drop the group-key level. Here's one way to achieve this:
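import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})

def is_positive(grp_row):
    # wrap the result in a Series that carries the group's original index
    return pd.Series(grp_row["amount"].values > 0, index=grp_row.index)

df_result = df.groupby("id").apply(is_positive).reset_index(level=0, drop=True)
# rows can come back in group order (0, 2, 1, 3); reindex to restore the original order
df_result = df_result.reindex(df.index)
# rename the series
df_result.name = 'is_positive'
print(df_result)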
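The result is a Series (not a dataframe) named is_positive, indexed by the original row labels, so df_result.loc[label] lines up with df.loc[label]. Positional alignment, where df.iloc[i] corresponds to df_result.iloc[i], additionally requires the rows to be in the original order, which is what reindexing to df.index ensures.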
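If the real function cannot be changed and only returns a bare array, the index can be attached outside of it instead. The following is a minimal sketch under some assumptions: apply_keeping_order is a hypothetical helper name (not a pandas API), the function is assumed to return exactly one value per row of the group in the group's row order, and the dataframe's index is assumed to be unique so that reindex is well defined.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "amount": [-1, 10, 20, -5]})

def is_positive(grp):
    # Stand-in for a function that returns a bare array with no index
    return grp["amount"].values > 0

def apply_keeping_order(frame, key, func):
    # Hypothetical helper (not a pandas API): run func on each group and
    # attach that group's original row labels to the returned array.
    pieces = [
        pd.Series(func(grp), index=grp.index)
        for _, grp in frame.groupby(key)
    ]
    # Concatenation follows group order; reindex restores the original order.
    return pd.concat(pieces).reindex(frame.index)

result = apply_keeping_order(df, "id", is_positive)
print(result.tolist())  # [False, True, True, False]
```

Because each group yielded by groupby keeps its original index, wrapping the array in a Series at that point is enough to line the values back up with df, and the function itself stays untouched.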