How to apply a conditional ffill fillna() with groupby dataframe

Question

I have a dataframe with several classes/groups of data. One column indicates (True/False) whether the NaN contents of a specific row should have fillna() applied.

import pandas as pd
import numpy as np
df = pd.DataFrame(
    [["Bill", False, 13, 10, 15],
     ["Jane", False, 63, 17, 95],
     ["Bill", True, np.nan, 5, np.nan],
     ["Mary", False, 65, 13, np.nan]],
    columns=['Person', 'result', 'data1', 'data2', 'data3'])

My goal is to fill the NaN values for Bill with 13 and 15 respectively, because of the "True" parameter, while leaving other NaNs and data alone.
I've found solutions to simpler versions of this problem, but can't quite get it to work with the groupby.

This approach seems to lose the ability to reference the result column (KeyError 'result'):

    df2 = df.groupby('Person').transform(lambda x: x.fillna(method='ffill', axis=1) if x['result']==True else x) 

And this approach appears to give a similar problem:

    df2 = df.apply(
    lambda row: row.fillna(method='ffill') if row['result']==True else row
    )

Any advice?

Answer 1

Score: 1

The first approach almost works! Once you're in the groupby, you need to use apply instead of transform, since you want the fill to be applied over the whole group. Also, you want the axis to be 0, not 1; otherwise it will fill from the adjacent columns.

So, it should look something like:

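# x.ffill(axis=0) is equivalent to x.fillna(method='ffill', axis=0),
# a spelling that recent pandas releases deprecate
df.groupby('Person').apply(lambda x: x.ffill(axis=0))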

Note that this doesn't take your filter on "result" into account. To do that, you can replace only the rows you want in the source dataframe:

# group_keys=False keeps the original row index, so the assignment aligns row by row
df[df["result"]] = df.groupby('Person', group_keys=False).apply(lambda x: x.ffill(axis=0))

Or, if you want the result to be in a second dataframe:

df2 = df.copy()  # leave the original df untouched
df2[df2["result"]] = df.groupby('Person', group_keys=False).apply(lambda x: x.ffill(axis=0))
df2
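
As a possible alternative, here is a minimal sketch that avoids apply altogether: it forward-fills only the data columns within each group using the built-in groupby ffill, then copies the filled values back only for rows where "result" is True. The names data_cols, filled and mask are just illustrative, and it assumes the rows are already in the order you want the fill to follow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Bill", False, 13, 10, 15],
     ["Jane", False, 63, 17, 95],
     ["Bill", True, np.nan, 5, np.nan],
     ["Mary", False, 65, 13, np.nan]],
    columns=['Person', 'result', 'data1', 'data2', 'data3'])

# Illustrative name: the columns that are allowed to be filled
data_cols = ['data1', 'data2', 'data3']

# Forward-fill within each Person group; the result keeps df's row index
filled = df.groupby('Person')[data_cols].ffill()

# Copy the filled values back only where "result" is True
mask = df['result']
df2 = df.copy()
df2.loc[mask, data_cols] = filled.loc[mask, data_cols]

print(df2)
```

With the sample data, only the second Bill row should change; Mary's NaN stays as it is because her "result" is False.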

huangapple
  • Posted on February 24, 2023, 16:32:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75554232.html