How to apply a conditional ffill fillna() with groupby dataframe
Question
I have a dataframe with several classes/groups of data. One column indicates (True/False) whether the NaN contents of a specific row should have fillna() applied.
import pandas as pd
import numpy as np
df = pd.DataFrame(
[["Bill", False, 13, 10, 15],
["Jane", False, 63, 17, 95],
["Bill", True, np.nan, 5, np.nan],
["Mary", False, 65, 13, np.nan]],
columns=['Person','result','data1','data2','data3'])
My goal is to fill the NaN values for Bill with 13 and 15 respectively, because of the "True" parameter, while leaving other NaNs and data alone.
I've found solutions to simpler versions of this problem, but can't quite get it to work with the groupby.
This approach seems to lose the ability to reference the result column (KeyError 'result'):
df2 = df.groupby('Person').transform(lambda x: x.fillna(method='ffill', axis=1) if x['result']==True else x)
And this approach appears to give a similar problem:
df2 = df.apply(
lambda row: row.fillna(method='ffill') if row['result']==True else row
)
Any advice?
Answer 1
Score: 1
The first approach almost works! Once you're in the groupby, you need to use apply instead of transform, since you want the fillna to be applied over the whole group. Also, you want the axis to be 0, not 1; otherwise it will fill from the adjacent columns.
So, it should look something like:
df.groupby('Person').apply(lambda x: x.fillna(method='ffill', axis=0))
Note that this doesn't take into account your filter on "result". To take that into account, you can simply replace only the rows you want in the source dataframe:
df[df["result"]] = df.groupby('Person', group_keys=False).apply(lambda x: x.fillna(method='ffill', axis=0))
Or, if you want the result to be in a second dataframe:
df2 = df.copy()
df2[df2["result"]] = df.groupby('Person', group_keys=False).apply(lambda x: x.fillna(method='ffill', axis=0))
df2
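For reference, here is a minimal self-contained sketch of the same idea. It is a variant rather than the exact code above: it assumes a recent pandas release (fillna(method='ffill') is deprecated since pandas 2.1 in favor of ffill()), and the name data_cols is introduced here only for illustration. It forward-fills within each Person group and then copies the filled values back only on the rows where result is True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Bill", False, 13, 10, 15],
     ["Jane", False, 63, 17, 95],
     ["Bill", True, np.nan, 5, np.nan],
     ["Mary", False, 65, 13, np.nan]],
    columns=["Person", "result", "data1", "data2", "data3"],
)

# Hypothetical helper name: the columns that may contain NaNs to fill.
data_cols = ["data1", "data2", "data3"]

# Forward-fill down each column, separately for every Person group.
# The result keeps the original row index, so it aligns with df.
filled = df.groupby("Person")[data_cols].ffill()

# Copy the filled values back only on rows flagged True in "result";
# every other row (including Mary's NaN) is left untouched.
df2 = df.copy()
mask = df["result"]
df2.loc[mask, data_cols] = filled.loc[mask, data_cols]

print(df2)
```

Selecting only the data columns before filling keeps the Person and result columns out of the fill, and the boolean mask restricts the write-back to the flagged rows.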