February 24, 2023 16:32:59


How to apply a conditional ffill fillna() with groupby dataframe

Question

I have a dataframe with several classes/groups of data. One column indicates (True/False) whether the NaN contents of a specific row should have fillna() applied.

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(
        [["Bill", False, 13, 10, 15],
         ["Jane", False, 63, 17, 95],
         ["Bill", True, np.nan, 5, np.nan],
         ["Mary", False, 65, 13, np.nan]],
        columns=['Person', 'result', 'data1', 'data2', 'data3'])

My goal is to fill Bill's NaN values with 13 and 15 respectively, because that row's "result" is True, while leaving the other NaNs and data alone.
I've found solutions to simpler versions of this problem, but can't quite get it to work with the groupby.

This approach seems to lose the ability to reference the result column (KeyError 'result'):

    df2 = df.groupby('Person').transform(lambda x: x.fillna(method='ffill', axis=1) if x['result']==True else x)
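Why this raises KeyError: pandas documents that a function passed to DataFrameGroupBy.transform must support being applied column-by-column, and it does call it on one column Series at a time (grouping column excluded). Inside such a call, x['result'] looks up a row label named 'result', which does not exist. A minimal sketch, assuming only the data1/data3 columns matter, of a transform whose lambda works on a single column (note that it ignores the "result" flag and fills every NaN):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Bill", False, 13, 10, 15],
     ["Jane", False, 63, 17, 95],
     ["Bill", True, np.nan, 5, np.nan],
     ["Mary", False, 65, 13, np.nan]],
    columns=["Person", "result", "data1", "data2", "data3"])

# The lambda must work when handed a single column Series, so it only
# uses that column's own values. The 'result' flag is NOT consulted here.
filled = (df.groupby("Person")[["data1", "data3"]]
            .transform(lambda col: col.ffill()))
print(filled)
```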

And this approach appears to give a similar problem:

    df2 = df.apply(
    lambda row: row.fillna(method='ffill') if row['result']==True else row
    )
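The second KeyError comes from DataFrame.apply's default axis=0: the lambda is called once per column, so `row` is really a column Series indexed by row labels 0..3. The sketch below only inspects what apply passes in (it is not a fix); `per_column` and `per_row` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Bill", False, 13, 10, 15],
     ["Jane", False, 63, 17, 95],
     ["Bill", True, np.nan, 5, np.nan],
     ["Mary", False, 65, 13, np.nan]],
    columns=["Person", "result", "data1", "data2", "data3"])

# Default axis=0: one call per column, so s.name is a column label.
per_column = df.apply(lambda s: s.name)

# axis=1: one call per row, so 'result' is a valid label in each Series.
per_row = df.apply(lambda row: bool(row["result"]), axis=1)

print(per_column.tolist())
print(per_row.tolist())
```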

Any advice?

Answer 1

Score: 1

The first approach almost works! Once you're in the groupby, you need to use apply instead of transform, since you want fillna to be applied over the whole group. Also, you want the axis to be 0, not 1; otherwise it will fill from the adjacent columns.

So, it should look something like:

    df.groupby('Person').apply(lambda x: x.fillna(method='ffill', axis=0))
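To illustrate the axis point on the question's data, a sketch that takes Bill's two rows and forward-fills them both ways. `bill`, `down`, and `across` are illustrative names, and DataFrame.ffill(axis=...) stands in for fillna(method='ffill', axis=...), which performs the same fill:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Bill", False, 13, 10, 15],
     ["Jane", False, 63, 17, 95],
     ["Bill", True, np.nan, 5, np.nan],
     ["Mary", False, 65, 13, np.nan]],
    columns=["Person", "result", "data1", "data2", "data3"])

# Bill's rows (labels 0 and 2), numeric columns only.
bill = df.loc[df["Person"] == "Bill", ["data1", "data2", "data3"]]

# axis=0 fills down each column: row 2 picks up 13 and 15 from row 0.
down = bill.ffill(axis=0)

# axis=1 fills across each row: data3 copies data2 (5) from the same row,
# and data1 stays NaN because nothing lies to its left.
across = bill.ffill(axis=1)

print(down.loc[2].tolist())
print(across.loc[2].tolist())
```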

Note that this groupby/apply version doesn't take your "result" filter into account. To do that, you can replace only the rows you want in the source dataframe:

    df[df["result"]] = df.groupby('Person', group_keys=False).apply(lambda x: x.fillna(method='ffill', axis=0))

Or, if you want the result in a second dataframe:

    df2 = df.copy()
    df2[df2["result"]] = df.groupby('Person', group_keys=False).apply(lambda x: x.fillna(method='ffill', axis=0))
    df2
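A sketch of the same conditional fill that avoids groupby().apply and the fillna `method` keyword (deprecated since pandas 2.1), using GroupBy.ffill plus a boolean .loc assignment. This is a swapped-in idiom, not the answer's code; it assumes only data1..data3 should be filled, and `data_cols` and `flagged` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Bill", False, 13, 10, 15],
     ["Jane", False, 63, 17, 95],
     ["Bill", True, np.nan, 5, np.nan],
     ["Mary", False, 65, 13, np.nan]],
    columns=["Person", "result", "data1", "data2", "data3"])

data_cols = ["data1", "data2", "data3"]
flagged = df["result"]  # True where the row's NaNs should be filled

# Forward-fill down each column within each Person group;
# the result keeps df's row index, so it aligns with df.
filled = df.groupby("Person")[data_cols].ffill()

# Copy, then overwrite only the flagged rows; other NaNs
# (such as Mary's data3) are left untouched.
df2 = df.copy()
df2.loc[flagged, data_cols] = filled.loc[flagged, data_cols]
print(df2)
```

Because the assignment goes through .loc with a boolean row mask, pandas aligns `filled` to `df2` by index and column labels, so row order within groups does not matter.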
