February 9, 2023, 01:28:17

How to populate NaN values based on conditions from two other columns using Pandas?

Question

I have a dataframe that looks something like this:

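|  ID | hiqual | Wave |
| --- | ------ | ---- |
| 1   |  1.0   | g    |
| 1   |  NaN   | i    |
| 1   |  NaN   | k    |
| 2   |  1.0   | g    |
| 2   |  NaN   | i    |
| 2   |  NaN   | k    |
| 3   |  1.0   | g    |
| 3   |  NaN   | i    |
| 4   |  5.0   | g    |
| 4   |  NaN   | i    |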

This is a long-format dataframe, and I have my hiqual variable only for my first measurement wave (g). I would like to populate the NaN values for the subsequent measurement waves (i and k) with the value given in wave g for each ID.

I tried using fillna(), but I am not sure how to express the two conditions on ID and Wave and how to fill based on them. I would be grateful for any help or suggestions.

Answer 1

Score: 1

The exact expected output is unclear, but I think you might want:

m = df['hiqual'].isna()

df.loc[m, 'hiqual'] = df['Wave'].mask(m).ffill()
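
Note that, as written, the snippet above assigns the Wave label (g) to hiqual rather than the numeric value. If the goal is instead to carry each ID's wave-g value down to its later waves, one possible sketch is a per-ID fill with groupby. It assumes that the only non-missing hiqual value in each ID group is the wave-g one, as in the sample data; the frame below is rebuilt from the question's table purely for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question (illustrative only)
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave": ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

# 'first' takes the first non-null hiqual within each ID group;
# transform broadcasts that value back to every row of the group
first_per_id = df.groupby("ID")["hiqual"].transform("first")

# Only the missing entries are replaced; existing values are left alone
df["hiqual"] = df["hiqual"].fillna(first_per_id)
print(df)
```

Because the fill is computed per ID, the row order of the frame does not matter here.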

Answer 2

Score: 0

If your dataframe is already ordered by the ID and Wave columns, you can simply forward-fill the values:

>>> df.sort_values(['ID', 'Wave']).ffill()
   ID  hiqual Wave
0   1     1.0    g
1   1     1.0    i
2   1     1.0    k
3   2     1.0    g
4   2     1.0    i
5   2     1.0    k
6   3     1.0    g
7   3     1.0    i
8   4     5.0    g
9   4     5.0    i

You can also use the g values explicitly:

g_vals = df[df['Wave']=='g'].set_index('ID')['hiqual']
df['hiqual'] = df['hiqual'].fillna(df['ID'].map(g_vals))
print(df)
print(g_vals)

# Output
   ID  hiqual Wave
0   1     1.0    g
1   1     1.0    i
2   1     1.0    k
3   2     1.0    g
4   2     1.0    i
5   2     1.0    k
6   3     1.0    g
7   3     1.0    i
8   4     5.0    g
9   4     5.0    i

# g_vals
ID
1    1.0
2    1.0
3    1.0
4    5.0
Name: hiqual, dtype: float64
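
One caveat with the plain forward fill in the answer above: it does not know where one ID ends and the next begins, so if an ID has no value in wave g, the previous ID's value would be carried into it. A minimal sketch of a per-ID variant follows; the small frame is hypothetical (ID 2 is deliberately missing its wave-g value) and is not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 2 has no hiqual recorded in wave g
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "hiqual": [1.0, np.nan, np.nan, np.nan],
    "Wave": ["g", "i", "g", "i"],
})

df = df.sort_values(["ID", "Wave"])

# Plain forward fill: ID 1's value leaks into the rows of ID 2
plain = df["hiqual"].ffill()

# Grouped forward fill: values are only propagated within the same ID
grouped = df.groupby("ID")["hiqual"].ffill()

print(pd.DataFrame({"ID": df["ID"], "plain": plain, "grouped": grouped}))
```

With the grouped version, the rows of ID 2 stay NaN, which is usually preferable to silently borrowing another ID's value.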
