How to populate NaN values based on conditions from two other columns using Pandas?

Question

I have a dataframe that looks something like this:

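| ID | hiqual | Wave |
| --- | ------ | ---- |
| 1 | 1.0 | g |
| 1 | NaN | i |
| 1 | NaN | k |
| 2 | 1.0 | g |
| 2 | NaN | i |
| 2 | NaN | k |
| 3 | 1.0 | g |
| 3 | NaN | i |
| 4 | 5.0 | g |
| 4 | NaN | i |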

This is a long-format dataframe, and I have my `hiqual` variable only for my first measurement wave (g). I would like to populate the NaN values for the subsequent measurement waves (i and k) with the value given in wave g for the same ID.

I tried using `fillna()`, but I am not sure how to supply the two conditions on ID and Wave and fill based on them. I would be grateful for any help or suggestions.
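
For anyone who wants to try the answers, the sample above can be rebuilt roughly as follows. This is only a reconstruction from the table. The name `missing_per_wave` is my own and is just a quick check that the gaps sit in waves i and k.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (values copied from the table).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave":   ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

# Count the missing hiqual values per wave (assumed check, not part of the question).
missing_per_wave = df["hiqual"].isna().groupby(df["Wave"]).sum()
print(missing_per_wave)
```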

Answer 1

Score: 1

The exact expected output is unclear, but I think you might want:

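# Boolean mask of the rows where hiqual is missing
m = df['hiqual'].isna()

# Caution: the right-hand side forward-fills the Wave labels, so with the sample data
# this writes the string 'g' into hiqual rather than the numeric wave-g value.
df.loc[m, 'hiqual'] = df['Wave'].mask(m).ffill()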
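
If the goal is instead to carry each ID's numeric `hiqual` value into its missing rows, one possible approach is sketched below. It is not the answer author's code. It assumes that the only non-missing `hiqual` value within an ID is the wave-g one, because `transform("first")` picks the first non-null value in each group and never looks at `Wave`.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave":   ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

# First non-null hiqual per ID, broadcast back to every row of that ID.
first_per_id = df.groupby("ID")["hiqual"].transform("first")

# Fill only the missing entries; existing values are left untouched.
df["hiqual"] = df["hiqual"].fillna(first_per_id)
print(df)
```

Because the fill is computed per ID, it does not depend on the row order of the dataframe.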

Answer 2

Score: 0

If your dataframe is ordered by the ID and Wave columns, you can simply forward-fill the values (the call below sorts first, so wave g precedes i and k within each ID):

>>> df.sort_values(['ID', 'Wave']).ffill()
   ID  hiqual Wave
0   1     1.0    g
1   1     1.0    i
2   1     1.0    k
3   2     1.0    g
4   2     1.0    i
5   2     1.0    k
6   3     1.0    g
7   3     1.0    i
8   4     5.0    g
9   4     5.0    i
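
One caveat with a plain forward fill: it does not stop at ID boundaries. The sketch below uses a hypothetical variation of the sample (`df2`, in which ID 2 has no wave-g row) to show the difference between a plain `ffill()` and a forward fill restricted to each ID via `groupby`. The data and variable names here are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical variation of the sample: ID 2 has no wave-g measurement.
df2 = pd.DataFrame({
    "ID":     [1, 1, 2, 2],
    "hiqual": [1.0, np.nan, np.nan, np.nan],
    "Wave":   ["g", "i", "i", "k"],
}).sort_values(["ID", "Wave"])

# Plain forward fill: ID 2 inherits the last value seen for ID 1.
plain = df2["hiqual"].ffill()

# Forward fill within each ID: ID 2 stays NaN because it has nothing to carry forward.
grouped = df2.groupby("ID")["hiqual"].ffill()

print(pd.DataFrame({"ID": df2["ID"], "plain": plain, "grouped": grouped}))
```

With the data from the question every ID starts with a wave-g value, so both variants give the same result there.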

You can also use the g values explicitly:

g_vals = df[df['Wave']=='g'].set_index('ID')['hiqual']
df['hiqual'] = df['hiqual'].fillna(df['ID'].map(g_vals))
print(df)
print(g_vals)

# Output
   ID  hiqual Wave
0   1     1.0    g
1   1     1.0    i
2   1     1.0    k
3   2     1.0    g
4   2     1.0    i
5   2     1.0    k
6   3     1.0    g
7   3     1.0    i
8   4     5.0    g
9   4     5.0    i

# g_vals
ID
1    1.0
2    1.0
3    1.0
4    5.0
Name: hiqual, dtype: float64
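
The explicit lookup above could be wrapped in a small helper if it is needed for several columns or baseline waves. This is a minimal sketch. The function name `fill_from_baseline` and its parameters are hypothetical, and the duplicate check is an added assumption about how you would want repeated baseline rows handled.

```python
import numpy as np
import pandas as pd

def fill_from_baseline(df, id_col="ID", wave_col="Wave", value_col="hiqual", baseline="g"):
    """Return a copy of df where missing value_col entries take the ID's baseline-wave value."""
    base = df.loc[df[wave_col] == baseline]
    # Mapping through a Series needs a unique index, so fail early on repeated baseline rows.
    if base[id_col].duplicated().any():
        raise ValueError("more than one baseline row for at least one ID")
    lookup = base.set_index(id_col)[value_col]
    out = df.copy()
    out[value_col] = out[value_col].fillna(out[id_col].map(lookup))
    return out

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave":   ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

result = fill_from_baseline(df)
print(result)
```

Returning a copy keeps the original dataframe unchanged, which makes it easy to compare before and after.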

huangapple
  • Posted on February 9, 2023 at 01:28:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75389569.html