How to populate NaN values based on conditions from two other columns using Pandas?
Question
I have a dataframe that looks something like this:

| ID | hiqual | Wave |
| --- | ------ | ---- |
| 1 | 1.0 | g |
| 1 | NaN | i |
| 1 | NaN | k |
| 2 | 1.0 | g |
| 2 | NaN | i |
| 2 | NaN | k |
| 3 | 1.0 | g |
| 3 | NaN | i |
| 4 | 5.0 | g |
| 4 | NaN | i |

This is a long-format dataframe, and I have my `hiqual` variable for my first measurement wave (g). I would like to populate the NaN values for the subsequent measurement waves (i and k) with the value given in wave g for each ID.
I tried using `fillna()`, but I am not sure how to supply the two conditions on ID and Wave and fill based on them. I would be grateful for any help or suggestions.
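For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the variable name `df` is an assumption, and the dtypes are whatever pandas infers.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example table; `df` is an assumed name.
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave":   ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})
print(df)
```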
Answer 1
Score: 1
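The exact expected output is unclear, but I think you might want:
    # m marks the rows where hiqual is missing
    m = df['hiqual'].isna()
    # as written, this fills those rows with the Wave label (e.g. 'g') of the
    # last row that had a hiqual value, not with the hiqual value itself
    df.loc[m, 'hiqual'] = df['Wave'].mask(m).ffill()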
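If the goal is the one stated in the question (copy each ID's wave g value into its later waves), a per-ID fill is another way to read it. The following is a minimal sketch, not part of the original answer. It assumes that the first non-missing `hiqual` of every ID is its wave g value, and it rebuilds the sample frame so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data from the question (reconstructed; `df` is an assumed name).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave":   ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

# First non-missing hiqual per ID, broadcast back to every row of that ID.
first_per_id = df.groupby("ID")["hiqual"].transform("first")

# Only the missing entries are replaced; existing values are kept.
df["hiqual"] = df["hiqual"].fillna(first_per_id)
print(df)
```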
Answer 2
Score: 0
If your dataframe is ordered by the `ID` and `Wave` columns (the `sort_values` call below ensures this), you can simply forward-fill the values:
    >>> df.sort_values(['ID', 'Wave']).ffill()
       ID  hiqual Wave
    0   1     1.0    g
    1   1     1.0    i
    2   1     1.0    k
    3   2     1.0    g
    4   2     1.0    i
    5   2     1.0    k
    6   3     1.0    g
    7   3     1.0    i
    8   4     5.0    g
    9   4     5.0    i
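One caveat with a plain `ffill()` is that it does not know where one ID ends and the next begins: if an ID had no value in wave g, it would inherit the previous ID's value. Below is a sketch of a grouped forward fill that avoids this. It uses a hypothetical variant of the data in which ID 2 has no wave g value, which is not the case in the question's sample.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the example: ID 2 is missing its wave g value.
df = pd.DataFrame({
    "ID":     [1, 1, 2, 2],
    "hiqual": [1.0, np.nan, np.nan, np.nan],
    "Wave":   ["g", "i", "g", "i"],
})

df = df.sort_values(["ID", "Wave"])

# Plain forward fill: ID 2 picks up ID 1's value.
plain = df["hiqual"].ffill()

# Grouped forward fill: values never cross an ID boundary.
grouped = df.groupby("ID")["hiqual"].ffill()

print(pd.DataFrame({"ID": df["ID"], "plain": plain, "grouped": grouped}))
```

With the question's data both versions give the same result, since every ID starts with a wave g value.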
You can also use the `g` values explicitly:
    # hiqual value recorded in wave g, indexed by ID
    g_vals = df[df['Wave'] == 'g'].set_index('ID')['hiqual']
    # fill each missing hiqual with the wave g value of the same ID
    df['hiqual'] = df['hiqual'].fillna(df['ID'].map(g_vals))
    print(df)
    print(g_vals)
Output of `print(df)`:
       ID  hiqual Wave
    0   1     1.0    g
    1   1     1.0    i
    2   1     1.0    k
    3   2     1.0    g
    4   2     1.0    i
    5   2     1.0    k
    6   3     1.0    g
    7   3     1.0    i
    8   4     5.0    g
    9   4     5.0    i
Output of `print(g_vals)`:
    ID
    1    1.0
    2    1.0
    3    1.0
    4    5.0
    Name: hiqual, dtype: float64
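The explicit lookup relies on there being exactly one wave g row per ID. As a rough sketch, the same idea can be wrapped in a small helper that checks this first. The function name `fill_from_wave` and its parameters are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_from_wave(df, wave="g", id_col="ID", wave_col="Wave", value_col="hiqual"):
    """Return a copy of df with missing value_col filled from the given wave, per ID."""
    # Reference values: one row per ID, taken from the chosen wave.
    ref = df.loc[df[wave_col] == wave].set_index(id_col)[value_col]
    if not ref.index.is_unique:
        raise ValueError(f"more than one wave {wave!r} row for some {id_col}")
    out = df.copy()
    # Look up each row's ID in the reference values and fill only the gaps.
    out[value_col] = out[value_col].fillna(out[id_col].map(ref))
    return out


# Sample data from the question (reconstructed; `df` is an assumed name).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave":   ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

filled = fill_from_wave(df)
print(filled)
```

Returning a copy leaves the original frame untouched, which makes it easy to compare the result against the input.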