How to populate NaN values based on conditions from two other columns using Pandas?

Question

I have a dataframe that looks something like this:

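| ID | hiqual | Wave |
| --- | ------ | ---- |
| 1 | 1.0 | g |
| 1 | NaN | i |
| 1 | NaN | k |
| 2 | 1.0 | g |
| 2 | NaN | i |
| 2 | NaN | k |
| 3 | 1.0 | g |
| 3 | NaN | i |
| 4 | 5.0 | g |
| 4 | NaN | i |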

This is a long-format dataframe, and I have my hiqual variable for my first measurement wave (g). I would like to populate the NaN values for the subsequent measurement waves (i and k) with the value given in wave g for each ID.

I tried using fillna(), but I am not sure how to supply the two conditions on ID and Wave and fill based on them. I would be grateful for any help or suggestions.

Answer 1

Score: 1

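The exact expected output is unclear, but I think you might want:

    # Boolean mask of the rows where hiqual is missing
    m = df['hiqual'].isna()
    # Fill those rows with the most recent Wave label that had a hiqual value
    df.loc[m, 'hiqual'] = df['Wave'].mask(m).ffill()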
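If the goal is to copy each ID's own hiqual value into the gaps (rather than the Wave label), a grouped fill is one option. This is only a minimal sketch: it rebuilds the sample frame from the question and assumes the first non-missing hiqual per ID is the wave-g measurement. The names `first_per_id` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave": ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

# First non-missing hiqual per ID, broadcast back to every row of that ID
first_per_id = df.groupby("ID")["hiqual"].transform("first")

# Only the gaps are filled; existing values are left untouched
filled = df.assign(hiqual=df["hiqual"].fillna(first_per_id))
print(filled)
```

Because the lookup is done per ID, the row order of the frame does not matter here.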

Answer 2

Score: 0

If your dataframe is already ordered by the ID and Wave columns, you can simply forward-fill the values:

    >>> df.sort_values(['ID', 'Wave']).ffill()
       ID  hiqual Wave
    0   1     1.0    g
    1   1     1.0    i
    2   1     1.0    k
    3   2     1.0    g
    4   2     1.0    i
    5   2     1.0    k
    6   3     1.0    g
    7   3     1.0    i
    8   4     5.0    g
    9   4     5.0    i
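One caveat with a plain forward fill: it does not know where one ID ends and the next begins, so an ID without a wave-g value would inherit the previous ID's value. A hedged sketch of the difference, using made-up data in which ID 2 has no g row (the frame and the names `plain` and `grouped` are hypothetical, and the plain fill is restricted to the hiqual column for brevity):

```python
import numpy as np
import pandas as pd

# Hypothetical data: ID 2 has no wave-g measurement
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "hiqual": [1.0, np.nan, np.nan, np.nan],
    "Wave": ["g", "i", "i", "k"],
})

ordered = df.sort_values(["ID", "Wave"])

# Plain forward fill: ID 1's value is carried into ID 2
plain = ordered["hiqual"].ffill()

# Grouped forward fill: filling restarts at each ID, so ID 2 stays NaN
grouped = ordered.assign(hiqual=ordered.groupby("ID")["hiqual"].ffill())

print(plain.tolist())
print(grouped)
```

With the data from the question every ID has a g row, so both variants give the same result there.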

You can also use the g values explicitly:

    # Look up each ID's hiqual value from wave g
    g_vals = df[df['Wave']=='g'].set_index('ID')['hiqual']
    # Fill missing values by mapping each row's ID to its wave-g value
    df['hiqual'] = df['hiqual'].fillna(df['ID'].map(g_vals))
    print(df)
    print(g_vals)

    # Output
       ID  hiqual Wave
    0   1     1.0    g
    1   1     1.0    i
    2   1     1.0    k
    3   2     1.0    g
    4   2     1.0    i
    5   2     1.0    k
    6   3     1.0    g
    7   3     1.0    i
    8   4     5.0    g
    9   4     5.0    i

    # g_vals
    ID
    1    1.0
    2    1.0
    3    1.0
    4    5.0
    Name: hiqual, dtype: float64
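The explicit lookup can also be wrapped in a small helper. The sketch below is an illustration, not part of the original answer: `fill_from_wave` is a made-up name, and it assumes that keeping the first reference row per ID is acceptable if an ID ever has more than one row for the chosen wave (mapping through a Series generally needs a unique index).

```python
import numpy as np
import pandas as pd

def fill_from_wave(df, wave="g"):
    """Fill missing hiqual values with the value recorded in `wave` for the same ID."""
    # Keep one reference row per ID so the lookup index is unique
    ref = (
        df.loc[df["Wave"] == wave]
        .drop_duplicates(subset="ID")
        .set_index("ID")["hiqual"]
    )
    out = df.copy()  # leave the caller's frame unchanged
    out["hiqual"] = out["hiqual"].fillna(out["ID"].map(ref))
    return out

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "hiqual": [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, 1.0, np.nan, 5.0, np.nan],
    "Wave": ["g", "i", "k", "g", "i", "k", "g", "i", "g", "i"],
})

result = fill_from_wave(df)
print(result)
```

IDs that have no row for the chosen wave are simply left as NaN, since `map` finds no match for them.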

huangapple
  • Posted on February 9, 2023, 01:28:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75389569.html