Pandas DataFrame: interpolate inside with a constant value

Question

How can I turn this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'col1': [100, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
        'col2': [np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan]})
    df

which gives:

       col1  col2
    0   100   NaN
    1   NaN   100
    2   NaN   NaN
    3   100   NaN
    4   NaN   NaN
    5   NaN   100
    6   NaN   NaN

into:

       col1  col2
    0   100   NaN
    1     0   100
    2     0     0
    3   100     0
    4   NaN   NaN
    5   NaN   100
    6   NaN   NaN

So basically I want to fill NaNs with zero only in the inside area (between the first and last valid values of each column), with limit=2. Note that col2 has three consecutive NaNs in the middle and only two of them are replaced with zero.

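Answer 1

Score: 1

You can build masks to identify the non-NA values and the inner values (with the help of a double cummax):

    m = df.notna()                       # True where a value is present
    m2 = m.cummax() & m[::-1].cummax()   # True between the first and last valid value of each column
    out = df.fillna(df.mask(m, 0).ffill(limit=2).where(m2))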
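To see why the double cummax isolates the inner region, it can help to look at the intermediate masks for a single column. This is only a minimal sketch using the col2 data from the question; the names `seen_before`, `seen_after` and `inside` are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

col2 = pd.Series([np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan])

valid = col2.notna()

# True from the first valid value onward.
seen_before = valid.cummax()

# True up to and including the last valid value:
# run cummax over the reversed series, then restore the original order.
seen_after = valid[::-1].cummax()[::-1]

# A position is "inside" when there is a valid value at or before it
# and a valid value at or after it.
inside = seen_before & seen_after

print(inside.tolist())
```

In the answer the second reversal is not needed, because `&` aligns both operands on the index.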

Or with interpolate:

    m = df.notna()
    out = df.fillna(df.mask(m, 0).interpolate(limit=2, limit_area='inside'))
    # or, if the frame contains only numbers
    out = df.fillna(df.mul(0).interpolate(limit=2, limit_area='inside'))

Output:

        col1   col2
    0  100.0    NaN
    1    0.0  100.0
    2    0.0    0.0
    3  100.0    0.0
    4    NaN    NaN
    5    NaN  100.0
    6    NaN    NaN
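If this comes up more than once, the interpolate variant could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original answer: the function name `fill_inside` and its `value` and `limit` parameters are made up for illustration, and it assumes all columns are numeric.

```python
import numpy as np
import pandas as pd


def fill_inside(df, value=0, limit=2):
    """Fill up to `limit` consecutive NaNs after each valid value with `value`,
    but only between the first and last valid value of each column.
    Hypothetical helper; assumes numeric columns."""
    # Replace every valid entry with the constant, keep NaNs as NaN.
    template = df.mask(df.notna(), value)
    # Linear interpolation between equal constants yields that constant;
    # limit_area="inside" leaves leading and trailing NaNs alone.
    filler = template.interpolate(limit=limit, limit_area="inside")
    # Only the NaN cells of the original frame take values from `filler`.
    return df.fillna(filler)


df = pd.DataFrame({
    "col1": [100, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    "col2": [np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan],
})

out = fill_inside(df, value=0, limit=2)
print(out)
```

The original values are never overwritten, since `fillna` only touches cells that were NaN in `df`.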

Answer 2

Score: 0

We could do:

    out = df.ffill(limit=2).mask(df.bfill().isna())
    out = out.mask(out.ne(df) & out.notna(), 0)
    out

Output:

        col1   col2
    0  100.0    NaN
    1    0.0  100.0
    2    0.0    0.0
    3  100.0    0.0
    4    NaN    NaN
    5    NaN  100.0
    6    NaN    NaN
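The two lines above pack several steps together, so here is a rough step-by-step version of the same idea. It is a sketch rather than the answerer's exact code: the intermediate names are illustrative, and the last step uses `df.isna() & filled.notna()` in place of `out.ne(df) & out.notna()`, which selects the same cells for this data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [100, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    "col2": [np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan],
})

# Step 1: forward-fill at most two NaNs after each valid value.
# Leading NaNs stay NaN because nothing precedes them.
filled = df.ffill(limit=2)

# Step 2: blank out the trailing area, i.e. positions with no valid
# value at or after them (bfill leaves exactly those as NaN).
filled = filled.mask(df.bfill().isna())

# Step 3: cells that were NaN in df but now hold a carried-forward
# value are the ones to replace with the constant 0.
newly_filled = df.isna() & filled.notna()
out = filled.mask(newly_filled, 0)

print(out)
```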

huangapple
  • Published on June 29, 2023, 22:35:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76582088.html