Pandas DataFrame interpolate inside with constant value

Question

How can I turn this DataFrame:

[In1]:  import numpy as np
        import pandas as pd

        df = pd.DataFrame({
            'col1': [100, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
            'col2': [np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan]})
        df

[Out1]:       col1    col2
        0      100     NaN
        1      NaN     100
        2      NaN     NaN
        3      100     NaN
        4      NaN     NaN
        5      NaN     100
        6      NaN     NaN

into:

[Out2]:       col1    col2
        0      100     NaN
        1        0     100
        2        0       0
        3      100       0
        4      NaN     NaN
        5      NaN     100
        6      NaN     NaN

So basically I want to fill NaNs with zero only in the inside area (between valid values), with limit=2. Note that col2 has three consecutive NaNs in the middle, and only the first two of them are replaced with zero.

Answer 1

Score: 1

You can build masks that identify the non-NA values and the inner values (with the help of a double cummax):

m = df.notna()
m2 = m.cummax() & m[::-1].cummax()

out = df.fillna(df.mask(m, 0).ffill(limit=2).where(m2))
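To make the double cummax easier to follow, here is a minimal sketch that splits the mask into named steps. The names seen_before, seen_after and zeros are introduced only for illustration and are not part of the original answer; the data is the frame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [100, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    'col2': [np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan]})

m = df.notna()                  # True where a value is present
seen_before = m.cummax()        # True from the first valid value downward
seen_after = m[::-1].cummax()   # rows reversed: True from the last valid value upward
m2 = seen_before & seen_after   # & aligns on the index, so the reversed row order is harmless

# Turn valid values into 0, then let those zeros spread at most 2 rows forward.
zeros = df.mask(m, 0).ffill(limit=2)

# Keep only the zeros that fall between two valid values, then patch df with them.
out = df.fillna(zeros.where(m2))

print(m2.sort_index())
print(out)
```

The forward fill alone would also write zeros into the trailing NaNs of col1; the where(m2) step is what restricts the fill to the inside area.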

Or with interpolate:

m = df.notna()

out = df.fillna(df.mask(m, 0).interpolate(limit=2, limit_area='inside'))

# or if you only have numbers
out = df.fillna(df.mul(0).interpolate(limit=2, limit_area='inside'))

Output:

    col1   col2
0  100.0    NaN
1    0.0  100.0
2    0.0    0.0
3  100.0    0.0
4    NaN    NaN
5    NaN  100.0
6    NaN    NaN
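
If the fill value should be something other than zero, the interpolate variant can be wrapped in a small helper. This is only a sketch: fill_inside and its value and limit parameters are hypothetical names, not a pandas API, and it assumes numeric columns. Because every valid entry is first replaced by the same constant, linear interpolation between two of them can only produce that constant.

```python
import numpy as np
import pandas as pd

def fill_inside(df, value=0, limit=2):
    """Fill at most `limit` NaNs per inner gap with `value` (hypothetical helper)."""
    m = df.notna()
    # Replace valid entries with the constant, then interpolate only inside gaps.
    filler = df.mask(m, value).interpolate(limit=limit, limit_area='inside')
    # Patch the original frame: existing values are kept, NaNs take the filler.
    return df.fillna(filler)

df = pd.DataFrame({
    'col1': [100, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    'col2': [np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan]})

out = fill_inside(df)                  # zeros, as in the question
out_neg = fill_inside(df, value=-1)    # same pattern with a different constant
print(out)
print(out_neg)
```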

Answer 2

Score: 0

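We could forward-fill with a limit, blank out the trailing area again, and then set every newly filled cell to 0:

out = df.ffill(limit=2).mask(df.bfill().isna())
out = out.mask(out.ne(df) & out.notna(), 0)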
Out[83]: 
    col1   col2
0  100.0    NaN
1    0.0  100.0
2    0.0    0.0
3  100.0    0.0
4    NaN    NaN
5    NaN  100.0
6    NaN    NaN
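
The two lines of this answer can be unpacked into intermediate steps, which may make it clearer why only inner cells end up as zero. The sketch below uses illustrative variable names (filled, trailing, step1, changed) that do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [100, np.nan, np.nan, 100, np.nan, np.nan, np.nan],
    'col2': [np.nan, 100, np.nan, np.nan, np.nan, 100, np.nan]})

filled = df.ffill(limit=2)       # carry the previous value forward, at most 2 rows
trailing = df.bfill().isna()     # True where no valid value follows (trailing NaNs)
step1 = filled.mask(trailing)    # undo any forward fill in the trailing area

# Cells that differ from df but are not NaN are exactly the cells that were just filled.
changed = step1.ne(df) & step1.notna()
out = step1.mask(changed, 0)     # overwrite the carried-forward value with 0

print(out)
```

Leading NaNs need no special handling here, because a forward fill has nothing to carry into them.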

huangapple
  • Posted on June 29, 2023, 22:35:38
  • Please keep this link when reposting: https://go.coder-hub.com/76582088.html