Fill NaN values row-wise based on the change in another column

Question

I have a DataFrame in which one column contains several NaN values. It looks like this:

            col_1    col_2
2022-10-31  99.094  102.498
2022-11-30  99.001  101.880
2022-12-31     NaN  108.498
2023-01-31     NaN  100.500

I want to fill those NaN values using the simple calculation below:

desired_val = (current value in col_2 - previous value in col_2) + previous value in col_1

That is,

df.loc['2022-12-31', 'col_1'] should be (108.498 - 101.880) + 99.001 = 105.619

and df.loc['2023-01-31', 'col_1'] should be (100.500 - 108.498) + 105.619 = 97.621

I found a solution that works row by row, but it is slow when the dataset is large.

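import pandas as pd

# Walk consecutive index labels so each row can see the one before it.
for prev, row in zip(df.index[:-1], df.index[1:]):
    # NaN never equals itself, so use pd.isna rather than == np.nan.
    if pd.isna(df.loc[row, 'col_1']):
        # Previous col_1 (possibly filled on the last pass) plus the change in col_2.
        df.loc[row, 'col_1'] = (
            df.loc[row, 'col_2'] - df.loc[prev, 'col_2']
        ) + df.loc[prev, 'col_1']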

Is there a column-wise (vectorized) pandas solution for this?

Answer 1

Score: 1

You can use cumsum to accumulate the differences in col_2 over the rows where col_1 is missing, then add them to the last available col_1 value:

df["col_1"] = df["col_1"].fillna(df["col_2"].diff().where(df["col_1"].isna()).cumsum()+df["col_1"].ffill())

>>> df
              col_1    col_2
2022-10-31   99.094  102.498
2022-11-30   99.001  101.880
2022-12-31  105.619  108.498
2023-01-31   97.621  100.500
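
One caveat: the cumsum in the one-liner runs over the whole column, so if col_1 has several separate runs of NaN, the offset built up in an earlier run carries into the later ones. The sketch below is one possible variant, not part of the original answer, that restarts the accumulation at every observed col_1 value. The helper name fill_from_changes and its parameters target and driver are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_from_changes(df, target="col_1", driver="col_2"):
    """Fill NaNs in `target` by carrying the last known value forward
    and adding the change in `driver` since that row.
    (Hypothetical helper; the name is not from the original answer.)"""
    missing = df[target].isna()
    # Rows share a run id with the most recent non-missing target value.
    run_id = (~missing).cumsum()
    # Row-to-row changes in driver, zeroed on rows that need no filling.
    steps = df[driver].diff().where(missing, 0.0)
    # Accumulate per run so each gap restarts from zero.
    offset = steps.groupby(run_id).cumsum()
    return df[target].fillna(df[target].ffill() + offset)


# Data from the question.
idx = pd.to_datetime(["2022-10-31", "2022-11-30", "2022-12-31", "2023-01-31"])
df = pd.DataFrame(
    {"col_1": [99.094, 99.001, np.nan, np.nan],
     "col_2": [102.498, 101.880, 108.498, 100.500]},
    index=idx,
)
filled = fill_from_changes(df)

# Two separate gaps: the second gap should restart from 5.0 rather than
# inherit the offset accumulated in the first gap.
gaps = pd.DataFrame({"col_1": [1.0, np.nan, 5.0, np.nan],
                     "col_2": [10.0, 12.0, 13.0, 17.0]})
filled_gaps = fill_from_changes(gaps)
```

On the question's data this gives the same col_1 values as the one-liner. On the two-gap frame the last row becomes 5.0 + (17.0 - 13.0) = 9.0, whereas the one-liner would give 11.0 because it still carries the 2.0 accumulated in the first gap. A leading NaN with no earlier value to start from is left as NaN.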

huangapple
  • Posted on July 12, 2023 at 22:27:40
  • Please keep this link when reposting: https://go.coder-hub.com/76671660.html