Fill NaN values row-wise based on changes in another column

Question

I have a DataFrame in which one column, col_1, contains several NaN values. It looks like this:

```
             col_1    col_2
2022-10-31  99.094  102.498
2022-11-30  99.001  101.880
2022-12-31     NaN  108.498
2023-01-31     NaN  100.500
```

I want to fill those NaN values with the following calculation, where "previous" means the row immediately above:

```
desired_val = (current value in col_2 - previous value in col_2) + previous value in col_1
```

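Because the previous col_1 value may itself be one that was just filled, the calculation chains from row to row: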

df.loc['2022-12-31', 'col_1'] should be (108.498 - 101.880) + 99.001 = 105.619

and df.loc['2023-01-31', 'col_1'] should be (100.500 - 108.498) + 105.619 = 97.621
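Writing $c_t$ for col_1 and $v_t$ for col_2 (notation introduced here for illustration), the rule is a recurrence. Unrolling it over a run of missing rows $t = k+1, \dots, m$, where $k$ is the last row with a known col_1, makes the intermediate col_2 terms cancel, so each filled value depends only on row $k$:

```latex
\begin{aligned}
c_t &= (v_t - v_{t-1}) + c_{t-1}, \qquad t = k+1, \dots, m \\
    &= (v_t - v_{t-1}) + (v_{t-1} - v_{t-2}) + \dots + (v_{k+1} - v_k) + c_k \\
    &= c_k + (v_t - v_k)
\end{aligned}
```

With the question's numbers ($k$ = 2022-11-30): 99.001 + (100.500 - 101.880) = 97.621, the same value as the chained calculation for 2023-01-31.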

I found a solution using a row-by-row loop, but it is slow when the dataset is large.

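```python
import pandas as pd

# Iterate over the row labels (df.index), not df.columns.
labels = df.index
for prev, row in zip(labels[:-1], labels[1:]):
    # NaN never compares equal to anything (and np.Nan does not exist), so test with pd.isna.
    if pd.isna(df.loc[row, 'col_1']):
        # df.loc[prev, 'col_1'] may already have been filled on the previous iteration.
        df.loc[row, 'col_1'] = (
            df.loc[row, 'col_2'] - df.loc[prev, 'col_2']
        ) + df.loc[prev, 'col_1']
```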
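A minimal self-contained sketch of the same sequential logic, assuming the question's data with the dates as a DatetimeIndex; `fill_loop` is a hypothetical helper name. Looping over NumPy arrays instead of calling `.loc` on every row keeps the row-by-row dependency but avoids most of the per-row pandas overhead, which makes it a convenient reference to check vectorized answers against.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (dates as the index).
df = pd.DataFrame(
    {"col_1": [99.094, 99.001, np.nan, np.nan],
     "col_2": [102.498, 101.880, 108.498, 100.500]},
    index=pd.to_datetime(["2022-10-31", "2022-11-30", "2022-12-31", "2023-01-31"]),
)

def fill_loop(frame):
    """Row-by-row reference: col_1[t] = (col_2[t] - col_2[t-1]) + col_1[t-1]."""
    c1 = frame["col_1"].to_numpy(dtype=float, copy=True)
    c2 = frame["col_2"].to_numpy(dtype=float)
    for i in range(1, len(c1)):
        if np.isnan(c1[i]):
            # c1[i - 1] may itself have been filled on the previous iteration
            c1[i] = (c2[i] - c2[i - 1]) + c1[i - 1]
    return pd.Series(c1, index=frame.index, name="col_1")

df["col_1"] = fill_loop(df)
print(df.round(3))
```

A leading NaN (no earlier row) is left as NaN, since the loop starts at the second row.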

Is there a vectorized (column-wise) pandas solution for this?

Answer 1

Score: 1

You can use cumsum to accumulate the differences in col_2 over the missing rows and add them to the last available (forward-filled) value of col_1:

```python
# cumsum is not reset after a known col_1 value, so this assumes col_1 has a single run of NaNs.
df["col_1"] = df["col_1"].fillna(
    df["col_2"].diff().where(df["col_1"].isna()).cumsum() + df["col_1"].ffill()
)
```

```
>>> df
             col_1    col_2
2022-10-31  99.094  102.498
2022-11-30  99.001  101.880
2022-12-31 105.619  108.498
2023-01-31  97.621  100.500
```
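The one-liner works because the recurrence telescopes: each filled value equals the last known col_1 plus the change in col_2 since that row. One caveat worth checking: `cumsum()` runs over the whole column and does not restart after a known value, so if col_1 has more than one separate run of NaNs, the differences from an earlier gap are carried into later ones. The sketch below restarts at every known value using only `where` and `ffill`; `fill_vectorized` is a hypothetical helper name, and the two-gap data is made up to show the difference.

```python
import numpy as np
import pandas as pd

def fill_vectorized(frame):
    """Fill NaNs in col_1 via c_t = c_k + (v_t - v_k), k = last row with a known col_1."""
    known = frame["col_1"].notna()
    last_c1 = frame["col_1"].ffill()                # c_k carried forward
    last_c2 = frame["col_2"].where(known).ffill()   # v_k carried forward
    return frame["col_1"].fillna(last_c1 + (frame["col_2"] - last_c2))

# Two separate gaps: the cumsum version gives 25.0 for the last row,
# while the row-by-row rule gives 5 + (30 - 20) = 15.
df = pd.DataFrame({"col_1": [1.0, np.nan, 5.0, np.nan],
                   "col_2": [0.0, 10.0, 20.0, 30.0]})
print(fill_vectorized(df).tolist())  # [1.0, 11.0, 5.0, 15.0]
```

On the question's data this gives the same 105.619 and 97.621. Rows before the first known col_1 stay NaN, because there is nothing to forward-fill from.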

huangapple
  • Published on 2023-07-12 22:27:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76671660.html