Fillna row wise as per change in another column
Question
I have a DataFrame with a column that contains several NaN values. It looks like this:
col_1 col_2
2022-10-31 99.094 102.498
2022-11-30 99.001 101.880
2022-12-31 NaN 108.498
2023-01-31 NaN 100.500
I want to fill those NaNs using the following calculation:
desired_val = (current value in col_2 - previous value in col_2) + previous value in col_1
That means df.loc['2022-12-31', 'col_1'] should be (108.498 - 101.880) + 99.001 = 105.619,
and df.loc['2023-01-31', 'col_1'] should be (100.500 - 108.498) + 105.619 = 97.621.
The second value depends on the first filled value, so the NaNs have to be filled in order.
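Written as a recurrence (using c_1, c_2 for col_1, col_2, t for the current row and s for the last row before the gap where col_1 is known), the rule unrolls across a run of NaNs, and the col_2 differences telescope:

$$
\begin{aligned}
c_1[t] &= c_1[t-1] + \bigl(c_2[t] - c_2[t-1]\bigr) \\
       &= c_1[s] + \sum_{k=s+1}^{t} \bigl(c_2[k] - c_2[k-1]\bigr) = c_1[s] + c_2[t] - c_2[s]
\end{aligned}
$$

For example, with s = 2022-11-30 and t = 2023-01-31 this gives 99.001 + (100.500 - 101.880) = 97.621, the same value as the step-by-step calculation.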
I found a solution using a row-by-row loop, but it is slow when the dataset is large:
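import pandas as pd

# Fill NaNs in order, so each row can use the col_1 value filled just before it
for i in range(1, len(df)):
    row, prev = df.index[i], df.index[i - 1]
    if pd.isna(df.loc[row, 'col_1']):
        df.loc[row, 'col_1'] = (
            (df.loc[row, 'col_2'] - df.loc[prev, 'col_2'])
            + df.loc[prev, 'col_1']
        )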
Is there a column-wise (vectorized) pandas solution for this?
Answer 1
Score: 1
You can use cumsum to accumulate the differences in col_2 and use them to compute col_1 from the last available value:
df["col_1"] = df["col_1"].fillna(df["col_2"].diff().where(df["col_1"].isna()).cumsum()+df["col_1"].ffill())
>>> df
col_1 col_2
2022-10-31 99.094 102.498
2022-11-30 99.001 101.880
2022-12-31 105.619 108.498
2023-01-31 97.621 100.500
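One caveat: where(...).cumsum() keeps accumulating down the whole column, so it is exact for a single run of NaNs (as in this example), but with two separate gaps the differences from the first gap are also added into the second. Because the col_2 differences telescope, each filled value can instead be taken as the last known col_1 plus the change in col_2 since that same row. A minimal sketch of that alternative, assuming col_2 itself has no NaNs (fill_col_1 is a hypothetical helper name, not part of pandas):

```python
import numpy as np
import pandas as pd


def fill_col_1(df: pd.DataFrame) -> pd.Series:
    """Fill NaNs in col_1 as previous col_1 + change in col_2, vectorized.

    Assumes col_2 has no NaNs. Summing the col_2 differences across a gap
    telescopes to col_2[current] - col_2[last row where col_1 was known],
    so every gap is anchored to its own last known row.
    """
    known = df["col_1"].notna()
    last_col_1 = df["col_1"].ffill()               # last known col_1 value
    last_col_2 = df["col_2"].where(known).ffill()  # col_2 on that same row
    return df["col_1"].fillna(last_col_1 + df["col_2"] - last_col_2)


df = pd.DataFrame(
    {"col_1": [99.094, 99.001, np.nan, np.nan],
     "col_2": [102.498, 101.880, 108.498, 100.500]},
    index=pd.to_datetime(["2022-10-31", "2022-11-30", "2022-12-31", "2023-01-31"]),
)
df["col_1"] = fill_col_1(df)
print(df.round(3))
```

On the question's data this reproduces 105.619 and 97.621 (up to floating-point rounding). Rows before the first known col_1 stay NaN, because ffill has nothing to carry forward there.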