July 12, 2023 22:27:40

Fillna row wise as per change in another column

Question

I have a DataFrame in which one column contains several NaN values. It looks like this:

            col_1    col_2
2022-10-31  99.094  102.498
2022-11-30  99.001  101.880
2022-12-31     NaN  108.498
2023-01-31     NaN  100.500

I want to fill those NaN values using the simple calculation below:

desired_val = (current value in col_2 - previous value in col_2) + previous value in col_1

That is,

df.loc['2022-12-31', 'col_1'] should be (108.498 - 101.880) + 99.001 = 105.619

and df.loc['2023-01-31', 'col_1'] should be (100.500 - 108.498) + 105.619 = 97.621, so each filled value feeds into the next one.
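Unrolling the recurrence shows why a column-wise solution exists. Writing $x_t$ for col_1 and $y_t$ for col_2, and letting $k$ be the last row where col_1 is known, the col_2 terms telescope (the notation $x$, $y$, $k$ is introduced here only for illustration):

```latex
\begin{aligned}
x_t &= x_{t-1} + (y_t - y_{t-1}) \\
    &= x_{t-2} + (y_{t-1} - y_{t-2}) + (y_t - y_{t-1}) \\
    &\;\;\vdots \\
    &= x_k + \sum_{j=k+1}^{t} (y_j - y_{j-1}) = x_k + (y_t - y_k)
\end{aligned}
```

On the sample, $k$ is 2022-11-30, so the second fill is 99.001 + (100.500 - 101.880) = 97.621, the same value as the step-by-step calculation. The sum in the middle is exactly a cumulative sum of col_2 differences.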

I found a solution using a row-by-row loop, but it is slow when the dataset is large:

import pandas as pd

for row in df.index:  # iterate over the row labels
    if pd.isna(df.loc[row, 'col_1']):  # NaN != NaN, so an equality test never matches
        df.loc[row, 'col_1'] = (
            (df['col_2'] - df['col_2'].shift(1))
            + df['col_1'].shift(1)
        ).loc[row]  # the expression is a Series, so index it by row label only
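Part of the slowness comes from the loop body itself: every iteration rebuilds two shifted Series over the whole column, so the total work grows quadratically with the number of rows. As a baseline, here is a minimal sketch of a linear-time loop over NumPy arrays that carries the previous (possibly just filled) value forward. The helper name `fill_rowwise` is hypothetical, and the DataFrame is rebuilt from the sample above.

```python
import numpy as np
import pandas as pd


def fill_rowwise(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: fill NaN in col_1 with the recurrence in one O(n) pass."""
    x = df["col_1"].to_numpy(dtype=float, copy=True)  # copy so df is not mutated
    y = df["col_2"].to_numpy(dtype=float)
    for i in range(1, len(x)):
        if np.isnan(x[i]):
            # previous col_1 (possibly filled on the last step) plus the change in col_2
            x[i] = x[i - 1] + (y[i] - y[i - 1])
    return pd.Series(x, index=df.index, name="col_1")


df = pd.DataFrame(
    {"col_1": [99.094, 99.001, np.nan, np.nan],
     "col_2": [102.498, 101.880, 108.498, 100.500]},
    index=pd.to_datetime(["2022-10-31", "2022-11-30", "2022-12-31", "2023-01-31"]),
)
df["col_1"] = fill_rowwise(df)
print(df.round(3))
```

This still runs a Python-level loop, so it is mainly useful as a reference to check vectorized versions against.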

Is there a column-wise (vectorized) pandas solution for this?

Answer 1

Score: 1

You can use cumsum to accumulate the differences in col_2 over the NaN rows and add them to the last available value of col_1:

df["col_1"] = df["col_1"].fillna(df["col_2"].diff().where(df["col_1"].isna()).cumsum() + df["col_1"].ffill())
>>> df
              col_1    col_2
2022-10-31   99.094  102.498
2022-11-30   99.001  101.880
2022-12-31  105.619  108.498
2023-01-31   97.621  100.500
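On the sample, `diff().where(isna())` keeps only the col_2 changes on the NaN rows (6.618 and -7.998), `cumsum()` turns them into running offsets (6.618 and -1.380), and `ffill()` supplies the last known col_1 value (99.001). Because `cumsum()` runs over the whole column, offsets from one NaN gap carry into any later gap, so the one-liner is only exact when col_1 has a single run of NaNs. The sketch below shows the problem on small made-up data and a variant based on the identity "last known col_1 + (current col_2 - col_2 on that row)", which handles each gap independently. The helper name `fill_by_anchor` and the test data are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_by_anchor(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: NaN in col_1 -> last known col_1 + (col_2 now - col_2 on that row)."""
    known = df["col_1"].notna()
    last_x = df["col_1"].ffill()               # last known col_1 value
    last_y = df["col_2"].where(known).ffill()  # col_2 on the row where col_1 was last known
    return df["col_1"].fillna(last_x + df["col_2"] - last_y)


# Two separate NaN gaps in col_1 (made-up data).
df2 = pd.DataFrame({"col_1": [1.0, np.nan, 3.0, np.nan],
                    "col_2": [0.0, 1.0, 1.0, 2.0]})

# The answer's one-liner: the global cumsum leaks the first gap's offset into the second.
one_liner = df2["col_1"].fillna(
    df2["col_2"].diff().where(df2["col_1"].isna()).cumsum() + df2["col_1"].ffill()
)
print(one_liner.tolist())            # [1.0, 2.0, 3.0, 5.0]; the last value should be 3 + (2 - 1) = 4
print(fill_by_anchor(df2).tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Since the col_2 differences telescope, only the col_2 value on the last known row is needed, which removes the need to reset the cumulative sum at every gap. On the question's data both versions give 105.619 and 97.621.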
