June 8, 2023, 15:07:16

Pandas data manipulation, calculating a column value based on other rows of the same column

Question

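I wish to do a data manipulation as follows in a pandas dataframe:

import pandas as pd

a = {'idx': range(8),
     'col': [47, 33, 23, 33, 32, 31, 22, 5]}
df = pd.DataFrame(a)
print(df)

idx  col
  0   47
  1   33
  2   23
  3   33
  4   32
  5   31
  6   22
  7    5

My desired output is:

idx  col  desired
  0   47       14
  1   33       10
  2   23      -10
  3   33        1
  4   32        1
  5   31        9
  6   22       17
  7    5        5

The calculation is as follows: each `desired` value is `col` in that row minus `col` in the next row, and the last row, which has no next row, keeps its own `col` value.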
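To make the expected column easy to verify, here is a plain-Python sketch. It assumes the rule implied by the expected output, namely the current row minus the next row, with the last row left unchanged. The variable names `values` and `desired` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# Assumed rule, inferred from the expected output:
# desired[i] = col[i] - col[i + 1]; the last row keeps its own value.
values = df['col'].tolist()
desired = [
    values[i] - values[i + 1] if i + 1 < len(values) else values[i]
    for i in range(len(values))
]

df['desired'] = desired
print(df)
```

A loop like this is slow on large frames; it is only meant as a reference against which a vectorized solution can be checked.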


Answer 1

Score: 3

Same solution as @mozway with numpy (which is faster):

import numpy as np
df['desired'] = -np.diff(df['col'], append=0)

Output:

>>> df
   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5
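The one-liner packs two steps together, so it may help to see them separately. The sketch below uses a bare NumPy array in place of `df['col']`; the names `padded_diff` and `desired` are illustrative.

```python
import numpy as np

col = np.array([47, 33, 23, 33, 32, 31, 22, 5])

# np.diff computes col[i + 1] - col[i]. With append=0, a trailing 0 is added
# before differencing, so the last element becomes 0 - col[-1].
padded_diff = np.diff(col, append=0)
print(padded_diff)  # [-14 -10  10  -1  -1  -9 -17  -5]

# Negating flips each term to col[i] - col[i + 1], and the last term to col[-1].
desired = -padded_diff
print(desired)      # [ 14  10 -10   1   1   9  17   5]
```

Note that the `append` argument of `np.diff` requires NumPy 1.16 or later.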

For 10k records:

# @mozway
>>> %timeit df['col'].diff(-1).fillna(df['col'])
281 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# @GodIsOne
>>> %timeit df['col'] - df['col'].shift(-1, fill_value=0)
144 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @Corralien
>>> %timeit (-np.diff(df['col'], append=0))
32.7 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
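The `%timeit` lines above only run inside IPython or Jupyter. A rough way to repeat the comparison in a plain script is sketched below with the standard `timeit` module. The 10,000-row frame is an assumption, since the data used for the timings is not shown, and absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 random integers (the original data is not given).
rng = np.random.default_rng(0)
df = pd.DataFrame({'col': rng.integers(0, 100, size=10_000)})

candidates = {
    'diff + fillna': lambda: df['col'].diff(-1).fillna(df['col']),
    'shift': lambda: df['col'] - df['col'].shift(-1, fill_value=0),
    'np.diff': lambda: -np.diff(df['col'], append=0),
}

# Check that all three approaches produce the same values before timing them.
reference = candidates['np.diff']()
for name, func in candidates.items():
    assert np.array_equal(np.asarray(func()), reference), name

# Time each approach and report the mean time per call in microseconds.
runs = 1000
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=runs)
    print(f'{name}: {seconds / runs * 1e6:.1f} µs per call')
```

The equality check compares values only: the `diff + fillna` variant returns floats, while the other two return integers.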

Answer 2

Score: 1

IIUC, you need a reversed [`diff`](https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html) followed by [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html):

df['desired'] = df['col'].diff(-1).fillna(df['col'])

Output:

   idx  col  desired
0    0   47     14.0
1    1   33     10.0
2    2   23    -10.0
3    3   33      1.0
4    4   32      1.0
5    5   31      9.0
6    6   22     17.0
7    7    5      5.0
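The output above is floating point because `diff(-1)` leaves a `NaN` in the last row, which forces the column to `float64` before `fillna` runs. If integers are wanted, as in the desired output, one option is to cast back afterwards. This is a small sketch of that extra step, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# diff(-1) gives col[i] - col[i + 1]; the last row is NaN, so the dtype is float64.
result = df['col'].diff(-1).fillna(df['col'])
print(result.dtype)  # float64

# Every NaN has been filled, so casting back to an integer dtype is safe here.
df['desired'] = result.astype('int64')
print(df)
```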


# Answer 3
**Score**: 1
import pandas as pd

a = {'idx': range(8), 'col': [47, 33, 23, 33, 32, 31, 22, 5]}
df = pd.DataFrame(a)

# Subtract the next row's value from each row; the last row has no next row, so 0 is used.
df['col'] - df['col'].shift(-1, fill_value=0)

0    14
1    10
2   -10
3     1
4     1
5     9
6    17
7     5
Name: col, dtype: int64

# Assign the result to a new column.
df['desired'] = df['col'] - df['col'].shift(-1, fill_value=0)

   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5
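To see why this works, it can help to look at the shifted column on its own. In the sketch below, `shifted` is an illustrative name. `shift(-1)` moves every value up one row, and `fill_value=0` fills the vacated last row with 0 instead of `NaN`, which also keeps the dtype as integer.

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# Each row now holds the value of the row below it; the last row holds 0.
shifted = df['col'].shift(-1, fill_value=0)
print(shifted.tolist())  # [33, 23, 33, 32, 31, 22, 5, 0]

# Subtracting therefore gives col[i] - col[i + 1], and col[-1] - 0 for the last row.
df['desired'] = df['col'] - shifted
print(df['desired'].tolist())  # [14, 10, -10, 1, 1, 9, 17, 5]
```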
