Pandas data manipulation, calculating a column value based on other rows of the same column

Question

I wish to do a data manipulation as follows in a pandas dataframe:

import pandas as pd

a = {'idx': range(8),
     'col': [47, 33, 23, 33, 32, 31, 22, 5]}

df = pd.DataFrame(a)
print(df)

idx	col
0	47
1	33
2	23
3	33
4	32
5	31
6	22
7	5

My desired output is:

idx	col	desired
0	47	14
1	33	10
2	23	-10
3	33	1
4	32	1
5	31	9
6	22	17
7	5	5

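The calculation is as follows: each `desired` value is the row's `col` minus the `col` of the next row, and the last row keeps its own value (for example, 47 - 33 = 14 and 23 - 33 = -10).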
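The rule implied by the desired output can be sketched in plain Python before reaching for pandas. This is only an illustration of the arithmetic; pairing the last value with 0 is an assumption chosen because it reproduces the final row of the desired output.

```python
col = [47, 33, 23, 33, 32, 31, 22, 5]

# Pair every value with its successor; the last value is paired with 0
# (an assumption that matches the desired output, where the final row stays 5).
next_values = col[1:] + [0]
desired = [current - nxt for current, nxt in zip(col, next_values)]

print(desired)  # [14, 10, -10, 1, 1, 9, 17, 5]
```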

Answer 1

Score: 3

The same approach as @mozway's answer, but using NumPy, which is faster:

import numpy as np

df['desired'] = -np.diff(df['col'], append=0)

Output:

>>> df
   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5

For 10k records:

# @mozway
>>> %timeit df['col'].diff(-1).fillna(df['col'])
281 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# @GodIsOne
>>> %timeit df['col'] - df['col'].shift(-1, fill_value=0)
144 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# @Corralien
>>> %timeit (-np.diff(df['col'], append=0))
32.7 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
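The timings above come from IPython's `%timeit`. A rough way to reproduce the comparison in a plain script is sketched below, first checking that the three expressions agree. The random test data, the seed, and the `candidates` dictionary are assumptions made for illustration (the original 10k-row data set is not shown), and absolute numbers will differ between machines.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 random integers (the original data is not shown).
rng = np.random.default_rng(0)
df = pd.DataFrame({'col': rng.integers(0, 100, size=10_000)})

candidates = {
    'diff + fillna': lambda: df['col'].diff(-1).fillna(df['col']),
    'shift': lambda: df['col'] - df['col'].shift(-1, fill_value=0),
    'np.diff': lambda: -np.diff(df['col'], append=0),
}

# All three approaches should agree before their speed is compared.
reference = np.asarray(candidates['shift']())
for name, func in candidates.items():
    assert np.array_equal(np.asarray(func()), reference), name

# Average time per call over a fixed number of repetitions.
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=200) / 200
    print(f'{name}: {seconds * 1e6:.1f} µs per call')
```

Note that the `diff` + `fillna` variant returns floats while the other two return integers; `np.array_equal` compares values only, so the check still passes.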

Answer 2

Score: 1

IIUC, you need a reversed [`diff`](https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html) and [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html):

df['desired'] = df['col'].diff(-1).fillna(df['col'])

Output:

   idx  col  desired
0    0   47     14.0
1    1   33     10.0
2    2   23    -10.0
3    3   33      1.0
4    4   32      1.0
5    5   31      9.0
6    6   22     17.0
7    7    5      5.0


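The `.0` suffix in this answer's output appears because `diff(-1)` leaves a `NaN` in the last row, which makes the result `float64`. If integer output is wanted, one possible follow-up (not part of the original answer) is to cast back to the dtype of `col` after filling the gap, as in this minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8), 'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# diff(-1) leaves NaN in the last row, which forces a float64 result.
as_float = df['col'].diff(-1).fillna(df['col'])

# Casting back is safe here because every value is a whole number after fillna.
df['desired'] = as_float.astype(df['col'].dtype)

print(df['desired'].tolist())  # [14, 10, -10, 1, 1, 9, 17, 5]
```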



Answer 3

Score: 1

import pandas as pd

a = {'idx': range(8), 'col': [47, 33, 23, 33, 32, 31, 22, 5]}

df = pd.DataFrame(a)

# Subtract the next row's value; fill_value=0 leaves the last row unchanged
df['col'] - df['col'].shift(-1, fill_value=0)

0    14
1    10
2   -10
3     1
4     1
5     9
6    17
7     5
Name: col, dtype: int64

Assigning the result to a new column:

df['desired'] = df['col'] - df['col'].shift(-1, fill_value=0)

   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5
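To see why the `shift` approach keeps integer values, it can help to materialize the shifted column next to the original. In the sketch below, `next_col` is a hypothetical helper column added only for illustration; because `fill_value=0` fills the vacated last row with an integer, no `NaN` appears and the dtype stays `int64`.

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8), 'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# 'next_col' is a hypothetical helper column: each row holds the value
# of the following row, and the last row is filled with 0.
df['next_col'] = df['col'].shift(-1, fill_value=0)
df['desired'] = df['col'] - df['next_col']

print(df)
```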




Posted by huangapple on June 8, 2023 at 15:07:16
Link to this post: https://go.coder-hub.com/76429378.html