Pandas data manipulation: calculating a column value based on other rows of the same column
Question
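I want to perform the following data manipulation on a pandas DataFrame:
import pandas as pd

a = {'idx': range(8),
     'col': [47, 33, 23, 33, 32, 31, 22, 5]}
df = pd.DataFrame(a)
print(df)
idx    col
0      47
1      33
2      23
3      33
4      32
5      31
6      22
7       5
My desired output is:
idx    col    desired
0      47      14
1      33      10
2      23     -10
3      33       1
4      32       1
5      31       9
6      22      17
7       5       5
The calculation is as follows: each `desired` value is `col` minus the `col` value in the next row (for example, 47 - 33 = 14 and 23 - 33 = -10). The last row has no next row, so it keeps its own value (5).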
Answer 1
Score: 3
Same approach as @mozway's answer, but using NumPy (which is faster):
import numpy as np
df['desired'] = -np.diff(df['col'], append=0)
Output:
>>> df
   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5
Timings for 10k records:
# @mozway
>>> %timeit df['col'].diff(-1).fillna(df['col'])
281 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# @GodIsOne
>>> %timeit df['col'] - df['col'].shift(-1, fill_value=0)
144 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @Corralien
>>> %timeit (-np.diff(df['col'], append=0))
32.7 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
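To sanity-check that the three approaches timed above agree, a minimal sketch along these lines could be used. The 10,000-row frame of random integers and the names `rng`, `big`, and `via_*` are assumptions made for illustration; they are not the data behind the timings above.

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 random integers (not the original benchmark data)
rng = np.random.default_rng(0)
big = pd.DataFrame({'col': rng.integers(0, 100, size=10_000)})

# pandas diff with a negative period; the trailing NaN is filled with the last value
via_diff = big['col'].diff(-1).fillna(big['col'])

# Shift the column up by one row and subtract; the vacated last row is filled with 0
via_shift = big['col'] - big['col'].shift(-1, fill_value=0)

# np.diff computes next - current, so negate it; append=0 handles the last row
via_numpy = -np.diff(big['col'], append=0)

# All three should hold the same values (via_diff is float, the others are integer)
assert np.array_equal(via_diff.to_numpy(), via_numpy)
assert np.array_equal(via_shift.to_numpy(), via_numpy)
```

Note that the NumPy version returns a plain array rather than a Series, which is fine for assigning to a column of the same frame but drops the index.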
Answer 2
Score: 1
IIUC, you need a reversed [`diff`](https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html) and [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html):
df['desired'] = df['col'].diff(-1).fillna(df['col'])
Output:
   idx  col  desired
0    0   47     14.0
1    1   33     10.0
2    2   23    -10.0
3    3   33      1.0
4    4   32      1.0
5    5   31      9.0
6    6   22     17.0
7    7    5      5.0
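Note that the `desired` column above comes out as floats (`14.0`) because `diff` introduces a `NaN` in the last row before `fillna` runs. If integers are wanted, one possible follow-up is to cast back with `astype`. This is sketched here under the assumption that `col` contains no missing values; the intermediate name `step` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# diff(-1) yields NaN in the last row, which upcasts the result to float64
step = df['col'].diff(-1)

# Fill the last row with its own value, then cast back to the original integer dtype
df['desired'] = step.fillna(df['col']).astype(df['col'].dtype)
print(df.dtypes)
```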
# Answer 3
**Score**: 1
import pandas as pd

a = {'idx': range(8), 'col': [47, 33, 23, 33, 32, 31, 22, 5]}
df = pd.DataFrame(a)
# Subtract the next row's value; the last row has no successor, so 0 is used instead
df['col'] - df['col'].shift(-1, fill_value=0)
0    14
1    10
2   -10
3     1
4     1
5     9
6    17
7     5
Name: col, dtype: int64
**************
# Store the result in a new column
df['desired'] = df['col'] - df['col'].shift(-1, fill_value=0)
   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5
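To see why the subtraction works, it may help to look at the shifted column on its own. The sketch below places `col` beside its shifted copy; the frame name `steps` and the column name `next` are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# shift(-1) moves every value up one row; fill_value=0 fills the vacated last row
# and keeps the integer dtype (a plain shift would insert NaN and give floats)
steps = pd.DataFrame({'col': df['col'],
                      'next': df['col'].shift(-1, fill_value=0)})
steps['desired'] = steps['col'] - steps['next']
print(steps)
```

Because the last row is paired with 0, it ends up keeping its own value, which matches the desired output in the question.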