Pandas data manipulation, calculating a column value based on other rows of the same column
Question
I wish to do a data manipulation as follows in a pandas DataFrame:
import pandas as pd

a = {'idx': range(8),
     'col': [47, 33, 23, 33, 32, 31, 22, 5]}
df = pd.DataFrame(a)
print(df)
idx  col
  0   47
  1   33
  2   23
  3   33
  4   32
  5   31
  6   22
  7    5
My desired output is:
idx  col  desired
  0   47       14
  1   33       10
  2   23      -10
  3   33        1
  4   32        1
  5   31        9
  6   22       17
  7    5        5
The calculation is as follows: each value of desired is the current row's col minus the next row's col (for example, 47 - 33 = 14). The last row has no next row, so it keeps its own value (5).
Answer 1
Score: 3
Same solution as @mozway's, but with NumPy (which is faster):
import numpy as np
df['desired'] = -np.diff(df['col'], append=0)
Output:
>>> df
idx col desired
0 0 47 14
1 1 33 10
2 2 23 -10
3 3 33 1
4 4 32 1
5 5 31 9
6 6 22 17
7 7 5 5
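To see why this works, here is a minimal sketch (the names `col`, `padded`, and `forward` are only for illustration). `np.diff` computes `a[i+1] - a[i]`, and `append=0` pads a trailing zero before differencing, so the last element becomes `0 - 5`. Negating the result gives `a[i] - a[i+1]`, with the last row keeping its own value.

```python
import numpy as np

col = np.array([47, 33, 23, 33, 32, 31, 22, 5])

# append=0 behaves like differencing the array with a trailing 0 added
padded = np.append(col, 0)
forward = padded[1:] - padded[:-1]   # a[i+1] - a[i], the same values np.diff returns

desired = -np.diff(col, append=0)    # a[i] - a[i+1]; the last row is 5 - 0

print(desired.tolist())  # [14, 10, -10, 1, 1, 9, 17, 5]
```

Because no missing value is ever introduced, the result stays an integer array.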
For 10k records:
# @mozway
>>> %timeit df['col'].diff(-1).fillna(df['col'])
281 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# @GodIsOne
>>> %timeit df['col'] - df['col'].shift(-1, fill_value=0)
144 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @Corralien
>>> %timeit (-np.diff(df['col'], append=0))
32.7 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
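The timings above were taken with IPython's `%timeit`. A rough way to reproduce the comparison in a plain script is sketched below. The 10,000 random integers are an assumption, since the data used for the benchmark is not shown, and absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 10,000 random integers (the original benchmark data is not shown)
rng = np.random.default_rng(0)
df = pd.DataFrame({'col': rng.integers(0, 100, size=10_000)})

candidates = {
    'diff + fillna': lambda: df['col'].diff(-1).fillna(df['col']),
    'shift': lambda: df['col'] - df['col'].shift(-1, fill_value=0),
    'np.diff': lambda: -np.diff(df['col'], append=0),
}

# Compute each result once so the three approaches can be compared for equality
results = {name: np.asarray(func()) for name, func in candidates.items()}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=200) / 200
    print(f'{name:>14}: {seconds * 1e6:8.1f} µs per call')
```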
Answer 2
Score: 1
IIUC, you need a reversed [`diff`](https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html) followed by [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html):
df['desired'] = df['col'].diff(-1).fillna(df['col'])
Output:
   idx  col  desired
0    0   47     14.0
1    1   33     10.0
2    2   23    -10.0
3    3   33      1.0
4    4   32      1.0
5    5   31      9.0
6    6   22     17.0
7    7    5      5.0
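The `.0` suffix in this output appears because `diff(-1)` leaves a NaN in the last row, which forces the column to float64 before `fillna` replaces it. If an integer column is wanted, one option is to cast back afterwards. A minimal sketch, where the intermediate names `step` and `filled` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

step = df['col'].diff(-1)         # last row is NaN, so the dtype becomes float64
filled = step.fillna(df['col'])   # NaN replaced by the row's own value (5.0)

# Safe here because every value is a whole number once the NaN is filled
df['desired'] = filled.astype(df['col'].dtype)

print(df)
```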
Answer 3
Score: 1
import pandas as pd

a = {'idx': range(8),
     'col': [47, 33, 23, 33, 32, 31, 22, 5]}
df = pd.DataFrame(a)
# Subtract the next row's value from each row; the last row has no next row, so 0 is subtracted
df['col'] - df['col'].shift(-1, fill_value=0)
0    14
1    10
2   -10
3     1
4     1
5     9
6    17
7     5
Name: col, dtype: int64
To store the result as a new column:
df['desired'] = df['col'] - df['col'].shift(-1, fill_value=0)
   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5
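To make the intermediate step visible, the shifted values can be written to a helper column first. This is only a sketch, and the column name `next` is hypothetical. `shift(-1)` moves every value up one row, and `fill_value=0` fills the vacated last row with 0 instead of NaN, so the dtype stays integer.

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# Hypothetical helper column, kept only to show what shift(-1) produces
df['next'] = df['col'].shift(-1, fill_value=0)
df['desired'] = df['col'] - df['next']

print(df)
```

The helper column can be removed afterwards with `df.drop(columns='next')`.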