Pandas数据操作,根据同一列的其他行计算列值

huangapple go评论129阅读模式
英文:

Pandas data manipulation, calculating a column value based on other rows of the same column

问题

  1. 我希望在pandas数据框中进行如下数据操作:
  2. a = {'idx': range(8),
  3. 'col': [47, 33, 23, 33, 32, 31, 22, 5]}
  4. df = pd.DataFrame(a)
  5. print(df)
  6. idx col
  7. 0 47
  8. 1 33
  9. 2 23
  10. 3 33
  11. 4 32
  12. 5 31
  13. 6 22
  14. 7 5
  15. dtype: int64
  16. 我的期望输出是:
  17. idx col desired
  18. 0 47 14
  19. 1 33 10
  20. 2 23 -10
  21. 3 33 1
  22. 4 32 1
  23. 5 31 9
  24. 6 22 17
  25. 7 5 5
  26. 计算如下所示。
英文:

I wish to do a data manipulation as follows in a pandas dataframe:

  1. a = {'idx': range(8),
  2. 'col': [47,33,23,33,32,31,22,5],
  3. }
  4. df = pd.DataFrame(a)
  5. print(df)
  6. idx col
  7. 0 47
  8. 1 33
  9. 2 23
  10. 3 33
  11. 4 32
  12. 5 31
  13. 6 22
  14. 7 5

My desired output is:

  1. idx col desired
  2. 0 47 14
  3. 1 33 10
  4. 2 23 -10
  5. 3 33 1
  6. 4 32 1
  7. 5 31 9
  8. 6 22 17
  9. 7 5 5

The calculation is as follows.

Pandas数据操作,根据同一列的其他行计算列值

答案1

得分: 3

  1. import numpy as np
  2. df['desired'] = -np.diff(df['col'], append=0)
英文:

Same solution as @mozway with numpy (which is faster):

  1. import numpy as np
  2. df['desired'] = -np.diff(df['col'], append=0)

Output:

  1. >>> df
  2. idx col desired
  3. 0 0 47 14
  4. 1 1 33 10
  5. 2 2 23 -10
  6. 3 3 33 1
  7. 4 4 32 1
  8. 5 5 31 9
  9. 6 6 22 17
  10. 7 7 5 5

For 10k records:

  1. # @mozway
  2. >>> %timeit df['col'].diff(-1).fillna(df['col'])
  3. 281 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  4. # @GodIsOne
  5. >>> %timeit df['col'] - df['col'].shift(-1, fill_value=0)
  6. 144 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  7. # @Corralien
  8. >>> %timeit (-np.diff(df['col'], append=0))
  9. 32.7 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

答案2

得分: 1

  1. IIUC,您需要一个反转的 [`diff`](https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html) 和 [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html):

df['desired'] = df['col'].diff(-1).fillna(df['col'])

  1. 输出:

idx col desired
0 0 47 14.0
1 1 33 10.0
2 2 23 -10.0
3 3 33 1.0
4 4 32 1.0
5 5 31 9.0
6 6 22 17.0
7 7 5 5.0

  1. <details>
  2. <summary>英文:</summary>
  3. IIUC, you need a reversed [`diff`](https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html), and [`fillna`](https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html):

df['desired'] = df['col'].diff(-1).fillna(df['col'])

  1. Output:

idx col desired
0 0 47 14.0
1 1 33 10.0
2 2 23 -10.0
3 3 33 1.0
4 4 32 1.0
5 5 31 9.0
6 6 22 17.0
7 7 5 5.0

  1. </details>
  2. # 答案3
  3. **得分**: 1
  4. a = {'idx': range(8), 'col': [47, 33, 23, 33, 32, 31, 22, 5]}
  5. df = pd.DataFrame(a)
  6. df['col'] - df['col'].shift(-1, fill_value=0)
  7. 0 14
  8. 1 10
  9. 2 -10
  10. 3 1
  11. 4 1
  12. 5 9
  13. 6 17
  14. 7 5
  15. Name: col, dtype: int64
  16. **************
  17. df['desired'] = df['col'] - df['col'].shift(-1, fill_value=0)
  18. idx col desired
  19. 0 0 47 14
  20. 1 1 33 10
  21. 2 2 23 -10
  22. 3 3 33 1
  23. 4 4 32 1
  24. 5 5 31 9
  25. 6 6 22 17
  26. 7 7 5 5
  27. <details>
  28. <summary>英文:</summary>
  29. a = {&#39;idx&#39;: range(8),
  30. &#39;col&#39;: [47,33,23,33,32,31,22,5],
  31. }
  32. df = pd.DataFrame(a)
  33. df[&#39;col&#39;] - df[&#39;col&#39;].shift(-1, fill_value=0)
  34. 0 14
  35. 1 10
  36. 2 -10
  37. 3 1
  38. 4 1
  39. 5 9
  40. 6 17
  41. 7 5
  42. Name: col, dtype: int64
  43. **************
  44. df[&#39;desired&#39;] = df[&#39;col&#39;] - df[&#39;col&#39;].shift(-1, fill_value=0)
  45. idx col desired
  46. 0 0 47 14
  47. 1 1 33 10
  48. 2 2 23 -10
  49. 3 3 33 1
  50. 4 4 32 1
  51. 5 5 31 9
  52. 6 6 22 17
  53. 7 7 5 5
  54. </details>

huangapple
  • 本文由 发表于 2023年6月8日 15:07:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/76429378.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定