Apply function in pandas hierarchical index

Question


I have a pandas dataframe as below.

    import pandas as pd

    df = pd.DataFrame({'team': ['A', 'B', 'A', 'B', 'A', 'B'],
                       'tiger': [87, 159, 351, 140, 72, 119],
                       'lion': [1843, 3721, 6905, 1667, 2865, 1599],
                       'bear': [1.9, 3.3, 6.3, 2.3, 1.2, 4.1],
                       'points': [425, 425, 441, 441, 1048, 1048]})

    # Median of each animal column for every (points, team) pair
    grouped = df.groupby(['points', 'team'])[['tiger', 'lion', 'bear']].median()
    print(grouped)

                     tiger        lion     bear
    points team
    425    A      87.00000  1843.00000  1.90000
           B     159.00000  3721.00000  3.30000
    441    A     351.00000  6905.00000  6.30000
           B     140.00000  1667.00000  2.30000
    1048   A      72.00000  2865.00000  1.20000
           B     119.00000  1599.00000  4.10000

I would like to take the difference between teams A and B for each animal (tiger, lion, bear) at each points level. For example, for points 425 and tiger, that is the difference between team A (87) and team B (159). I'm not sure how to do this with a hierarchical index. The result would look something like the table below. Thanks.

       points  tiger  lion     bear
    0     425     72  1878  1.40000
    1     441   -211 -5238 -4.00000
    2    1048     47 -1266  2.90000
Answer 1

Score: 1

You can use [`swaplevel`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.swaplevel.html) and slice:

    # Move 'team' to the outer index level so each team can be selected with .loc
    grouped = (df.groupby(['points', 'team'])[['tiger', 'lion', 'bear']].median()
                 .swaplevel()
              )
    grouped.loc['A'] - grouped.loc['B']

Or use `xs`:

    grouped = df.groupby(['points', 'team'])[['tiger', 'lion', 'bear']].median()
    grouped.xs('A', level='team') - grouped.xs('B', level='team')

Output:

            tiger    lion  bear
    points
    425     -72.0 -1878.0  -1.4
    441     211.0  5238.0   4.0
    1048    -47.0  1266.0  -2.9

Both expressions compute A - B. The expected output in the question has the opposite signs (B - A), so swap the operands if you need those values.
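For reference, here is a minimal, self-contained sketch of the `xs` variant, using the `df` from the question. Two things in it are assumptions on my part rather than part of the answer: the operands are swapped to B - A so the signs match the question's expected output, and the final `reset_index()` is an optional step that turns `points` back into an ordinary column. The names `team_a`, `team_b`, and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'tiger': [87, 159, 351, 140, 72, 119],
                   'lion': [1843, 3721, 6905, 1667, 2865, 1599],
                   'bear': [1.9, 3.3, 6.3, 2.3, 1.2, 4.1],
                   'points': [425, 425, 441, 441, 1048, 1048]})

grouped = df.groupby(['points', 'team'])[['tiger', 'lion', 'bear']].median()

# Cross-section per team: each result is indexed by 'points' only
team_a = grouped.xs('A', level='team')
team_b = grouped.xs('B', level='team')

# B - A (assumed direction, chosen to match the question's expected signs);
# the subtraction aligns rows on the shared 'points' index
diff_b_minus_a = team_b - team_a

# Optional: flatten the index so 'points' becomes a regular column
result = diff_b_minus_a.reset_index()
print(result)
```

Because both cross-sections share the same `points` index, the subtraction does not depend on row order. If a team were missing for some `points` value, that row would come out as NaN rather than raising an error.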

Answer 2

Score: 0

    # Within each points group, diff() leaves B - A in the last row; keep only that row
    grouped.groupby(level=0).apply(lambda dd: dd.diff().tail(1)).droplevel([1, 2])

Output:

            tiger    lion  bear
    points
    425      72.0  1878.0   1.4
    441    -211.0 -5238.0  -4.0
    1048     47.0 -1266.0   2.9
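The idea behind this answer is that each `points` group holds two rows, sorted A then B, so a row-to-row difference leaves NaN in the A row and B - A in the B row. The sketch below shows the same idea with the grouped `diff()` method instead of `apply`. This is an alternative formulation, not the answer's own code, and `per_group` and `result` are illustrative names. It assumes exactly two teams per group and no missing values in the data, since `dropna()` would otherwise discard real rows as well.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'tiger': [87, 159, 351, 140, 72, 119],
                   'lion': [1843, 3721, 6905, 1667, 2865, 1599],
                   'bear': [1.9, 3.3, 6.3, 2.3, 1.2, 4.1],
                   'points': [425, 425, 441, 441, 1048, 1048]})

grouped = df.groupby(['points', 'team'])[['tiger', 'lion', 'bear']].median()

# Row-to-row difference inside each 'points' group.
# Rows are sorted A then B, so the A row becomes NaN and the B row holds B - A.
per_group = grouped.groupby(level='points').diff()

# Keep only the B rows (the non-NaN ones) and drop the now-redundant 'team' level
result = per_group.dropna().droplevel('team')
print(result)
```

Unlike the `xs` subtraction, this approach depends on the row order within each group, so it is worth sorting the index first if `grouped` was built some other way.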

huangapple
  • Posted on 2023-02-07 04:02:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75366021.html