How to check in which bin an entry of a pandas data frame group belongs?

Question

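As I understand it, you want a quantile_bin column holding integers from 1 to 10 that indicate which quantile bin each value in wins falls into for a given Player: 1 means between the 0 and 0.1 quantiles, 2 means between the 0.1 and 0.2 quantiles, and so on up to 10, which means between the 0.9 and 1.0 quantiles.

One way to define magic_quantile_function to do this: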

    import numpy as np
    import pandas as pd

    def magic_quantile_function(x):
        # Bin edges at the 0, 0.1, ..., 1.0 quantiles (11 edges, up to 10 bins)
        quantiles = np.linspace(0, 1, 11)
        # labels=False returns 0-based bin codes; add 1 so bins start at 1
        return pd.qcut(x, quantiles, labels=False, duplicates='drop') + 1

    example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                          'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                          'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
               'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                        11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}
    df = pd.DataFrame(example)
    # transform applies the function per Player and keeps the original row order
    df['quantile_bin'] = df.groupby("Player")["wins"].transform(magic_quantile_function)
    print(df)

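This code first defines magic_quantile_function, which uses pd.qcut to work out which quantile bin each wins value falls into within one Player group. df.groupby("Player")["wins"].transform then applies the function to every group, and the result is stored in the quantile_bin column.

The output should be close to what you expect. Note that duplicates='drop' merges repeated bin edges and numbers the remaining bins consecutively, so a group with tied values can end up with fewer than 10 bins.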
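To see why a group can end up with fewer than 10 bins, it can help to look at the bin edges for a single player. The sketch below is an illustration rather than part of the original solution: it runs pd.qcut on Player A's wins with retbins=True, and the names wins_a, codes, edges and n_bins are made up for this example.

```python
import numpy as np
import pandas as pd

# Player A's wins from the example data
wins_a = pd.Series([2, 4, 4, 5, 6, 9, 13, 13])

# retbins=True also returns the bin edges that were actually used
codes, edges = pd.qcut(
    wins_a,
    np.linspace(0, 1, 11),   # quantile positions 0, 0.1, ..., 1.0
    labels=False,            # return 0-based integer bin codes
    duplicates='drop',       # drop repeated edges instead of raising
    retbins=True,
)

n_bins = len(edges) - 1      # number of bins left after dropping duplicates
bins_1_based = codes + 1     # shift codes so the first bin is 1

print(edges)
print(n_bins)
print(bins_1_based.tolist())
```

For Player A the 0.9 and 1.0 quantiles are both 13, so one edge is dropped and only 9 bins remain; the largest value then lands in bin 9 rather than 10.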


I have the following problem.
I have a pandas DataFrame, for example:

    import pandas as pd

    example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                          'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                          'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
               'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                        11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}
    df = pd.DataFrame(example)

What I want to build is a column quantile_bin with integer values from 1 to 10 that tells me which quantile bin each number in wins falls into for its Player, where 1 is between the 0 and 0.1 quantiles, 2 is between the 0.1 and 0.2 quantiles, ..., and 10 is between the 0.9 and 1.0 quantiles.

In other words, something like:

    df.groupby("Player")["wins"].apply(magic_quantile_function)

The result should be something like:

    >>> df
       Player  wins  quantile_bin
    0       A     2             1
    1       A     4             2
    2       A     4             2
    3       A     5             5
    4       A     6             6
    5       A     9             8
    6       A    13            10
    7       A    13            10
    8       B    15            10
    9       B    15            10
    10      B    14             9
    11      B    13             6
    12      B    11             5
    13      B     9             2
    14      B     9             2
    15      B     8             1
    16      C     8             1
    17      C    16             3
    18      C    19             7
    19      C    21            10
    20      C    14             2
    21      C    20             9
    22      C    19             7
    23      C    18             5

I would be grateful if someone could help me define this magic_quantile_function, or at least give me a hint on how to pass each group's wins distribution to a user-defined function that is applied to a groupby object.

Thank you in advance!
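As a hint for the second part of the question: groupby(...).transform calls a user-defined function once per group, passing that group's wins as a Series, and expects a result of the same length back. A minimal sketch under that assumption follows. decile_bin is a hypothetical name, and combining np.quantile with np.searchsorted is just one possible binning rule, not necessarily the one behind the expected output above.

```python
import numpy as np
import pandas as pd

def decile_bin(wins):
    # `wins` is a pandas Series holding the values of one Player group
    # Upper edges of the 10 bins: the 0.1, 0.2, ..., 1.0 quantiles of the group
    upper_edges = np.quantile(wins, np.linspace(0.1, 1.0, 10))
    # Index of the first upper edge that is >= each value, shifted to 1-based
    bins = np.searchsorted(upper_edges, wins.to_numpy(), side='left') + 1
    # Return a Series aligned with the group's original index
    return pd.Series(bins, index=wins.index)

example = {'Player': ['A'] * 8 + ['B'] * 8 + ['C'] * 8,
           'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                    11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}
df = pd.DataFrame(example)

# transform keeps the original row order and index
df['quantile_bin'] = df.groupby('Player')['wins'].transform(decile_bin)
print(df)
```

Because each value is compared against fixed quantile positions, the bin numbers stay on the 1 to 10 scale even when some edges coincide. This rule does not reproduce the expected table exactly; for example, Player A's 13 lands in bin 9.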

Answer 1

Score: 1


It seems you could use qcut:

    df['quantile'] = df.groupby('Player')['wins'].transform(
        lambda s: pd.qcut(s, q=10, duplicates='drop', labels=range(2, 11)))

       Player  wins  quantile
    0       A     2         2
    1       A     4         3
    2       A     4         3
    3       A     5         6
    4       A     6         7
    5       A     9         9
    6       A    13        10
    7       A    13        10
    8       B    15        10
    9       B    15        10
    10      B    14         9
    11      B    13         7
    12      B    11         6
    13      B     9         3
    14      B     9         3
    15      B     8         2
    16      C     8         2
    17      C    16         4
    18      C    19         7
    19      C    21        10
    20      C    14         3
    21      C    20         9
    22      C    19         7
    23      C    18         6
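One caveat, offered as a sketch rather than a correction: labels=range(2, 11) supplies exactly 9 labels, which fits this data because duplicates='drop' leaves 9 bins in every group. For a group whose decile edges are all distinct, qcut produces 10 bins and the label count no longer matches. The example below uses a made-up Series, evenly_spread, to show this, and passes labels=False as one way to avoid hard-coding the number of labels.

```python
import pandas as pd

# 20 distinct values, so all 11 decile edges are unique and 10 bins remain
evenly_spread = pd.Series(range(1, 21))

try:
    # 9 labels for 10 bins: qcut rejects the mismatch
    pd.qcut(evenly_spread, q=10, duplicates='drop', labels=range(2, 11))
    label_mismatch = False
except ValueError as err:
    label_mismatch = True
    print("qcut raised:", err)

# labels=False returns 0-based integer codes for however many bins remain
codes = pd.qcut(evenly_spread, q=10, duplicates='drop', labels=False) + 1
print(codes.tolist())
```

With integer codes the numbering adapts to however many bins survive, at the cost that the top bin is only 10 when no edges were dropped.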

huangapple
  • Published on March 3, 2023 at 23:54:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75629264.html