How to check in which bin an entry of a pandas data frame group belongs?

Question

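As I understand it, you want to create a quantile_bin column holding integer values from 1 to 10 that indicate which quantile a given number in wins falls into for a specific Player: 1 means between the 0 and 0.1 quantiles, 2 means between the 0.1 and 0.2 quantiles, and so on, up to 10 for between the 0.9 and 1.0 quantiles.

One way to define magic_quantile_function for this is: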

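import pandas as pd
import numpy as np

def magic_quantile_function(x):
    # Decile edges as quantiles from 0 to 1 inclusive; starting at 0.1 would
    # leave values below the 10% quantile unbinned (NaN)
    quantiles = np.linspace(0, 1, 11)
    # labels=False returns 0-based integer bin codes, so add 1
    bins = pd.qcut(x, quantiles, labels=False, duplicates='drop') + 1
    return bins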

example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                      'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                      'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
           'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                    11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}

df = pd.DataFrame(example)
df['quantile_bin'] = df.groupby("Player")["wins"].transform(magic_quantile_function)

print(df)

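This code first defines magic_quantile_function, which uses pd.qcut to compute the quantile bin of each wins value within a Player's group. df.groupby("Player")["wins"].transform then applies the function to each group, and the result is stored in the quantile_bin column.

The output should be similar to the expected result, though not necessarily identical.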
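
To see what pd.qcut does for a single group, here is a minimal sketch using Player A's wins from the example. Passing retbins=True also returns the bin edges; the variable names wins_a, codes, and edges are illustrative only. Because Player A's 0.9 and 1.0 quantiles are both 13, duplicates='drop' removes one edge and leaves nine bins, so the bin numbers for this group run from 1 to 9 rather than 1 to 10.

```python
import pandas as pd

# Player A's wins from the example data
wins_a = pd.Series([2, 4, 4, 5, 6, 9, 13, 13], name="wins")

# labels=False gives 0-based integer codes; retbins=True also returns the edges used
codes, edges = pd.qcut(wins_a, q=10, labels=False, duplicates="drop", retbins=True)

n_bins = len(edges) - 1      # bins left after duplicate edges were dropped
bin_numbers = (codes + 1).tolist()

print(edges)
print(n_bins)                # 9 for this group
print(bin_numbers)           # [1, 2, 2, 5, 6, 8, 9, 9]
```

Printing the edges per group is a quick way to check how many bins survive before deciding on labels.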

I have the following problem.
I have a pandas data frame, e.g. something like this:

import pandas as pd

example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                      'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                      'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
           'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                    11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}

df = pd.DataFrame(example)

What I want to build is a column quantile_bin with integer values from 1 to 10 that tells me which quantile a specific number in wins belongs to for a specific Player, where 1 is between the 0 and 0.1 quantiles, 2 is between the 0.1 and 0.2 quantiles, ..., and 10 is between the 0.9 and 1.0 quantiles.

In other words:

df.groupby("Player")["wins"].apply(magic_quantile_function)

The result should be something like

>>> df
   Player  wins  quantile_bin
0       A     2             1
1       A     4             2
2       A     4             2
3       A     5             5
4       A     6             6
5       A     9             8
6       A    13            10
7       A    13            10
8       B    15            10
9       B    15            10
10      B    14             9
11      B    13             6
12      B    11             5
13      B     9             2
14      B     9             2
15      B     8             1
16      C     8             1
17      C    16             3
18      C    19             7
19      C    21            10
20      C    14             2
21      C    20             9
22      C    19             7
23      C    18             5

I would be grateful if someone could help me define this magic_quantile_function, or at least give me a hint on how to pass a grouped wins distribution to a user-defined function that is applied to a groupby object.

Thank you in advance!

Answer 1

Score: 1

It seems like you could use qcut:

df['quantile'] = df.groupby('Player')['wins'].transform(
    lambda s: pd.qcut(s, q=10, duplicates='drop', labels=range(2, 11))
)

   Player  wins quantile
0       A     2        2
1       A     4        3
2       A     4        3
3       A     5        6
4       A     6        7
5       A     9        9
6       A    13       10
7       A    13       10
8       B    15       10
9       B    15       10
10      B    14        9
11      B    13        7
12      B    11        6
13      B     9        3
14      B     9        3
15      B     8        2
16      C     8        2
17      C    16        4
18      C    19        7
19      C    21       10
20      C    14        3
21      C    20        9
22      C    19        7
23      C    18        6
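
A note on robustness: labels=range(2, 11) passes nine labels, which fits this data because every player happens to have exactly one duplicated decile edge that duplicates='drop' removes. A group with a different number of ties would leave a different number of bins, and qcut would raise a ValueError about the label count. The sketch below is one possible alternative, not the answerer's method. The helper name decile_bin is hypothetical; it computes the nine interior decile edges with np.quantile and uses np.searchsorted to count how many of them lie strictly below each value, so the result stays on the 1 to 10 scale regardless of ties. The boundary handling is an assumption: a value equal to an edge goes to the lower bin, matching qcut's right-closed intervals.

```python
import numpy as np
import pandas as pd

example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                      'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                      'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
           'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                    11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}
df = pd.DataFrame(example)

def decile_bin(s):
    """Hypothetical helper: decile number (1..10) of each value within its group."""
    values = s.to_numpy()
    # nine interior edges: the 0.1, 0.2, ..., 0.9 quantiles of this group
    edges = np.quantile(values, np.linspace(0.1, 0.9, 9))
    # number of edges strictly below each value, shifted so the lowest bin is 1
    bins = np.searchsorted(edges, values, side="left") + 1
    # keep the group's index so transform can align the result with df
    return pd.Series(bins, index=s.index)

# transform hands each player's wins to decile_bin as a separate Series
df["quantile_bin"] = df.groupby("Player")["wins"].transform(decile_bin)
print(df)
```

For Player A this gives 1, 2, 2, 5, 6, 8, 9, 9. Bins that contain no value are simply skipped instead of shifting the later numbers down, and whether the tied top value 13 should count as bin 9 or bin 10 is a choice the question leaves open.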

huangapple
  • Published on March 3, 2023 at 23:54:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75629264.html