March 3, 2023, 23:54:46

How to check in which bin an entry of a pandas data frame group belongs?

Question

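I understand that what you want is a quantile_bin column holding integer values from 1 to 10 that indicate which quantile each number in a given Player's wins column falls into: 1 means between the 0 and 0.1 quantiles, 2 means between the 0.1 and 0.2 quantiles, and so on, up to 10 for between the 0.9 and 1.0 quantiles.

You can define magic_quantile_function along the following lines to achieve this: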

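import numpy as np
import pandas as pd


def magic_quantile_function(x):
    # Decile edges must run from 0 to 1 inclusive; starting at 0.1 would
    # leave values below the 0.1 quantile unbinned (NaN).
    quantiles = np.linspace(0, 1, 11)
    # labels=False returns 0-based bin codes, so add 1 to start at 1.
    # duplicates='drop' merges repeated edges, so the top label can be below 10.
    return pd.qcut(x, quantiles, labels=False, duplicates='drop') + 1


example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                      'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                      'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
           'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                    11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}
df = pd.DataFrame(example)
# transform returns one value per row, aligned with the original index
df['quantile_bin'] = df.groupby("Player")["wins"].transform(magic_quantile_function)
print(df)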

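This code first defines magic_quantile_function, which uses pd.qcut to work out which quantile bin each value in a Player's wins falls into. df.groupby("Player")["wins"].transform then applies the function to each group, and the result is stored in the quantile_bin column.

The output should be close to what you expect, although the labels may not reach 10 in a group where duplicates='drop' removes repeated bin edges.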
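As a minimal sketch of what pd.qcut does inside one group, the decile edges for Player A's wins can be inspected directly. The variable names (wins_a, edges, kept_edges, bins_a) are illustrative and not from the original post; retbins=True is used only to expose the edges that remain after duplicates are dropped.

```python
import numpy as np
import pandas as pd

# Player A's wins from the example data
wins_a = pd.Series([2, 4, 4, 5, 6, 9, 13, 13])

# The 11 decile edges (0.0, 0.1, ..., 1.0 quantiles), linear interpolation
edges = wins_a.quantile(np.linspace(0, 1, 11)).to_numpy()

# retbins=True also returns the edges qcut actually used once
# duplicates="drop" has removed any repeated edge
codes, kept_edges = pd.qcut(
    wins_a, q=10, labels=False, duplicates="drop", retbins=True
)
bins_a = (codes + 1).tolist()  # shift 0-based codes to start at 1

print(edges)
print(kept_edges)
print(bins_a)
```

For this group the 0.9 and 1.0 quantiles are both 13, so one edge is dropped and only nine bins remain.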

I have the following problem.
I have a pandas data frame, e.g. something like this:

example = {'Player': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                      'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
                      'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
           'wins': [2, 4, 4, 5, 6, 9, 13, 13, 15, 15, 14, 13,
                    11, 9, 9, 8, 8, 16, 19, 21, 14, 20, 19, 18]}
df = pd.DataFrame(example)

What I want to build is a column quantile_bin with integer values from 1 to 10 that tells me which quantile a specific number in wins belongs to for a specific Player, where 1 is between the 0 and 0.1 quantiles, 2 is between the 0.1 and 0.2 quantiles, ..., and 10 is between the 0.9 and 1.0 quantiles.

In other words

df.groupby("Player")["wins"].apply(magic_quantile_function)

The result should be something like

>>> df
   Player  wins  quantile_bin
0       A     2             1
1       A     4             2
2       A     4             2
3       A     5             5
4       A     6             6
5       A     9             8
6       A    13            10
7       A    13            10
8       B    15            10
9       B    15            10
10      B    14             9
11      B    13             6
12      B    11             5
13      B     9             2
14      B     9             2
15      B     8             1
16      C     8             1
17      C    16             3
18      C    19             7
19      C    21            10
20      C    14             2
21      C    20             9
22      C    19             7
23      C    18             5

I would be grateful if someone could help me define this magic_quantile_function, or at least give me a hint on how to pass a grouped wins distribution to a user-defined function that is applied to a groupby object.

Thank you in advance!

Answer 1

Score: 1

Seems like you could use qcut:

df['quantile'] = df.groupby('Player')['wins'].transform(
    lambda s: pd.qcut(s, q=10, duplicates='drop', labels=range(2, 11))
)

   Player  wins quantile
0       A     2        2
1       A     4        3
2       A     4        3
3       A     5        6
4       A     6        7
5       A     9        9
6       A    13       10
7       A    13       10
8       B    15       10
9       B    15       10
10      B    14        9
11      B    13        7
12      B    11        6
13      B     9        3
14      B     9        3
15      B     8        2
16      C     8        2
17      C    16        4
18      C    19        7
19      C    21       10
20      C    14        3
21      C    20        9
22      C    19        7
23      C    18        6
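
One caveat worth noting: labels=range(2, 11) supplies exactly nine labels, which works here because every Player group happens to keep nine bins once duplicate edges are dropped. The sketch below uses a made-up series (tied, not part of the question's data) to illustrate what may happen with more ties, and shows labels=False as one possible way to avoid hard-coding the label count.

```python
import pandas as pd

# Hypothetical group with many tied values (not from the question's data)
tied = pd.Series([1, 1, 1, 1, 2, 3, 4, 5])

# Nine hard-coded labels fail when fewer than nine bins survive
try:
    pd.qcut(tied, q=10, duplicates="drop", labels=range(2, 11))
    fixed_labels_ok = True
except ValueError:
    fixed_labels_ok = False

# labels=False returns 0-based codes for however many bins remain
flexible = (pd.qcut(tied, q=10, duplicates="drop", labels=False) + 1).tolist()

print(fixed_labels_ok)
print(flexible)
```

With labels=False the bin numbers are consecutive from 1, so they no longer map one-to-one onto deciles when edges have been merged.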
