Select n rows of dataframe for each value in column A where n is the value in column C

Question

I have the dataframe below.

From it, I want to select only n rows per salesman, where n is that salesman's sales_limit.

So the new dataframe created from the example below will not contain the 3rd row (client_id ghi) or the 6th row (client_id pqr), because the sales limit for both A and B is 2.
For C, all rows will be selected, since its limit (4) is higher than the number of rows assigned to it.

salesman client_id sales_limit
A abc 2
A def 2
A ghi 2
B jkl 2
B mno 2
B pqr 2
C stu 4
C vwx 4
C yz 4
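To experiment with the answers below, the table above can be loaded straight from its text with pandas.read_csv. This is only a sketch for reproducing the example; the real, weekly-changing data would presumably come from your own source.

```python
import io

import pandas as pd

# The question's table, pasted verbatim as whitespace-separated text.
table = """\
salesman client_id sales_limit
A abc 2
A def 2
A ghi 2
B jkl 2
B mno 2
B pqr 2
C stu 4
C vwx 4
C yz 4
"""

# sep=r"\s+" splits on any run of whitespace, so column alignment does not matter.
df = pd.read_csv(io.StringIO(table), sep=r"\s+")
print(df)
```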

Please note: the list of salesmen is dynamic; the names and their limits change every week in the table.

I had the idea of grouping by salesman name and then selecting :n-1 rows per salesman, but since the number of rows in the table can change, I wasn't able to figure out a way to account for that.
Can someone please help with the Python code for this? Thanks in advance.

Answer 1

Score: 3

You can group by salesman, then slice each group using its sales_limit (read from the group's first row):

out = (df.groupby('salesman', as_index=False, group_keys=False)
       .apply(lambda g: g.iloc[:g['sales_limit'].iloc[0]]))
print(out)

  salesman client_id  sales_limit
0        A       abc            2
1        A       def            2
3        B       jkl            2
4        B       mno            2
6        C       stu            4
7        C       vwx            4
8        C        yz            4
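Salesman C keeps all three rows because positional slicing with iloc clips at the end of the frame instead of raising, just like slicing a Python list. A minimal illustration on C's group alone (this also makes explicit the answer's assumption that every row of a salesman carries the same sales_limit):

```python
import pandas as pd

# The rows assigned to salesman C in the question (limit 4, but only 3 rows).
g = pd.DataFrame({
    "salesman": ["C", "C", "C"],
    "client_id": ["stu", "vwx", "yz"],
    "sales_limit": [4, 4, 4],
})

# The limit is read from the group's first row, which assumes
# all rows of one salesman share the same sales_limit.
n = g["sales_limit"].iloc[0]

# Positional slicing past the end returns every available row.
kept = g.iloc[:n]
print(len(kept))  # 3
```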

Answer 2

Score: 2

For efficiency, avoid apply: compare groupby.cumcount (each row's 0-based position within its salesman group) with sales_limit, and use the result for boolean indexing:

out = df[df.groupby('salesman').cumcount().lt(df['sales_limit'])]

Output:

  salesman client_id  sales_limit
0        A       abc            2
1        A       def            2
3        B       jkl            2
4        B       mno            2
6        C       stu            4
7        C       vwx            4
8        C        yz            4

Intermediates:

  salesman client_id  sales_limit  cumcount  cumcount < sales_limit
0        A       abc            2         0                    True
1        A       def            2         1                    True
2        A       ghi            2         2                   False
3        B       jkl            2         0                    True
4        B       mno            2         1                    True
5        B       pqr            2         2                   False
6        C       stu            4         0                    True
7        C       vwx            4         1                    True
8        C        yz            4         2                    True
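The intermediate table can be reproduced by storing the cumcount and the comparison as helper columns; this is only a sketch for inspecting the logic. Note that cumcount numbers rows in their current order, so which rows survive depends on how the frame is sorted (sort first if some clients should take priority).

```python
import pandas as pd

df = pd.DataFrame({
    "salesman": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2] * 6 + [4] * 3,
})

# 0-based position of each row within its salesman group, in current row order.
df["cumcount"] = df.groupby("salesman").cumcount()
# True while that position is still below the salesman's limit.
df["keep"] = df["cumcount"] < df["sales_limit"]
print(df)

# Keep the flagged rows and drop the helper columns again.
out = df.loc[df["keep"], ["salesman", "client_id", "sales_limit"]]
print(out)
```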

Answer 3

Score: 0

For this I would use pandas groupby together with apply.

Import pandas and generate the df:

import pandas as pd

df = pd.DataFrame({'salesman':['A']*3+['B']*3+['C']*3,'client_id':list('ABCDEFGHI'),'sales_limit':[2]*6+[4]*3})

Generate a dictionary with the limit per salesperson:

limits = df.groupby('salesman')['sales_limit'].max().to_dict()

Use apply and iloc to select the first n rows of each group (inside apply, x.name is the group key, i.e. the salesman):

df.groupby('salesman').apply(lambda x:x.iloc[:limits[x.name]])
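A dictionary like limits can also be combined with the cumcount comparison from Answer 2, which avoids apply. The sketch below assumes, hypothetically, that the weekly limits are kept outside the frame as a mapping (weekly_limits is an illustrative name; in the question the limits are a column). Salesmen missing from the mapping get NaN, and a comparison against NaN is False, so their rows are dropped.

```python
import pandas as pd

# Assignments without a limit column (client IDs from the question).
df = pd.DataFrame({
    "salesman": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
})

# Hypothetical weekly limits, shaped like the limits dict above.
weekly_limits = {"A": 2, "B": 2, "C": 4}

# Position of each row within its salesman's group: 0, 1, 2, ...
position = df.groupby("salesman").cumcount()
# Each row's limit, looked up by salesman (NaN if the salesman is unknown).
limit = df["salesman"].map(weekly_limits)

out = df[position < limit]
print(out)
```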

huangapple
  • Posted on May 22, 2023 20:50:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76306383.html