Select n rows of dataframe for each value in column A where n is the value in column C

Question

I have the dataframe below.

From it, I want to select only n rows per salesman, where n is the corresponding sales_limit for that salesman.

So the new dataframe created from the example below will not contain the third row (client_id ghi) or the sixth row (client_id pqr), because the sales limit for both A and B is 2.
For C, all rows will be selected, because the limit (4) is higher than the number of rows assigned.

    salesman  client_id  sales_limit
    A         abc        2
    A         def        2
    A         ghi        2
    B         jkl        2
    B         mno        2
    B         pqr        2
    C         stu        4
    C         vwx        4
    C         yz         4

Please note: the list of salesmen is dynamic; the names and their limits in the table will change every week.

My idea was to group by salesman name and then select the :n-1 rows per salesman, but since the number of rows in the table might change, I wasn't able to figure out a way to account for that.
Can someone please help with Python code for this? Thanks in advance.

Answer 1

Score: 3

You can use groupby and then slice each group with its sales_limit:

    out = (df.groupby('salesman', as_index=False, group_keys=False)
             .apply(lambda g: g.iloc[:g['sales_limit'].iloc[0]]))
    print(out)

Output:

      salesman client_id  sales_limit
    0        A       abc            2
    1        A       def            2
    3        B       jkl            2
    4        B       mno            2
    6        C       stu            4
    7        C       vwx            4
    8        C        yz            4
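
Depending on the pandas version, groupby.apply may warn about, or drop, the grouping column inside the applied function. As a hedged alternative (not the answerer's code), the same "slice each group" idea can be sketched by iterating over the groups and concatenating the pieces. The names parts and out_loop are made up for this illustration, and the sketch assumes sales_limit is constant within each salesman.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "salesman": list("AAABBBCCC"),
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2, 2, 2, 2, 2, 2, 4, 4, 4],
})

# Keep the first `limit` rows of each group.
# Assumption: the limit is the same on every row of a group,
# so the first value in the group is representative.
parts = [
    group.head(int(group["sales_limit"].iloc[0]))
    for _, group in df.groupby("salesman", sort=False)
]
out_loop = pd.concat(parts)

print(out_loop)
```

head(n) returns the whole group when n exceeds the group size, which covers salesman C in the example.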

Answer 2

Score: 2

For efficiency, compare groupby.cumcount with sales_limit and use boolean indexing:

    out = df[df.groupby('salesman').cumcount().lt(df['sales_limit'])]

Output:

      salesman client_id  sales_limit
    0        A       abc            2
    1        A       def            2
    3        B       jkl            2
    4        B       mno            2
    6        C       stu            4
    7        C       vwx            4
    8        C        yz            4

Intermediates:

      salesman client_id  sales_limit  cumcount  cumcount < sales_limit
    0        A       abc            2         0                    True
    1        A       def            2         1                    True
    2        A       ghi            2         2                   False
    3        B       jkl            2         0                    True
    4        B       mno            2         1                    True
    5        B       pqr            2         2                   False
    6        C       stu            4         0                    True
    7        C       vwx            4         1                    True
    8        C        yz            4         2                    True
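
For readers who want to reproduce the intermediate table, here is a minimal self-contained sketch of the same cumcount comparison, split into named steps. The variable names position, mask, debug, and out_mask are illustrative only and do not come from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "salesman": list("AAABBBCCC"),
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2, 2, 2, 2, 2, 2, 4, 4, 4],
})

# 0-based position of each row within its salesman group.
position = df.groupby("salesman").cumcount()

# A row is kept while its position is below that row's own limit.
mask = position < df["sales_limit"]
out_mask = df[mask]

# Optional: show the intermediate columns next to the data.
debug = df.assign(cumcount=position, keep=mask)
print(debug)
print(out_mask)
```

Because the comparison is row-wise, nothing here depends on how many rows or salesmen the table has in a given week.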

Answer 3

Score: 0

For this I would use pandas groupby together with apply.

Import pandas and generate df:

    import pandas as pd
    df = pd.DataFrame({'salesman': ['A']*3 + ['B']*3 + ['C']*3,
                       'client_id': list('ABCDEFGHI'),
                       'sales_limit': [2]*6 + [4]*3})

Generate a dictionary with the limit per salesperson:

    limits = df.groupby('salesman')['sales_limit'].max().to_dict()

Use apply and iloc to select the first x rows of each group, where x is that salesperson's limit:

    df.groupby('salesman').apply(lambda x: x.iloc[:limits[x.name]])
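
The limits dictionary can also be combined with the cumcount idea instead of apply. The following is only a sketch of that combination, not part of the answer: it maps each row's salesman to a limit and compares it with the row's position in its group. The names row_limit and out_map are hypothetical, and the dictionary is derived from the frame itself here, although it could equally come from a separate weekly lookup.

```python
import pandas as pd

# Same sample frame as in this answer.
df = pd.DataFrame({
    "salesman": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "client_id": list("ABCDEFGHI"),
    "sales_limit": [2] * 6 + [4] * 3,
})

# One limit per salesperson, as in the answer.
limits = df.groupby("salesman")["sales_limit"].max().to_dict()

# Look up each row's limit by salesman name.
row_limit = df["salesman"].map(limits)

# 0-based position within the group, compared with the mapped limit.
out_map = df[df.groupby("salesman").cumcount() < row_limit]

print(out_map)
```

With this variant, a salesman missing from the dictionary maps to NaN, the comparison is False, and that salesman's rows are dropped, so the mapping should cover every name in the table.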

huangapple
  • Posted on May 22, 2023 at 20:50:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76306383.html