Select n rows of dataframe for each value in column A where n is the value in column C
Question
I have the dataframe below.
From it, I want to select only n rows per salesman, where n is that salesman's sales_limit.
So the new dataframe created from the example below will not contain the third row (client_id ghi) or the sixth row (client_id pqr), because the sales limit for both A and B is 2.
For C, all rows will be selected, because the limit (4) is higher than the number of rows assigned.
salesman | client_id | sales_limit |
---|---|---|
A | abc | 2 |
A | def | 2 |
A | ghi | 2 |
B | jkl | 2 |
B | mno | 2 |
B | pqr | 2 |
C | stu | 4 |
C | vwx | 4 |
C | yz | 4 |
Please note: the list of salesmen is dynamic; the names and their limits in the table will change every week.
I had the idea of doing a groupby on the salesman names and then selecting :n-1 rows per salesman, but since the number of rows in the table might change, I wasn't able to figure out a way to account for that.
Can someone please help with the Python code for this? Thanks in advance.
Answer 1
Score: 3
You can use groupby, then slice each group with its sales_limit:
out = (df.groupby('salesman', as_index=False, group_keys=False)
.apply(lambda g: g.iloc[:g['sales_limit'].iloc[0]]))
print(out)
salesman client_id sales_limit
0 A abc 2
1 A def 2
3 B jkl 2
4 B mno 2
6 C stu 4
7 C vwx 4
8 C yz 4
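The same per-group slice can also be written as an explicit loop over the groups. This is only a sketch: the DataFrame construction is my reconstruction of the table in the question, and it assumes sales_limit is constant within each salesman. It may be worth considering because recent pandas releases warn when apply operates on the grouping column, whereas iterating over the groups does not depend on that behavior.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption, not the asker's code).
df = pd.DataFrame({
    "salesman": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2] * 6 + [4] * 3,
})

# Slice each group to its own limit; the limit is read from the group's first row.
# A slice longer than the group (as for C) simply returns the whole group.
parts = [
    group.iloc[: int(group["sales_limit"].iloc[0])]
    for _, group in df.groupby("salesman", sort=False)
]
out = pd.concat(parts)
print(out)
```

The original row labels are preserved, so the result should match the output shown above.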
Answer 2
Score: 2
For efficiency, compare groupby.cumcount with sales_limit and use boolean indexing:
out = df[df.groupby('salesman').cumcount().lt(df['sales_limit'])]
Output:
salesman client_id sales_limit
0 A abc 2
1 A def 2
3 B jkl 2
4 B mno 2
6 C stu 4
7 C vwx 4
8 C yz 4
Intermediates:
salesman client_id sales_limit cumcount cumcount < sales_limit
0 A abc 2 0 True
1 A def 2 1 True
2 A ghi 2 2 False
3 B jkl 2 0 True
4 B mno 2 1 True
5 B pqr 2 2 False
6 C stu 4 0 True
7 C vwx 4 1 True
8 C yz 4 2 True
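To make the intermediates above reproducible, here is a minimal sketch that builds them step by step. The DataFrame construction and the names position, keep, and intermediates are mine, chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption).
df = pd.DataFrame({
    "salesman": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2] * 6 + [4] * 3,
})

# 0-based position of each row within its salesman group.
position = df.groupby("salesman").cumcount()

# A row is kept while its position is below that row's own limit.
keep = position < df["sales_limit"]

# Side-by-side view of the helper columns, for inspection only.
intermediates = df.assign(cumcount=position, keep=keep)
print(intermediates)

out = df[keep]
print(out)
```

Because the comparison is row-wise, no Python-level function is called per group, which is where the efficiency gain over apply comes from.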
Answer 3
Score: 0
For this I would leverage pandas groupby, as well as apply.
Import pandas and generate df:
import pandas as pd
df = pd.DataFrame({'salesman':['A']*3+['B']*3+['C']*3,'client_id':list('ABCDEFGHI'),'sales_limit':[2]*6+[4]*3})
Generate a dictionary with the limit per salesperson:
limits = df.groupby('salesman')['sales_limit'].max().to_dict()
Use apply and iloc to get the first x rows of each group:
df.groupby('salesman').apply(lambda x:x.iloc[:limits[x.name]])
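With the default group_keys setting, the apply call above typically returns a frame indexed by both salesman and the original row label. If a flat index is preferred, one possible variant (a sketch, not part of the original answer) maps the per-salesman limits back onto the rows and filters with a mask. The name row_limit is hypothetical, and limits is kept as a Series here rather than converted to a dictionary.

```python
import pandas as pd

# Same sample frame as in this answer.
df = pd.DataFrame({
    "salesman": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "client_id": list("ABCDEFGHI"),
    "sales_limit": [2] * 6 + [4] * 3,
})

# Per-salesman limit as a Series indexed by salesman.
limits = df.groupby("salesman")["sales_limit"].max()

# Look up each row's limit by its salesman, then keep rows whose
# 0-based position within the group is below that limit.
row_limit = df["salesman"].map(limits)
out = df[df.groupby("salesman").cumcount() < row_limit]
print(out)
```

The same lookup would also work if the limits came from a separate table, as long as it can be turned into a Series or dictionary keyed by salesman.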