May 22, 2023, 20:50:53

Select n rows of a dataframe for each value in column A, where n is the value in column C

Question

I have the dataframe below.

From it, I want to select only n rows per salesman, where n is the corresponding sales_limit for that salesman.

So the new dataframe created from the example below will not contain the third row (client_id ghi) or the sixth row (client_id pqr), because the sales limit for both A and B is 2.
For C, all rows will be selected, because the limit (4) is higher than the number of rows assigned.

salesman	client_id	sales_limit
A	abc	2
A	def	2
A	ghi	2
B	jkl	2
B	mno	2
B	pqr	2
C	stu	4
C	vwx	4
C	yz	4

Please note: the list of salesmen is dynamic; the names and their limits in the table will change every week.

I had the idea of doing a groupby on the salesman names and then selecting :n-1 rows per salesman, but since the number of rows in the table might change, I wasn't able to figure out a way to account for that.
Can someone please help with Python code for the above? Thanks in advance.
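
For anyone who wants to reproduce the problem, the sample table can be rebuilt roughly as follows. This is only a sketch: the variable name df is an assumption (it matches what the answers use), and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "salesman": list("AAABBBCCC"),
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2, 2, 2, 2, 2, 2, 4, 4, 4],
})

print(df)
```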

Answer 1

Score: 3

You can do a groupby, then slice each group with its sales_limit:

out = (df.groupby('salesman', as_index=False, group_keys=False)
       .apply(lambda g: g.iloc[:g['sales_limit'].iloc[0]]))

print(out)
  salesman client_id  sales_limit
0        A       abc            2
1        A       def            2
3        B       jkl            2
4        B       mno            2
6        C       stu            4
7        C       vwx            4
8        C        yz            4
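
The same per-group slicing can also be spelled out as an explicit loop, which may make the individual steps easier to follow. This is a sketch rather than part of the answer: it assumes each salesman has a single limit (the first value in the group is used), and the names pieces and n are only illustrative. The sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "salesman": list("AAABBBCCC"),
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2, 2, 2, 2, 2, 2, 4, 4, 4],
})

pieces = []
for name, group in df.groupby("salesman", sort=False):
    # Assumption: the limit is constant within a salesman, so take the first value.
    n = int(group["sales_limit"].iloc[0])
    # head(n) returns the whole group when it has fewer than n rows (salesman C here).
    pieces.append(group.head(n))

out = pd.concat(pieces)
print(out)
```

Because the groups are collected in order of first appearance, the original row labels (0, 1, 3, 4, 6, 7, 8) are preserved in the result.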

Answer 2

Score: 2

For efficiency, use a comparison with groupby.cumcount and boolean indexing:

out = df[df.groupby('salesman').cumcount().lt(df['sales_limit'])]

Output:

  salesman client_id  sales_limit
0        A       abc            2
1        A       def            2
3        B       jkl            2
4        B       mno            2
6        C       stu            4
7        C       vwx            4
8        C        yz            4

Intermediates:

  salesman client_id  sales_limit  cumcount  cumcount < sales_limit
0        A       abc            2         0                    True
1        A       def            2         1                    True
2        A       ghi            2         2                   False
3        B       jkl            2         0                    True
4        B       mno            2         1                    True
5        B       pqr            2         2                   False
6        C       stu            4         0                    True
7        C       vwx            4         1                    True
8        C        yz            4         2                    True
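
The intermediate table can be reproduced by splitting the one-liner into steps. A minimal sketch, with the sample frame rebuilt so it runs on its own; the names counts, steps, and keep are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "salesman": list("AAABBBCCC"),
    "client_id": ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"],
    "sales_limit": [2, 2, 2, 2, 2, 2, 4, 4, 4],
})

# 0-based position of each row within its salesman group.
counts = df.groupby("salesman").cumcount()

# Keep a row while its position is below that row's own sales_limit.
steps = df.assign(cumcount=counts, keep=counts.lt(df["sales_limit"]))
print(steps)

# Boolean indexing with the mask gives the final result.
out = df[steps["keep"]]
print(out)
```

Since the comparison is done row by row, nothing here depends on how many salesmen or rows the table contains in a given week.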

Answer 3

Score: 0

For this I would use pandas groupby together with apply.

Import pandas and generate df:

import pandas as pd
df = pd.DataFrame({'salesman': ['A']*3 + ['B']*3 + ['C']*3,
                   'client_id': list('ABCDEFGHI'),
                   'sales_limit': [2]*6 + [4]*3})

Generate a dictionary with the limit per salesperson:

limits = df.groupby('salesman')['sales_limit'].max().to_dict()

Use apply and iloc to select the first x rows of each group:

df.groupby('salesman').apply(lambda x: x.iloc[:limits[x.name]])
