Multiple conditions Pandas groupby, keeping other column values
Question
I have a dataframe like this:
Launch  Article  Sequence  Machine    Quantity  Date        …
68033   F2500    10        lathe 1    200       01/02/2022  …
68033   F2500    20        lathe 1    190       01/02/2022  …
68033   F2500    30        borer 3    175       02/02/2022  …
68033   F2500    40        milling 1  175       03/03/2022  …
71562   F2500    10        lathe 3    632       12/12/2022  …
71562   F2500    20        lathe 4    593       15/12/2022  …
71562   F2500    30        borer 3    560       16/12/2022  …
71562   F2500    40        milling 2  555       16/12/2022  …
69872   F302     10        lathe 2    5463      04/06/2022  …
69872   F302     30        lathe 3    5102      11/06/2022  …
70444   F302     20        lathe 1    3125      27/07/2022  …
70444   F302     30        lathe 3    2965      31/07/2022  …
…       …        …         …          …         …           …
124.531 rows x 12 columns
What I need is some kind of group-by where, for each Article, I select the maximum Launch number and then, within that launch, the minimum Sequence number together with its corresponding Machine.
The end result should look like this:
Article  Launch  Sequence  Machine
F2500    71562   10        lathe 3
F302     70444   20        lathe 1
…        …       …         …
I've tried to do it with pandas groupby and .agg, but it doesn't work. The following code, for example, gives me the max Launch and the min Sequence overall, not the min Sequence within the max Launch. I've also tried other approaches with sort_values and the like, without success.
Last_Lathe_df = Last_Lathe_df.groupby(['Article'], as_index=False).agg({'Launch': 'max', 'Sequence': 'min', 'Machine': 'first'})
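To see why the .agg call falls short, here is a minimal sketch on the sample rows shown above. Only four of the twelve columns are rebuilt, and the names `df` and `agg` are illustrative. Each entry in the dict is reduced independently per Article, so the values in one result row can come from different source rows.

```python
import pandas as pd

# Sample rows from the question (Quantity, Date and the elided columns omitted)
df = pd.DataFrame({
    "Launch": [68033] * 4 + [71562] * 4 + [69872] * 2 + [70444] * 2,
    "Article": ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine": ["lathe 1", "lathe 1", "borer 3", "milling 1",
                "lathe 3", "lathe 4", "borer 3", "milling 2",
                "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Each column is aggregated on its own; nothing ties Sequence or Machine
# to the row that holds the maximum Launch.
agg = df.groupby("Article", as_index=False).agg(
    {"Launch": "max", "Sequence": "min", "Machine": "first"}
)
print(agg)
```

For F302 this yields Launch 70444 with Sequence 10 and Machine lathe 2, although launch 70444 has no Sequence 10 row: both the 10 and the lathe 2 come from launch 69872.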
Answer 1
Score: 2
I would use:
# get max Launch per Article and filter rows
m = df.groupby('Article')['Launch'].max()
df2 = df.loc[df['Launch'].isin(m)]
# get rows with min sequence
Last_Lathe_df = df2.loc[df2.groupby('Article')['Sequence'].idxmin()]
Output:
    Launch  Article  Sequence  Machine  Quantity  Date
4   71562   F2500    10        lathe 3  632       12/12/2022
10  70444   F302     20        lathe 1  3125      27/07/2022
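One caveat worth checking, assuming a Launch number could ever appear under more than one Article (the sample data does not show this): `isin(m)` tests each row against the maxima of all articles, not only its own. A minimal sketch that instead compares each row with its own Article's maximum via `transform('max')` follows. The last row of the frame is hypothetical and added only to make the difference visible; `own_max` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Launch":   [68033, 71562, 71562, 69872, 70444, 70444, 70444],
    "Article":  ["F2500", "F2500", "F2500", "F302", "F302", "F302", "F2500"],
    "Sequence": [10, 10, 20, 10, 20, 30, 5],
    "Machine":  ["lathe 1", "lathe 3", "lathe 4", "lathe 2",
                 "lathe 1", "lathe 3", "lathe 9"],
})
# The last row is hypothetical: launch 70444 reused under Article F2500.

# Max Launch of each row's own Article, aligned to df's index
own_max = df.groupby("Article")["Launch"].transform("max")
df2 = df.loc[df["Launch"].eq(own_max)]

# Row with the smallest Sequence inside each Article's latest launch
result = df2.loc[df2.groupby("Article")["Sequence"].idxmin()]
print(result)
```

With this comparison the hypothetical row is dropped because 70444 is not the maximum Launch of F2500, so F2500 still resolves to launch 71562, Sequence 10, lathe 3.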
Answer 2
Score: 2
In a straightforward way:
df.groupby('Article').apply(lambda x: x[x['Launch'].eq(x['Launch'].max())]
.sort_values(by=['Sequence']).head(1)).reset_index(drop=True)
   Launch  Article  Sequence  Machine  Quantity  Date
0  71562   F2500    10        lathe 3  632       12/12/2022
1  70444   F302     20        lathe 1  3125      27/07/2022
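Since `groupby(...).apply` runs the lambda once per Article in Python, it may become slow on a frame of the size mentioned in the question. A possible apply-free sketch of the same selection sorts once and keeps the first row per Article. It is rebuilt here on the sample rows with four columns only, and `result` is an illustrative name.

```python
import pandas as pd

# Sample rows from the question (Quantity, Date and the elided columns omitted)
df = pd.DataFrame({
    "Launch": [68033] * 4 + [71562] * 4 + [69872] * 2 + [70444] * 2,
    "Article": ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine": ["lathe 1", "lathe 1", "borer 3", "milling 1",
                "lathe 3", "lathe 4", "borer 3", "milling 2",
                "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

result = (
    df.sort_values(["Article", "Launch", "Sequence"],
                   ascending=[True, False, True])  # latest launch first, then lowest sequence
      .drop_duplicates("Article")                  # keep the first row of each Article
      [["Article", "Launch", "Sequence", "Machine"]]
      .reset_index(drop=True)
)
print(result)
```

On the sample rows this gives F2500 / 71562 / 10 / lathe 3 and F302 / 70444 / 20 / lathe 1, matching the result requested in the question.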