Multiple conditions Pandas groupby, keeping other column values

Question

I have a dataframe like this:

Launch  Article  Sequence  Machine    Quantity  Date        …
68033   F2500    10        lathe 1    200       01/02/2022  …
68033   F2500    20        lathe 1    190       01/02/2022  …
68033   F2500    30        borer 3    175       02/02/2022  …
68033   F2500    40        milling 1  175       03/03/2022  …
71562   F2500    10        lathe 3    632       12/12/2022  …
71562   F2500    20        lathe 4    593       15/12/2022  …
71562   F2500    30        borer 3    560       16/12/2022  …
71562   F2500    40        milling 2  555       16/12/2022  …
69872   F302     10        lathe 2    5463      04/06/2022  …
69872   F302     30        lathe 3    5102      11/06/2022  …
70444   F302     20        lathe 1    3125      27/07/2022  …
70444   F302     30        lathe 3    2965      31/07/2022  …
…       …        …         …          …         …           …

124.531 rows x 12 columns

What I need is some kind of group-by where, for each Article, I first select the maximum Launch number and then, within that launch, the row with the minimum Sequence number together with its Machine.

The end result should look like this:

Article  Launch  Sequence  Machine
F2500    71562   10        lathe 3
F302     70444   20        lathe 1
…        …       …         …

I've tried pandas groupby with .agg, but it doesn't work. The following code, for example, gives me the overall max Launch and the overall min Sequence per Article, not the min Sequence within the max Launch. I've also tried other approaches such as sort_values, with no success.

Last_Lathe_df = Last_Lathe_df.groupby(['Article'], as_index=False).agg({'Launch': 'max', 'Sequence': 'min', 'Machine': 'first'})
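
The behaviour described above can be reproduced on the sample rows. The sketch below is only an illustration: the frame is rebuilt by hand from the table in the question (Quantity and Date omitted), and the name `mixed` is made up for this example.

```python
import pandas as pd

# Sample rows copied from the table in the question (Quantity and Date omitted).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# .agg reduces each column independently, so the values in one output row
# do not have to come from the same input row.
mixed = df.groupby("Article", as_index=False).agg(
    {"Launch": "max", "Sequence": "min", "Machine": "first"}
)
print(mixed)
```

For F302 this yields Launch 70444 combined with Sequence 10 and Machine "lathe 2", a combination that does not exist in any input row, which is why a row-selecting approach is needed instead of column-wise aggregation.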

Answer 1

Score: 2

I would use:

# get max Launch per Article and filter rows
m = df.groupby('Article')['Launch'].max()
df2 = df.loc[df['Launch'].isin(m)]

# get rows with min sequence
Last_Lathe_df = df2.loc[df2.groupby('Article')['Sequence'].idxmin()]

Output:

    Launch Article  Sequence  Machine  Quantity        Date
4    71562   F2500        10  lathe 3       632  12/12/2022
10   70444    F302        20  lathe 1      3125  27/07/2022
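
Note that `isin(m)` tests membership in the set of all per-Article maxima, so it relies on a Launch number never being shared between articles. If that cannot be guaranteed, one possible variant (a sketch, not part of the original answer) compares each row with the maximum of its own Article via `transform`. The sample frame and the names `is_latest`, `latest` and `result` are assumptions made for this illustration.

```python
import pandas as pd

# Sample rows copied from the table in the question (Quantity and Date omitted).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# transform("max") broadcasts each Article's maximum Launch back onto its rows,
# so every row is compared only with the maximum of its own group.
is_latest = df["Launch"].eq(df.groupby("Article")["Launch"].transform("max"))
latest = df.loc[is_latest]

# Within the latest launch, keep the row with the smallest Sequence.
result = latest.loc[latest.groupby("Article")["Sequence"].idxmin()]
print(result)
```

On the sample data this selects the same two rows (index 4 and 10) as the code above.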

Answer 2

Score: 2

In a straightforward way:

df.groupby('Article').apply(lambda x: x[x['Launch'].eq(x['Launch'].max())]
                            .sort_values(by=['Sequence']).head(1)).reset_index(drop=True)

   Launch Article  Sequence  Machine  Quantity        Date
0   71562   F2500        10  lathe 3       632  12/12/2022
1   70444    F302        20  lathe 1      3125  27/07/2022
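
Because `groupby(...).apply` calls a Python function once per group, it may be slow on a frame of this size. A possible alternative without `apply` (a sketch, not taken from the answers above) is to sort so that the wanted row comes first within each Article and then keep only that first row. The sample frame and the name `result` are assumptions made for this illustration.

```python
import pandas as pd

# Sample rows copied from the table in the question (Quantity and Date omitted).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

result = (
    # Highest Launch first, then lowest Sequence first, within each Article.
    df.sort_values(["Article", "Launch", "Sequence"],
                   ascending=[True, False, True])
      # drop_duplicates keeps the first row per Article by default.
      .drop_duplicates("Article")
      .reset_index(drop=True)
)
print(result[["Article", "Launch", "Sequence", "Machine"]])
```

The whole row is kept, so any other columns (Quantity, Date, and so on) travel along with the selected Launch and Sequence.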

huangapple
  • Published on March 15, 2023, 18:25:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75743412.html