Multiple conditions Pandas groupby, keeping other column values

Question

I have a dataframe like this:

    Launch  Article  Sequence  Machine    Quantity  Date
    68033   F2500    10        lathe 1    200       01/02/2022
    68033   F2500    20        lathe 1    190       01/02/2022
    68033   F2500    30        borer 3    175       02/02/2022
    68033   F2500    40        milling 1  175       03/03/2022
    71562   F2500    10        lathe 3    632       12/12/2022
    71562   F2500    20        lathe 4    593       15/12/2022
    71562   F2500    30        borer 3    560       16/12/2022
    71562   F2500    40        milling 2  555       16/12/2022
    69872   F302     10        lathe 2    5463      04/06/2022
    69872   F302     30        lathe 3    5102      11/06/2022
    70444   F302     20        lathe 1    3125      27/07/2022
    70444   F302     30        lathe 3    2965      31/07/2022

    124.531 rows x 12 columns

What I need is some kind of group-by where, for each article, I select the maximum launch number and then, within that launch, the minimum sequence number together with its machine.

The end result should look like this:

    Article  Launch  Sequence  Machine
    F2500    71562   10        lathe 3
    F302     70444   20        lathe 1

I've tried to do it with pandas groupby and .agg, but it doesn't do what I need. The following code, for example, gives me the maximum launch and the minimum sequence over the whole article, not the minimum sequence within the maximum launch. I've also tried other approaches with sort_values and the like, without success.

    Last_Lathe_df = Last_Lathe_df.groupby(['Article'], as_index=False).agg({'Launch': 'max', 'Sequence': 'min', 'Machine': 'first'})
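To make the problem concrete, here is a minimal sketch that rebuilds the sample rows above and runs the same aggregation. The frame is reconstructed by hand (Quantity and Date are left out), and the names `mixed` and `f302` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the table above (Quantity and Date omitted).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Each aggregation is computed independently over the whole Article group,
# so the three values can come from three different rows.
mixed = df.groupby("Article", as_index=False).agg(
    {"Launch": "max", "Sequence": "min", "Machine": "first"}
)

# Pick out the F302 row to inspect it.
f302 = mixed.loc[mixed["Article"] == "F302"].iloc[0]
print(f302["Launch"], f302["Sequence"], f302["Machine"])
```

For F302 this reports launch 70444 together with sequence 10 and machine "lathe 2", both of which belong to launch 69872, whereas the row I want is sequence 20 on "lathe 1".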

Answer 1

Score: 2

I would use:

    # get max Launch per Article and filter rows
    m = df.groupby('Article')['Launch'].max()
    df2 = df.loc[df['Launch'].isin(m)]
    # get rows with min sequence
    Last_Lathe_df = df2.loc[df2.groupby('Article')['Sequence'].idxmin()]

Output:

        Launch  Article  Sequence  Machine  Quantity  Date
    4   71562   F2500    10        lathe 3  632       12/12/2022
    10  70444   F302     20        lathe 1  3125      27/07/2022
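One caveat: `isin(m)` matches a row against the maximum launch of any article, which is fine as long as a launch number never appears under two articles. If that is not guaranteed, a variant that compares each row with the maximum of its own article might look like the sketch below. The sample frame is rebuilt by hand from the question (Quantity and Date omitted), and `is_last_launch`, `last` and `result` are illustrative names.

```python
import pandas as pd

# Sample rows from the question (Quantity and Date omitted).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Step 1: transform('max') broadcasts each Article's max Launch back onto
# its rows, so every row is compared with the max of its own Article only.
is_last_launch = df["Launch"].eq(df.groupby("Article")["Launch"].transform("max"))
last = df[is_last_launch]

# Step 2: among those rows, idxmin gives the index label of the smallest
# Sequence per Article; .loc then pulls out the complete rows.
result = last.loc[
    last.groupby("Article")["Sequence"].idxmin(),
    ["Article", "Launch", "Sequence", "Machine"],
]
print(result)
```

On the sample data this selects the same two rows as above (index 4 and 10).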

Answer 2

Score: 2

In a straightforward way:

    df.groupby('Article').apply(lambda x: x[x['Launch'].eq(x['Launch'].max())]
                                .sort_values(by=['Sequence']).head(1)).reset_index(drop=True)

       Launch  Article  Sequence  Machine  Quantity  Date
    0  71562   F2500    10        lathe 3  632       12/12/2022
    1  70444   F302     20        lathe 1  3125      27/07/2022
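The same "filter, sort, take the first row" idea can also be written without `apply`, by sorting once and keeping the first row per article. This is only a sketch of that alternative, not part of the answer above; the sample frame is rebuilt by hand from the question (Quantity and Date omitted) and `result` is an illustrative name.

```python
import pandas as pd

# Sample rows from the question (Quantity and Date omitted).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

result = (
    # Within each Article: highest Launch first, then lowest Sequence first.
    df.sort_values(["Article", "Launch", "Sequence"],
                   ascending=[True, False, True])
      # Keep only the first row per Article, i.e. max Launch / min Sequence.
      .drop_duplicates("Article")
      .loc[:, ["Article", "Launch", "Sequence", "Machine"]]
      .reset_index(drop=True)
)
print(result)
```

A single sort followed by `drop_duplicates` may also be faster than a per-group lambda on a frame as large as the one in the question, though that is worth measuring rather than assuming.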

huangapple
  • Posted on March 15, 2023, 18:25:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75743412.html