March 15, 2023 18:25:03

Multiple conditions Pandas groupby, keeping other column values

Question

I have a dataframe like this:

Launch  Article  Sequence  Machine    Quantity  Date        …
68033   F2500    10        lathe 1    200       01/02/2022  …
68033   F2500    20        lathe 1    190       01/02/2022  …
68033   F2500    30        borer 3    175       02/02/2022  …
68033   F2500    40        milling 1  175       03/03/2022  …
71562   F2500    10        lathe 3    632       12/12/2022  …
71562   F2500    20        lathe 4    593       15/12/2022  …
71562   F2500    30        borer 3    560       16/12/2022  …
71562   F2500    40        milling 2  555       16/12/2022  …
69872   F302     10        lathe 2    5463      04/06/2022  …
69872   F302     30        lathe 3    5102      11/06/2022  …
70444   F302     20        lathe 1    3125      27/07/2022  …
70444   F302     30        lathe 3    2965      31/07/2022  …
…       …        …         …          …         …           …
124.531 rows x 12 columns

What I need is some kind of group by where, for each Article, I select the maximum Launch number and then, within that Launch, the row with the minimum Sequence number together with its Machine.

The end result should look like this:

Article  Launch  Sequence  Machine
F2500    71562   10        lathe 3
F302     70444   20        lathe 1
…        …       …         …

I've tried to do it with pandas groupby with .agg, but it doesn't work. The following code, for example, gives me the max launch and min sequence overall and not the min sequence related to the max launch. I've tried some other approaches with sort_values and such, but with no success.

Last_Lathe_df = Last_Lathe_df.groupby(['Article'], as_index=False).agg({'Launch': 'max', 'Sequence': 'min', 'Machine': 'first'})
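
To make the symptom concrete, here is a minimal sketch that runs the same `.agg` call on the twelve sample rows shown in the question (only four of the twelve columns are rebuilt, and the variable name `mixed` is just for illustration). Each aggregation is computed on its own column, so the values in one output row can come from different input rows:

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "Launch": [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
               69872, 69872, 70444, 70444],
    "Article": ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine": ["lathe 1", "lathe 1", "borer 3", "milling 1",
                "lathe 3", "lathe 4", "borer 3", "milling 2",
                "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# 'max', 'min' and 'first' are each applied to a single column per group,
# with no link between the rows they pick.
mixed = df.groupby(["Article"], as_index=False).agg(
    {"Launch": "max", "Sequence": "min", "Machine": "first"}
)
print(mixed)
```

For F302 this combines Launch 70444 with Sequence 10 and Machine "lathe 2", although Sequence 10 and "lathe 2" belong to Launch 69872. The filtering has to happen row-wise before the minimum is taken.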

Answer 1

Score: 2

I would use:

# get max Launch per Article and filter rows
m = df.groupby('Article')['Launch'].max()
df2 = df.loc[df['Launch'].isin(m)]
# get rows with min sequence
Last_Lathe_df = df2.loc[df2.groupby('Article')['Sequence'].idxmin()]

Output:

    Launch Article  Sequence  Machine  Quantity        Date
4    71562   F2500        10  lathe 3       632  12/12/2022
10   70444    F302        20  lathe 1      3125  27/07/2022
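
One caveat with `isin(m)`: it compares every row's Launch against the maxima of all articles, not only its own. If a Launch number could ever appear under more than one Article (the question does not say whether it can), a row might be kept by mistake. A possible variant, sketched here on the sample rows from the question, compares each row with its own group's maximum via `transform('max')`. The names `is_latest`, `latest` and `result` are illustrative only:

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "Launch": [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
               69872, 69872, 70444, 70444],
    "Article": ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine": ["lathe 1", "lathe 1", "borer 3", "milling 1",
                "lathe 3", "lathe 4", "borer 3", "milling 2",
                "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# transform('max') returns one value per row: the max Launch of that row's Article.
is_latest = df["Launch"].eq(df.groupby("Article")["Launch"].transform("max"))
latest = df[is_latest]

# Among the rows of the latest Launch, take the row with the smallest Sequence.
result = latest.loc[latest.groupby("Article")["Sequence"].idxmin()]
print(result[["Article", "Launch", "Sequence", "Machine"]])
```

On the sample data this selects the same two rows (index 4 and 10) as the code above.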

Answer 2

Score: 2

In a straightforward way:

df.groupby('Article').apply(lambda x: x[x['Launch'].eq(x['Launch'].max())]
                            .sort_values(by=['Sequence']).head(1)).reset_index(drop=True)

   Launch Article  Sequence  Machine  Quantity        Date
0   71562   F2500        10  lathe 3       632  12/12/2022
1   70444    F302        20  lathe 1      3125  27/07/2022
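
Since the question mentions trying `sort_values` without success, here is a hedged sketch of how a sort-based version might look. It is not taken from either answer. The idea is to sort by Article, then Launch descending, then Sequence ascending, so the wanted row is the first one in each Article, and then keep that first row with `drop_duplicates`. It uses the sample rows from the question, and `result` is an illustrative name:

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "Launch": [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
               69872, 69872, 70444, 70444],
    "Article": ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine": ["lathe 1", "lathe 1", "borer 3", "milling 1",
                "lathe 3", "lathe 4", "borer 3", "milling 2",
                "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

result = (
    # Highest Launch first within each Article, lowest Sequence first within each Launch.
    df.sort_values(["Article", "Launch", "Sequence"], ascending=[True, False, True])
      # Keep only the first row per Article (keep='first' is the default).
      .drop_duplicates("Article")
      .reset_index(drop=True)
)
print(result)
```

This avoids `groupby(...).apply` with a Python lambda, which can be noticeably slower on a frame of this size, at the cost of sorting the whole frame once.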
