2023-06-15 12:49:54

How to select a row with minimum value over some other column

Question

How can I select rows from a DataFrame based on the minimum value in one column, aggregated over another column? For example:

import polars as pl

a = pl.DataFrame({'a': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3], 'b': [10, 5, 10, 6, 5, 4, 5, 30, 25, 24, 30, 30], 'c': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})
print(a)
-----------------
shape: (12, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 10  ┆ 1   │
│ 1   ┆ 5   ┆ 2   │
│ 1   ┆ 10  ┆ 3   │
│ 2   ┆ 6   ┆ 4   │
│ 2   ┆ 5   ┆ 5   │
│ 2   ┆ 4   ┆ 6   │
│ 2   ┆ 5   ┆ 7   │
│ 3   ┆ 30  ┆ 8   │
│ 3   ┆ 25  ┆ 9   │
│ 3   ┆ 24  ┆ 10  │
│ 3   ┆ 30  ┆ 11  │
│ 3   ┆ 30  ┆ 12  │
└─────┴─────┴─────┘

I need the following dataframe

shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 2   │
│ 2   ┆ 4   ┆ 6   │
│ 3   ┆ 24  ┆ 10  │
└─────┴─────┴─────┘

So it is like groupby(by='a') followed by min('b'), but I need to select the entire row with the minimum b.

Thanks.

Answer 1

Score: 2

This looks like a case for .filter + .over:

# df is the DataFrame from the question (named a there)
df.filter(pl.col('b') == pl.min('b').over('a'))

shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 2   │
│ 2   ┆ 4   ┆ 6   │
│ 3   ┆ 24  ┆ 10  │
└─────┴─────┴─────┘

To handle duplicates (several rows in the same group sharing the minimum b), you could add .unique(subset='a') so that only one row per group is kept.

It could also be written as a .groupby:

# Note: groupby was renamed to group_by in later Polars releases.
(df.groupby('a')
   .agg(
      pl.col('b', 'c').filter(pl.col('b') == pl.min('b')).first()
   )
)
