问题

我想要检查我的数据框中一组列的空值百分比，这些列根据另一组列进行分组。

数据框中的列可能会变化，所以我希望能够传递一个要按组分组的列列表以及一个要计算空值的列列表。

轻松分组可以使用列名列表，但我找不到一种方法来对计数进行类似的操作。我只能为要计算空值的每一列编写一行代码：

```python
df = df.groupby(by=list_of_grouping_fields)\
    .agg([
        pl.count().alias('Row_Count'),
        pl.col("INVOICE NUMBER").null_count().alias('INVOICENUMBER_nullcount'),
        pl.col("ORDER NUMBER").null_count().alias('ORDERNUMBER_nullcount')
        ])

这似乎效率低下，如果列发生变化，我需要回来编辑代码。

理想情况下，我希望能做类似这样的事情：

df = df.groupby(by=list_of_grouping_fields)\
    .agg([
        pl.count().alias('Row_Count'),
        pl.col(list_of_agg_fields).null_count().alias(list_of_aliases)
        ])

在Polars中是否有一种方法可以实现这个目标？



<details>
<summary>英文:</summary>

I&#39;d like to check the percentage of nulls for a set of columns in my dataframe, grouped by a different set of columns.


The columns in the dataframe may change, so I&#39;d like to be able to pass in a list of columns to group by and a list of columns to count nulls for.

Grouping easily takes a list of column names, but I can&#39;t find a way to do this for the counts. I can only do it by writing a line for each column I want to count nulls for:

df = df.groupby(by=list_of_grouping_fields)
.agg([
pl.count().alias('Row_Count'),
pl.col("INVOICE NUMBER").null_count().alias('INVOICENUMBER_nullcount'),
pl.col("ORDER NUMBER").null_count().alias('ORDERNUMBER_nullcount')
])


This seems inefficient and would require me to come back and edit the code if the columns changed.

Ideally I&#39;d like to do something like this:

df = df.groupby(by=list_of_grouping_fields)
.agg([
pl.count().alias('Row_Count'),
pl.col([list_of_agg_fields]).null_count().alias([list_of_aliases])
])


Is there a way to do this with Polars?

</details>


# 答案1
**得分**: 1

[`agg`](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.dataframe.groupby.GroupBy.agg.html#polars.dataframe.groupby.GroupBy.agg) 接受可变的位置参数和关键字参数，因此您应该能够执行以下操作：

```python
.agg(
  pl.count().alias('Row_Count'),
  *[pl.col(af).null_count().alias(al) for af, al in zip(list_of_agg_fields, list_of_aliases)]
)

或者，由于关键字参数是别名的快捷方式，您也可以这样做：

.agg(
  pl.count().alias('Row_Count'), # 或者 Row_Count=pl.count()
  **{al: af.null_count() for af, al in zip(list_of_agg_fields, list_of_aliases)}
)

几个月前曾经进行了大力改进，使许多核心函数的签名更加灵活，就像这样。请注意，groupby 也接受可变的位置参数，第一个参数可以是单个表达式或表达式的可迭代对象，这有点是之前这个努力的遗留物。

英文:

agg takes variadic positional args and keyword args, so you should be able to do

.agg(
  pl.count().alias(&#39;Row_Count&#39;),
  *[pl.col(af).null_count().alias(al) for af,al in zip(list_of_agg_fields, list_of_aliases)]
)

or, since keyword args is a shortcut for alias:

.agg(
  pl.count().alias(&#39;Row_Count&#39;), # or Row_Count=pl.count()
  **{al : af.null_count() for af,al in zip(list_of_agg_fields, list_of_aliases}
)

There was a big effort a few months back to make a lot of core function signatures be more flexible like this. Note that groupby takes variadic positional as well, the first argument being either a single expression or iterable of expressions is a bit of a leftover from before this effort.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Polars 中的动态聚合

问题

如何解决 AttributeError: 类型对象 ‘LibraryItem’ 没有属性 ‘Book_Title’？

在使用Plotly绘制3D凸包图。

灰盒参数估计与GEKKO（@error：模型不足）

How to plot daily data against a 24 hour axis (00:00 – 23:59:59) while keeping the order of the custom sort of the time?

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论