How to find the no. of nulls in every column in a polars dataframe?

Question

In pandas, one can do:

import pandas as pd

d = {"foo":[1,2,3, None], "bar":[4,None, None, 6]}
df_pandas = pd.DataFrame.from_dict(d)
dict(df_pandas.isnull().sum())

[out]:

{'foo': 1, 'bar': 2}

In polars it's possible to do the same by looping through the columns:

import polars as pl

d = {"foo":[1,2,3, None], "bar":[4,None, None, 6]}
df_polars = pl.from_dict(d)

{col: df_polars[col].is_null().sum() for col in df_polars.columns}

Looping through the columns in polars is particularly painful when using a LazyFrame, where the .collect() has to be done in chunks to do the aggregation.
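
For concreteness, that per-column loop on a LazyFrame looks roughly like this (a sketch; collect_schema().names() assumes a recent polars release, older versions expose lf.columns instead):

import polars as pl

lf = pl.LazyFrame({"foo": [1, 2, 3, None], "bar": [4, None, None, 6]})

# one select + collect per column, just to count its nulls
null_counts = {
    col: lf.select(pl.col(col).is_null().sum()).collect().item()
    for col in lf.collect_schema().names()
}
print(null_counts)  # {'foo': 1, 'bar': 2}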

Is there a way to find the no. of nulls in every column in a polars dataframe without looping through each column?

Answer 1

Score: 3

Assuming you're not married to the output format, the idiomatic way to do it is...

df.select(pl.all().is_null().sum())

However, if you really like the dict output, you can easily get it...

df.select(pl.all().is_null().sum()).to_dicts()[0]

The way this works is that inside the select we start with pl.all(), which means all of the columns, and then, much like in the pandas version, we apply is_null, which returns True/False. From that we chain sum, which turns the Trues into 1s and gives you the number of nulls in each column.
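
For instance, on the example frame the intermediate boolean frame and its sum look like this (a sketch; the exact printed output depends on your polars version):

import polars as pl

df = pl.DataFrame({"foo": [1, 2, 3, None], "bar": [4, None, None, 6]})

# step 1: is_null turns every column into True/False
df.select(pl.all().is_null())
# foo: false, false, false, true
# bar: false, true,  true,  false

# step 2: sum counts the Trues per column
df.select(pl.all().is_null().sum())
# foo: 1, bar: 2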

There's also the dedicated null_count(), so you don't have to chain is_null().sum(); thanks to @jqurious for that tip.
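
Putting that together, a short sketch of the dedicated method and the dict/lazy variants (again on the example data; output dtypes may differ by polars version):

import polars as pl

df = pl.DataFrame({"foo": [1, 2, 3, None], "bar": [4, None, None, 6]})

# dedicated shortcut on the frame itself
df.null_count()

# dict output matching the pandas version
df.select(pl.all().is_null().sum()).to_dicts()[0]  # {'foo': 1, 'bar': 2}

# the same expression works on a LazyFrame, so no per-column collect() is needed
pl.LazyFrame({"foo": [1, 2, 3, None], "bar": [4, None, None, 6]}).select(
    pl.all().is_null().sum()
).collect()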

Answer 2

Score: 0

If you want row-wise counts, use this instead:

df.hstack(df.transpose().select(pl.all().is_null().sum()).transpose().rename({"column_0": "null_count"}))
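
For reference, a quick sketch of what that produces on the example frame (layout shown approximately; dtypes may vary by polars version):

import polars as pl

df = pl.DataFrame({"foo": [1, 2, 3, None], "bar": [4, None, None, 6]})

# transpose, count nulls per transposed column (i.e. per original row),
# transpose back, and attach the counts to the original frame
out = df.hstack(
    df.transpose()
      .select(pl.all().is_null().sum())
      .transpose()
      .rename({"column_0": "null_count"})
)
print(out)
# foo=1,    bar=4    -> null_count=0
# foo=2,    bar=null -> null_count=1
# foo=3,    bar=null -> null_count=1
# foo=null, bar=6    -> null_count=1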
