May 26, 2023, 14:46:03

Polars and the Lazy API: How to drop columns that contain only null values?

Question

I am working with Polars and need to drop columns that contain only null values during my data preprocessing. However, I am having trouble using the Lazy API to accomplish this.

For instance, given the table below, how can I drop column "a" using Polars' Lazy API?

import polars as pl

df = pl.DataFrame(
    {
        "a": [None, None, None, None],
        "b": [1, 2, None, 1],
        "c": [1, None, None, 1],
    }
)
df

shape: (4, 3)
┌──────┬──────┬──────┐
│ a    ┆ b    ┆ c    │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ null ┆ 1    ┆ 1    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ null ┆ 2    ┆ null │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ null ┆ null ┆ null │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ null ┆ 1    ┆ 1    │
└──────┴──────┴──────┘

I am aware of Issue #1613 and its solution of filtering out columns where all values are null, but that solution uses the eager API, not the Lazy API.

FYI,

# filter columns where all values are null
df[:, [not (s.null_count() == df.height) for s in df]]

I am also aware of the drop_nulls function in Polars, which can only drop all rows that contain null values, unlike the dropna function in Pandas that can take two arguments, axis and how.
Can someone provide an example of how to drop columns with all null values in Polars using the Lazy API?

Answer 1

Score: 0

You can't, at least not in the way you want. Polars doesn't know enough about the LazyFrame to tell which columns contain only nulls until you collect. That means you need one collect to find out which columns to keep, and then another one to materialize those columns.

Start by making your df lazy with df = df.lazy(), then proceed in two steps.

Step 1:

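# One-row frame: True where a column is entirely null.
(df.select(pl.all().is_null().all())
    .melt()  # named .unpivot() in Polars 1.0 and later
    .filter(~pl.col('value'))  # keep columns that are not all null
    .select('variable')
    .collect()
    .to_series()
    .to_list())

Those are the columns that are not entirely null (they may still contain some nulls), so now you wrap that expression in its own select.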

Step 2:

# The inner collect finds the column names; the outer one materializes them.
(df.select(
    df.select(pl.all().is_null().all())
        .melt()  # named .unpivot() in Polars 1.0 and later
        .filter(~pl.col('value'))
        .select('variable')
        .collect()
        .to_series()
        .to_list())
.collect())
