Question

I am a new polars user and I want to apply a function to every row of a polars DataFrame. In pandas I would use the apply function, specifying that the input of the function is the DataFrame's row instead of the DataFrame's column(s).

I saw the apply function of the polars library, and its documentation says that it is preferable to use the Expression API instead of apply on a polars DataFrame, because it is much more efficient. The documentation has examples of the Expression API with the select function, but select operates on the DataFrame's columns. Is there a way to use the Expression API with the rows of the DataFrame?

Edit: adding an example

I have a DataFrame with this structure:

import polars as pl
l = [(1, 2, 3, 4, 22, 23, None, None), (5, 6, 8, 10, None, None, None, None)]
df = pl.DataFrame(data=l, orient='row')

i.e. a DataFrame in which each row, from some column onward until the end, contains only None values. In this example (counting columns from 0, as in polars' default names column_0, column_1, ...), the None values of the first row start at column 6, while those of the second row start at column 4.

What I want is the most efficient polars way to turn this DataFrame into a DataFrame with only three columns: the first column is the first element of the row, the second column is the second element of the row, and the third column holds a list of all the remaining elements of the row that are not None.
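
To pin down the target, the expected result can be sketched in plain Python. This only describes the desired output and is not meant as an efficient polars approach; the names reshape_row and expected are hypothetical and used for illustration only.

```python
# Input rows, copied from the example above.
rows = [(1, 2, 3, 4, 22, 23, None, None), (5, 6, 8, 10, None, None, None, None)]


def reshape_row(row):
    """Keep the first two elements and collect the remaining non-None ones in a list."""
    first, second, *rest = row
    return first, second, [value for value in rest if value is not None]


# Row-by-row reference result (hypothetical helper, plain Python only).
expected = [reshape_row(row) for row in rows]
print(expected)
```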

Answer 1

Score: 2

If you're using the column names, you can:

1. select the first 2 columns by name
2. create a list from all the other columns by excluding those 2 named columns
3. remove nulls from the list with .list.eval (this namespace was called .arr, i.e. .arr.eval, in polars versions before 0.18)

df.select(
    pl.col("column_0", "column_1"),
    pl.concat_list(pl.exclude("column_0", "column_1"))
      .list.eval(pl.element().drop_nulls())
)

shape: (2, 3)
┌──────────┬──────────┬──────────────┐
│ column_0 ┆ column_1 ┆ column_2     │
│ ---      ┆ ---      ┆ ---          │
│ i64      ┆ i64      ┆ list[i64]    │
╞══════════╪══════════╪══════════════╡
│ 1        ┆ 2        ┆ [3, 4, … 23] │
│ 5        ┆ 6        ┆ [8, 10]      │
└──────────┴──────────┴──────────────┘
