June 19, 2023, 22:14:31

Improving polars statement that adds a column applying a lambda function on each row

Question

I am trying to add a column using apply in polars. The pandas equivalent is as follows:

>>> import pandas as pd
>>> df = pd.DataFrame({"ref": [-1, 2, 8], "v1": [-1, 5, 0], "v2": [-1, 5, 8]})
>>> df['count'] = df.apply(lambda r: len([i for i in r if i == r[0]]) - 1, axis=1)
>>> df = df.drop('ref', axis=1)
>>> df
   v1  v2  count
0  -1  -1      2
1   5   5      0
2   0   8      1
>>>

The following is the sample code that I have with polars. Though it works as desired, it looks ugly and probably can be improved as well.

>>> import polars as pl
>>>
>>> df = pl.DataFrame({"ref": [-1, 2, 8], "v1": [-1, 5, 0], "v2": [-1, 5, 8]})
>>>
>>> x = df.apply(lambda r: len([i for i in r if i == r[0]]) - 1).rename({'apply': 'count'})
>>> df = df.hstack([x.to_series()]).drop('ref')
>>>
>>> df
shape: (3, 3)
┌─────┬─────┬───────┐
│ v1  ┆ v2  ┆ count │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═══════╡
│ -1  ┆ -1  ┆ 2     │
│ 5   ┆ 5   ┆ 0     │
│ 0   ┆ 8   ┆ 1     │
└─────┴─────┴───────┘
>>>

What bothers me is the renaming part and the hstack that I cobbled together to make it work. I have seen some examples where .with_column() was used, but that method is not present in my version of polars (0.17.14).

I would be grateful for any improvements in the above code.

TIA

Answer 1

Score: 2

Previously, both .with_column and .with_columns existed; now there is only .with_columns.

It looks like you're trying to count, for each row, how many of the other columns have the same value as ref.

You can do this directly with expressions in polars:

df.with_columns(count = pl.sum(pl.col('ref') == pl.exclude('ref')))

shape: (3, 4)
┌─────┬─────┬─────┬───────┐
│ ref ┆ v1  ┆ v2  ┆ count │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ i64 ┆ u32   │
╞═════╪═════╪═════╪═══════╡
│ -1  ┆ -1  ┆ -1  ┆ 2     │
│ 2   ┆ 5   ┆ 5   ┆ 0     │
│ 8   ┆ 0   ┆ 8   ┆ 1     │
└─────┴─────┴─────┴───────┘
