Improving polars statement that adds a column applying a lambda function on each row
Question
I am trying to add a column using apply in polars. The pandas equivalent is as follows:
>>> import pandas as pd
>>> df = pd.DataFrame({"ref": [-1, 2, 8], "v1": [-1, 5, 0], "v2": [-1, 5, 8]})
>>> df['count'] = df.apply(lambda r: len([i for i in r if i == r.iloc[0]]) - 1, axis=1)
>>> df = df.drop('ref', axis=1)
>>> df
   v1  v2  count
0  -1  -1      2
1   5   5      0
2   0   8      1
>>>
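For comparison, the same count can be computed in pandas without a row-wise apply: compare every non-ref column against ref, then count the matches per row. This is a minimal sketch using DataFrame.eq with axis=0 so the ref Series is aligned on the row index; the variable names counts and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ref": [-1, 2, 8], "v1": [-1, 5, 0], "v2": [-1, 5, 8]})

# Compare each non-ref column with ref element-wise (axis=0 aligns ref on the
# row index), giving a boolean frame; summing across each row counts matches.
counts = df.drop(columns="ref").eq(df["ref"], axis=0).sum(axis=1)

# Drop ref and attach the counts as a new column, as in the apply version.
out = df.drop(columns="ref").assign(count=counts)
print(out)
```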
The following is my sample code with polars. Although it works as desired, it looks clumsy and can probably be improved.
>>> import polars as pl
>>>
>>> df = pl.DataFrame({"ref": [-1, 2, 8], "v1": [-1, 5, 0], "v2": [-1, 5, 8]})
>>>
>>> x = df.apply(lambda r: len([i for i in r if i == r[0]]) - 1).rename({'apply': 'count'})
>>> df = df.hstack([x.to_series()]).drop('ref')
>>>
>>> df
shape: (3, 3)
┌─────┬─────┬───────┐
│ v1 ┆ v2 ┆ count │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═══════╡
│ -1 ┆ -1 ┆ 2 │
│ 5 ┆ 5 ┆ 0 │
│ 0 ┆ 8 ┆ 1 │
└─────┴─────┴───────┘
>>>
What bothers me is the renaming part and the hstack that I cobbled together to make it work. I have seen some examples where .with_column() was used, but that method is not present in my version of polars (0.17.14).
I would be grateful for any improvements to the above code.
TIA
Answer 1
Score: 2
Previously, both .with_column and .with_columns existed; now there is only .with_columns.
It looks like you're trying to count, for each row, how many of the other columns have the same value as ref. You can do this directly with expressions in polars:
df.with_columns(count = pl.sum(pl.col('ref') == pl.exclude('ref')))
shape: (3, 4)
┌─────┬─────┬─────┬───────┐
│ ref ┆ v1 ┆ v2 ┆ count │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ u32 │
╞═════╪═════╪═════╪═══════╡
│ -1 ┆ -1 ┆ -1 ┆ 2 │
│ 2 ┆ 5 ┆ 5 ┆ 0 │
│ 8 ┆ 0 ┆ 8 ┆ 1 │
└─────┴─────┴─────┴───────┘