How do I apply and/or boolean logic on Polars .when conditionals?

Question

Let's start with my dataframe. It has 2 columns, src and tgt. When tgt is "?" and src is not null, I want to set tgt to src.

┌─────┬──────┐
│ tgt ┆ src  │
│ --- ┆ ---  │
│ str ┆ str  │
╞═════╪══════╡
│ a   ┆ !a   │
│ ?   ┆ b    │
│ ?   ┆ null │
└─────┴──────┘

This should then give the following, with the result aliased as newtgt:

┌─────┬──────┬────────┐
│ tgt ┆ src  ┆ newtgt │
│ --- ┆ ---  ┆ ---    │
│ str ┆ str  ┆ str    │
╞═════╪══════╪════════╡
│ a   ┆ !a   ┆ a      │
│ ?   ┆ b    ┆ b      │
│ ?   ┆ null ┆ ?      │
└─────┴──────┴────────┘

I can check the not-null condition and I can check == "?" separately. How do I combine them? I tried and, & and &&, none of which worked.

Here is what I have so far, including the error messages:

import polars as pl

df = pl.from_dict(
    dict(tgt=["a","?","?"],src=["!a","b",None])
)
print("\ndf before:\n",df)

df2 = df.with_columns(
    pl.when(pl.col("src").is_not_null())
    .then(pl.col("src"))
    .otherwise(pl.col("tgt"))
    .alias("newtgt")
)
print("\ndf2 check if src not null:\n",df2)

df2 = df.with_columns(
    pl.when(pl.col("tgt") == "?")
    .then(pl.col("src"))
    .otherwise(pl.col("tgt"))
    .alias("newtgt")
)
print("\ndf2 if check tgt already known:\n",df2)

try:
    print("\n\ncheck both with `and`: ")
    # `and` calls bool() on the left-hand Expr, which Polars refuses to evaluate
    df2 = df.with_columns(
        pl.when(pl.col("tgt") == "?" and pl.col("src").is_not_null())
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )
except (ValueError, TypeError) as e:  # older Polars raised ValueError, newer TypeError
    print("\nnot happy with `and`:\n  ", e)

try:
    print("\n\ncheck both with `&`: ")
    # `&` binds tighter than `==`, so this parses as
    # pl.col("tgt") == ("?" & pl.col("src").is_not_null())
    df2 = df.with_columns(
        pl.when(pl.col("tgt") == "?" & pl.col("src").is_not_null())
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )
except (pl.exceptions.InvalidOperationError,) as e:
    print("\nnot happy with `&`:\n  ", e)

output:


df before:
 shape: (3, 2)
┌─────┬──────┐
│ tgt ┆ src  │
│ --- ┆ ---  │
│ str ┆ str  │
╞═════╪══════╡
│ a   ┆ !a   │
│ ?   ┆ b    │
│ ?   ┆ null │
└─────┴──────┘

df2 check if src not null:
 shape: (3, 3)
┌─────┬──────┬────────┐
│ tgt ┆ src  ┆ newtgt │
│ --- ┆ ---  ┆ ---    │
│ str ┆ str  ┆ str    │
╞═════╪══════╪════════╡
│ a   ┆ !a   ┆ !a     │
│ ?   ┆ b    ┆ b      │
│ ?   ┆ null ┆ ?      │
└─────┴──────┴────────┘

df2 if check tgt already known:
 shape: (3, 3)
┌─────┬──────┬────────┐
│ tgt ┆ src  ┆ newtgt │
│ --- ┆ ---  ┆ ---    │
│ str ┆ str  ┆ str    │
╞═════╪══════╪════════╡
│ a   ┆ !a   ┆ a      │
│ ?   ┆ b    ┆ b      │
│ ?   ┆ null ┆ null   │
└─────┴──────┴────────┘


check both with `and`: 

not happy with `and`:
   Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use 'x.is_in([y,z])' instead of 'x in [y,z]' to check membership.


check both with `&`: 

not happy with `&`:
   `bitand` operation not supported for dtype `str`
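The first failure comes from plain Python semantics rather than from Polars itself: and cannot be overloaded. It calls bool() on its left operand, so an object that refuses to have a truth value (as a lazy Polars Expr does) raises immediately, whereas & dispatches to __and__/__rand__ methods that a library can define. The ToyExpr class below is a hypothetical stand-in written for illustration, not Polars' actual implementation; it only shows which hook each operator uses.

```python
class ToyExpr:
    """Hypothetical stand-in for a lazy expression; not Polars' real Expr class."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __and__(self, other: "ToyExpr") -> "ToyExpr":
        # `&` is overloadable, so it can build a new lazy expression
        return ToyExpr(f"({self.text} & {other.text})")

    def __bool__(self) -> bool:
        # `and` needs a concrete truth value, which a lazy expression does not have
        raise TypeError("the truth value of a ToyExpr is ambiguous")


a = ToyExpr("a")
b = ToyExpr("b")

combined = a & b
print(combined.text)  # prints "(a & b)"

try:
    a and b  # Python evaluates bool(a) here, before it ever looks at b
    and_error = None
except TypeError as exc:
    and_error = str(exc)
print(and_error)  # prints "the truth value of a ToyExpr is ambiguous"
```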

Answer 1

Score: 1

polars中,您需要为复杂和/或表达式的每个部分加括号,以避免模糊错误。正如该错误消息所暗示的那样,&也必须用于and

df.with_columns(
        pl.when((pl.col("tgt") == "?") & (pl.col("src").is_not_null()))
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )
shape: (3, 3)
┌─────┬──────┬────────┐
│ tgt ┆ src  ┆ newtgt │
│ --- ┆ ---  ┆ ---    │
│ str ┆ str  ┆ str    │
╞═════╪══════╪════════╡
│ a   ┆ !a   ┆ a      │
│ ?   ┆ b    ┆ b      │
│ ?   ┆ null ┆ ?      │
└─────┴──────┴────────┘

另一个等效的选项是pl.all(expr1, expr2, ...)

英文:

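In Polars you combine conditions with & (and) and | (or) rather than Python's and/or, as the first error message says: and asks for the truth value of an Expr, which is ambiguous because expressions are lazy. You also need to parenthesize each part of the combined condition, because in Python & binds more tightly than ==. Without parentheses, pl.col("tgt") == "?" & pl.col("src").is_not_null() is parsed as pl.col("tgt") == ("?" & pl.col("src").is_not_null()), which is what triggers the "bitand operation not supported for dtype str" error: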

df.with_columns(
    pl.when((pl.col("tgt") == "?") & (pl.col("src").is_not_null()))
    .then(pl.col("src"))
    .otherwise(pl.col("tgt"))
    .alias("newtgt")
)
shape: (3, 3)
┌─────┬──────┬────────┐
│ tgt ┆ src  ┆ newtgt │
│ --- ┆ ---  ┆ ---    │
│ str ┆ str  ┆ str    │
╞═════╪══════╪════════╡
│ a   ┆ !a   ┆ a      │
│ ?   ┆ b    ┆ b      │
│ ?   ┆ null ┆ ?      │
└─────┴──────┴────────┘
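The grouping can be checked with the standard-library ast module, which parses an expression without evaluating it. The names tgt and src_not_null below are placeholders inside a source string (nothing is computed), and top_level_op is a hypothetical helper written for this illustration.

```python
import ast


def top_level_op(source: str) -> str:
    """Return the node type of the outermost operation in a Python expression."""
    tree = ast.parse(source, mode="eval")
    return type(tree.body).__name__


unparenthesized = 'tgt == "?" & src_not_null'
parenthesized = '(tgt == "?") & src_not_null'
with_and = 'tgt == "?" and src_not_null'

# Outermost node is the comparison: tgt == ("?" & src_not_null)
print(top_level_op(unparenthesized))  # Compare
# Outermost node is the bitwise & of two conditions, which is what we want
print(top_level_op(parenthesized))  # BinOp
# `and` is a boolean operator that forces a truth-value check on each operand
print(top_level_op(with_and))  # BoolOp
```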

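Another equivalent option is a horizontal "all" over the conditions: in current Polars versions this is pl.all_horizontal(expr1, expr2, ...), while older versions accepted pl.all(expr1, expr2, ...) for the same purpose.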

huangapple
  • Posted on June 1, 2023 06:59:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377758.html