How do I apply and/or boolean logic on Polars .when conditionals?

Question

Let's start with my dataframe. It has 2 columns, src and tgt. When tgt is "?" and src is not null, I want to set tgt = src.

    ┌─────┬──────┐
    │ tgt ┆ src  │
    │ --- ┆ ---  │
    │ str ┆ str  │
    ╞═════╪══════╡
    │ a   ┆ !a   │
    │ ?   ┆ b    │
    │ ?   ┆ null │
    └─────┴──────┘

This should then give the following, with the new column aliased to newtgt:

    ┌─────┬──────┬────────┐
    │ tgt ┆ src  ┆ newtgt │
    │ --- ┆ ---  ┆ ---    │
    │ str ┆ str  ┆ str    │
    ╞═════╪══════╪════════╡
    │ a   ┆ !a   ┆ a      │
    │ ?   ┆ b    ┆ b      │
    │ ?   ┆ null ┆ ?      │
    └─────┴──────┴────────┘

I can check for not-null and I can check == "?" separately. How do I combine them? I tried and, &, and &&; none of them worked.

What I have so far, including error messages:

    import polars as pl

    df = pl.from_dict(
        dict(tgt=["a", "?", "?"], src=["!a", "b", None])
    )
    print("\ndf before:\n", df)

    # First condition on its own: src is not null
    df2 = df.with_columns(
        pl.when(pl.col("src").is_not_null())
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )
    print("\ndf2 check if src not null:\n", df2)

    # Second condition on its own: tgt is "?"
    df2 = df.with_columns(
        pl.when(pl.col("tgt") == "?")
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )
    print("\ndf2 if check tgt already known:\n", df2)

    # Attempt 1: combine both conditions with `and` (fails)
    try:
        print("\n\ncheck both with `and`: ")
        df2 = df.with_columns(
            pl.when(pl.col("tgt") == "?" and pl.col("src").is_not_null())
            .then(pl.col("src"))
            .otherwise(pl.col("tgt"))
            .alias("newtgt")
        )
    except (ValueError,) as e:
        print("\nnot happy with `and`:\n ", e)

    # Attempt 2: combine both conditions with `&`, no parentheses (fails)
    try:
        print("\n\ncheck both with `&`: ")
        df2 = df.with_columns(
            pl.when(pl.col("tgt") == "?" & pl.col("src").is_not_null())
            .then(pl.col("src"))
            .otherwise(pl.col("tgt"))
            .alias("newtgt")
        )
    except (pl.exceptions.InvalidOperationError,) as e:
        print("\nnot happy with `&`:\n ", e)

output:

    df before:
    shape: (3, 2)
    ┌─────┬──────┐
    │ tgt ┆ src  │
    │ --- ┆ ---  │
    │ str ┆ str  │
    ╞═════╪══════╡
    │ a   ┆ !a   │
    │ ?   ┆ b    │
    │ ?   ┆ null │
    └─────┴──────┘

    df2 check if src not null:
    shape: (3, 3)
    ┌─────┬──────┬────────┐
    │ tgt ┆ src  ┆ newtgt │
    │ --- ┆ ---  ┆ ---    │
    │ str ┆ str  ┆ str    │
    ╞═════╪══════╪════════╡
    │ a   ┆ !a   ┆ !a     │
    │ ?   ┆ b    ┆ b      │
    │ ?   ┆ null ┆ ?      │
    └─────┴──────┴────────┘

    df2 if check tgt already known:
    shape: (3, 3)
    ┌─────┬──────┬────────┐
    │ tgt ┆ src  ┆ newtgt │
    │ --- ┆ ---  ┆ ---    │
    │ str ┆ str  ┆ str    │
    ╞═════╪══════╪════════╡
    │ a   ┆ !a   ┆ a      │
    │ ?   ┆ b    ┆ b      │
    │ ?   ┆ null ┆ null   │
    └─────┴──────┴────────┘

    check both with `and`:

    not happy with `and`:
      Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use 'x.is_in([y,z])' instead of 'x in [y,z]' to check membership.

    check both with `&`:

    not happy with `&`:
      `bitand` operation not supported for dtype `str`

Answer 1

Score: 1

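In Polars you need to parenthesize each part of a compound and/or condition. Python gives & and | higher precedence than comparisons such as ==, so without the parentheses "?" & pl.col("src").is_not_null() is evaluated first, which is what raises the bitand error. As the first error message says, you also have to use & rather than and: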

    df.with_columns(
        pl.when((pl.col("tgt") == "?") & (pl.col("src").is_not_null()))
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )

    shape: (3, 3)
    ┌─────┬──────┬────────┐
    │ tgt ┆ src  ┆ newtgt │
    │ --- ┆ ---  ┆ ---    │
    │ str ┆ str  ┆ str    │
    ╞═════╪══════╪════════╡
    │ a   ┆ !a   ┆ a      │
    │ ?   ┆ b    ┆ b      │
    │ ?   ┆ null ┆ ?      │
    └─────┴──────┴────────┘
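
To see why the parentheses matter, it can help to look at how Python itself groups the condition before Polars is involved at all. The sketch below only parses the expressions with the standard-library ast module; tgt and src_ok are placeholder names standing in for pl.col("tgt") and pl.col("src").is_not_null(), and top_level is a helper made up for this illustration.

```python
import ast


def top_level(source: str) -> str:
    """Return the name of the outermost operation Python sees in an expression."""
    return type(ast.parse(source, mode="eval").body).__name__


# Placeholder names only: nothing here is evaluated, the strings are just parsed.
unparenthesized = 'tgt == "?" & src_ok'
parenthesized = '(tgt == "?") & src_ok'

# Without parentheses the outermost operation is the comparison,
# i.e. Python reads it as tgt == ("?" & src_ok).
print(top_level(unparenthesized))

# With parentheses the outermost operation is the `&` itself.
print(top_level(parenthesized))

# The right-hand side of the unparenthesized comparison is a bitwise AND
# whose left operand is the string "?", matching the `bitand` error.
rhs = ast.parse(unparenthesized, mode="eval").body.comparators[0]
print(isinstance(rhs, ast.BinOp) and isinstance(rhs.op, ast.BitAnd))
```

The and keyword fails for a different reason: Python parses it as a boolean operation and asks each operand for a plain True or False, which a lazy expression cannot give.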

Another equivalent option is pl.all(expr1, expr2, ...); recent Polars releases expose this horizontal AND as pl.all_horizontal(expr1, expr2, ...).

huangapple
  • Published on June 1, 2023, 06:59:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76377758.html