Polars aggregation warning using when->then

Question

Consider the following:

In [9]: df
Out[9]: 
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 1   │
└─────┴─────┘

In [10]: df.groupby("a").agg(pl.when(pl.col("b") == 1).then(pl.col("b")))
The predicate '[(col("b")) == (1)]' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the groupby operation would. This behavior is experimental and may be subject to change
Out[10]: 
shape: (2, 2)
┌─────┬───────────┐
│ a   ┆ b         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 2   ┆ [1]       │
│ 1   ┆ [1]       │
└─────┴───────────┘

Is there something to worry about? The when->then has to produce a value even if it's null.

Answer 1

Score: 1

The warning arises because the aggregation compares the group's column "b" with the literal 1 and then replaces the matches with the group's column "b". We haven't formalized the vectorization rules for that expression yet, hence the warning.

It is more explicit (and easier for us to understand) to apply your ternary expression in a with_columns first and then do the aggregation:

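import polars as pl

# Rebuild the example frame from its printed representation.
df = pl.from_repr("""shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 1   │
└─────┴─────┘
""")

# Evaluate the when->then row by row first, then group and collect.
# Note: newer Polars releases rename groupby to group_by.
(df.with_columns(
    pl.when(pl.col("b") == 1).then(pl.col("b"))
).groupby("a").all())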
shape: (2, 2)
┌─────┬───────────┐
│ a   ┆ b         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 1   ┆ [1]       │
│ 2   ┆ [1]       │
└─────┴───────────┘

huangapple
  • Published on May 25, 2023 at 19:48:56
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76331933.html