Polars aggregation warning using when->then
Question
Consider the following:
In [9]: df
Out[9]:
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 1 │
│ 2 ┆ 1 │
└─────┴─────┘
In [10]: df.groupby("a").agg(pl.when(pl.col("b") == 1).then(pl.col("b")))
The predicate '[(col("b")) == (1)]' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the groupby operation would. This behavior is experimental and may be subject to change
Out[10]:
shape: (2, 2)
┌─────┬───────────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 2 ┆ [1] │
│ 1 ┆ [1] │
└─────┴───────────┘
Is there something to worry about? The when->then has to produce a value even if it's null.
Answer 1
Score: 1
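The problem in the aggregation arises from comparing the group "b" with the literal 1 and then replacing it with the group "b". We haven't formalized the vectorization rules for that expression yet, hence the warning.
It is more explicit (and easier for us to understand) to apply your ternary expression in a with_columns and then do the aggregation: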
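import polars as pl

df = pl.from_repr("""shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 1 │
│ 2 ┆ 1 │
└─────┴─────┘
""")
# Apply the ternary row-wise first, then collect each group into a list.
# maintain_order=True keeps groups in first-appearance order, matching the output below.
# (Polars 0.19+ renamed groupby to group_by.)
(df.with_columns(
    pl.when(pl.col("b") == 1).then(pl.col("b"))
).groupby("a", maintain_order=True).all())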
shape: (2, 2)
┌─────┬───────────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 1 ┆ [1] │
│ 2 ┆ [1] │
└─────┴───────────┘