Polars aggregation warning using when->then
Question
Consider the following:
In [9]: df
Out[9]:
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 1   │
└─────┴─────┘
In [10]: df.groupby("a").agg(pl.when(pl.col("b") == 1).then(pl.col("b")))
The predicate '[(col("b")) == (1)]' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the groupby operation would. This behavior is experimental and may be subject to change
Out[10]:
shape: (2, 2)
┌─────┬───────────┐
│ a   ┆ b         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 2   ┆ [1]       │
│ 1   ┆ [1]       │
└─────┴───────────┘
Is there something to worry about? The when->then has to produce a value even if it's null.
Answer 1
Score: 1
The problem in the aggregation arises from comparing the group "b" with the literal 1 and replacing it with the group "b". We haven't formalized the vectorization rules for that expression yet, hence the warning.
It is more explicit (and easier for us to understand) to apply your ternary expression in a with_columns and then do the aggregation:
import polars as pl

df = pl.from_repr("""shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 1   │
└─────┴─────┘
""")
# Apply the when/then row-wise first, then collect each group into a list.
(df.with_columns(
    pl.when(pl.col("b") == 1).then(pl.col("b"))
).groupby("a").all())
shape: (2, 2)
┌─────┬───────────┐
│ a   ┆ b         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 1   ┆ [1]       │
│ 2   ┆ [1]       │
└─────┴───────────┘