How to add a new column in a Polars dataframe where the value is based on different conditions

Question

I have a dataframe that contains a number of fields and I would like to add an additional column as a label based on these fields. Ideally the code would look something like this:

df.with_columns(
    [
        pl.when(~pl.all(pl.col("temp").is_null()) & pl.all(pl.col("daily_temp").is_null())).then("sensor_1_" + pl.col("location")).alias("label"),
        pl.when(pl.all(pl.col("temp").is_null()) & ~pl.all(pl.col("daily_temp").is_null())).then("sensor_2_" + pl.col("loc")).alias("label"),
    ]
)

The conditions themselves are irrelevant; what I want is to use multiple conditions, as above, to fill a single column. Run as written, the code throws a duplicate column error, because both expressions are aliased to "label". Is there a way to do this properly?

Answer 1

Score: 1

As mentioned in the comments, you can chain when/then/when/then in a single expression.

It should look like this:

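df.with_columns(
    # .is_null().all() is True only when every value in the column is null.
    # Branches are checked in order; the first condition that holds supplies the value.
    pl.when(~pl.col("temp").is_null().all() & pl.col("daily_temp").is_null().all())
    .then(pl.lit("sensor_1_") + pl.col("location"))
    .when(pl.col("temp").is_null().all() & ~pl.col("daily_temp").is_null().all())
    .then(pl.lit("sensor_2_") + pl.col("loc"))
    .alias("label")
)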

I'm not sure why you'd want to do it this way, but you could also use coalesce instead of chaining when/then.

That would look like:

df.with_columns(
    # Each when/then has no otherwise, so it is null wherever its condition is false.
    pl.coalesce(
        [
            pl.when(~pl.col("temp").is_null().all() & pl.col("daily_temp").is_null().all()).then(pl.lit("sensor_1_") + pl.col("location")),
            pl.when(pl.col("temp").is_null().all() & ~pl.col("daily_temp").is_null().all()).then(pl.lit("sensor_2_") + pl.col("loc")),
        ]
    ).alias("label")
)

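coalesce returns the first non-null value among the expressions passed to it. Since a when/then without an otherwise is null wherever its condition is false, this accomplishes the same thing as chaining. I'm not sure whether the chained conditional is better optimized or more performant.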

huangapple
  • Posted on August 9, 2023, 01:20:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76861863.html