How to add a new column in a Polars dataframe where the value is based on different conditions
Question
I have a dataframe that contains a number of fields and I would like to add an additional column as a label based on these fields. Ideally the code would look something like this:
df.with_columns(
    [
        pl.when(~pl.all(pl.col("temp").is_null()) & pl.all(pl.col("daily_temp").is_null())).then("sensor_1_" + pl.col("location")).alias("label"),
        pl.when(pl.all(pl.col("temp").is_null()) & ~pl.all(pl.col("daily_temp").is_null())).then("sensor_2_" + pl.col("loc")).alias("label"),
    ]
)
The conditions themselves are irrelevant; however, I would like to be able to use multiple conditions as described above to add a value to the same column. Using the code as written throws a duplicate column error. Is there a way to do this properly?
Answer 1
Score: 1
As mentioned in the comments, you can chain when/then/when/then.
It should look like this:
df.with_columns(
    (
        pl.when(~pl.all(pl.col("temp").is_null()) & pl.all(pl.col("daily_temp").is_null()))
        .then("sensor_1_" + pl.col("location"))
        .when(pl.all(pl.col("temp").is_null()) & ~pl.all(pl.col("daily_temp").is_null()))
        .then("sensor_2_" + pl.col("loc"))
    ).alias("label")
)
I'm not sure why you'd want to do it this way, but you could use coalesce instead of chaining when/then.
That would look like:
df.with_columns(
    pl.coalesce(
        [
            pl.when(~pl.all(pl.col("temp").is_null()) & pl.all(pl.col("daily_temp").is_null())).then("sensor_1_" + pl.col("location")),
            pl.when(pl.all(pl.col("temp").is_null()) & ~pl.all(pl.col("daily_temp").is_null())).then("sensor_2_" + pl.col("loc")),
        ]
    ).alias("label")
)
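coalesce returns the first non-null value among the expressions passed to it. Since a when/then without an otherwise yields null wherever its condition is false, this accomplishes the same thing as chaining. I'm not sure whether the chained conditional is better optimized or more performant.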