How can I use when, then, otherwise to conditionally replace row values in Polars?
Question
I have a data set with three columns. Column A is to be checked for whether it contains a substring. If the value in A contains foo, the value in column L should be replaced by the value in column G. If not, nothing should change. Here is my data:
import polars as pl

df = pl.DataFrame(
    {
        "A": ["foo1", "ham", "foo2", "egg"],
        "L": ["A54", "A12", "B84", "C12"],
        "G": ["X34", "C84", "G96", "L6"],
    }
)
print(df)
shape: (4, 3)
┌──────┬─────┬─────┐
│ A ┆ L ┆ G │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo1 ┆ A54 ┆ X34 │
│ ham ┆ A12 ┆ C84 │
│ foo2 ┆ B84 ┆ G96 │
│ egg ┆ C12 ┆ L6 │
└──────┴─────┴─────┘
Expected outcome:
shape: (4, 3)
┌──────┬─────┬─────┐
│ A ┆ L ┆ G │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo1 ┆ X34 ┆ X34 │
│ ham ┆ A12 ┆ C84 │
│ foo2 ┆ G96 ┆ G96 │
│ egg ┆ C12 ┆ L6 │
└──────┴─────┴─────┘
I tried this:
df = df.with_columns(
    pl.when(
        pl.col("A")
        .str.contains("foo"))
    .then(pl.col("L"))
    .alias("G")
    .otherwise(pl.col("G"))
)
However, this does not work. Can someone help me with this?
Answer 1
Score: 2
Two things.
- The alias always goes at the end.
- The order of operations often attaches the `alias` to the `otherwise`, so that the `then` doesn't get an `alias`, leading to unexpected results. As a result, I always wrap everything in parentheses before an `alias`, even though this may not always be strictly necessary.
So I think you want this (the `then` branch takes `G`, the `otherwise` branch keeps `L`, and the result is aliased back to `L`):
df.with_columns(
    (pl.when(
        pl.col("A")
        .str.contains("foo"))
    .then(pl.col("G"))
    .otherwise(pl.col("L")))
    .alias("L")
)
There's a newer feature, passing the expression as a keyword argument whose name becomes the output column name (I'm not sure which version first supports it), so you can also do this:
df.with_columns(
    L=pl.when(
        pl.col("A")
        .str.contains("foo"))
    .then(pl.col("G"))
    .otherwise(pl.col("L"))
)
Answer 2
Score: 1
df = df.with_columns(
    pl.when(pl.col("A").str.contains("foo"))
    .then(pl.col("G"))
    .otherwise(pl.col("L"))
    .alias("L")
)