How can I use when, then, otherwise to conditionally replace row values in Polars?

Question

I have a dataset with three columns. Column A should be checked for a substring: if the value in A partially matches "foo", the value in column L should be replaced by the value from column G. Otherwise, nothing should change. For this I have tried the following.

import polars as pl

df = pl.DataFrame(
    {
        "A": ["foo1", "ham", "foo2", "egg"],
        "L": ["A54", "A12", "B84", "C12"],
        "G": ["X34", "C84", "G96", "L6"],
    }
)
print(df)

shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo1 ┆ A54 ┆ X34 │
│ ham  ┆ A12 ┆ C84 │
│ foo2 ┆ B84 ┆ G96 │
│ egg  ┆ C12 ┆ L6  │
└──────┴─────┴─────┘

Expected outcome:

shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo1 ┆ X34 ┆ X34 │
│ ham  ┆ A12 ┆ C84 │
│ foo2 ┆ G96 ┆ G96 │
│ egg  ┆ C12 ┆ L6  │
└──────┴─────┴─────┘

I tried this:

df = df.with_columns(
    pl.when(pl.col("A").str.contains("foo"))
    .then(pl.col("L"))
    .alias("G")
    .otherwise(pl.col("G"))
)

However, this does not work. Can someone help me with this?

Answer 1

Score: 2

Two things.

  1. The alias goes at the very end of the expression.
  2. Because of the order in which chained methods are applied, it is easy to end up with the alias attached to the otherwise (or then) branch instead of the whole expression, which leads to unexpected results. As a result, I always wrap the entire when/then/otherwise expression in parentheses before the alias, even though this may not always be strictly necessary.

So I think you want this (note that then takes G and otherwise keeps L, the reverse of your attempt, and the result is aliased as L):

df.with_columns(
    (
        pl.when(pl.col("A").str.contains("foo"))
        .then(pl.col("G"))
        .otherwise(pl.col("L"))
    ).alias("L")
)

There's also a newer feature, passing the expression as a keyword argument to with_columns (I'm not sure which version first supported it), so you can do this too:

df.with_columns(
    L=pl.when(pl.col("A").str.contains("foo"))
    .then(pl.col("G"))
    .otherwise(pl.col("L"))
)

Answer 2

Score: 1

df = df.with_columns(
    pl.when(pl.col("A").str.contains("foo"))
    .then(pl.col("G"))
    .otherwise(pl.col("L"))
    .alias("L")
)

  • Posted by huangapple on 2023-02-24 at 02:11:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75548741.html