Replace a row in Python Polars

Question

I want to replace a row in a polars DataFrame with a single value:

import numpy as np
import polars as pl

df = np.zeros(shape=(4, 4))
df = pl.DataFrame(df)

For example, I want to replace all values in the row at index 1 with 1.0.

I was looking for a straightforward solution in the documentation, but I couldn't find one.

Answer 1

Score: 4

Explicitly indexing rows is an anti-pattern in Polars. That said, with_row_count adds a row-number column (row_nr in the code below), and that column can be used in a when/then expression to build a new DataFrame with the row replaced. The extra column is not selected, so it does not appear in the final result:

df.with_row_count().select(
    pl.when(pl.col("row_nr") == 1)
      .then(1)
      .otherwise(pl.col(c))
    .alias(c) for c in df.columns
)
shape: (4, 4)
┌──────────┬──────────┬──────────┬──────────┐
│ column_0 ┆ column_1 ┆ column_2 ┆ column_3 │
│ ---      ┆ ---      ┆ ---      ┆ ---      │
│ i32      ┆ i32      ┆ i32      ┆ i32      │
╞══════════╪══════════╪══════════╪══════════╡
│ 0        ┆ 0        ┆ 0        ┆ 0        │
│ 1        ┆ 1        ┆ 1        ┆ 1        │
│ 0        ┆ 0        ┆ 0        ┆ 0        │
│ 0        ┆ 0        ┆ 0        ┆ 0        │
└──────────┴──────────┴──────────┴──────────┘

EDIT: Two improvements:

  • There is also a fairly recent cumcount expression, which effectively acts as a row count in the base case. This keeps the whole query lazy.
  • pl.all can replace the generator comprehension above, combined with keep_name to avoid duplicate-column errors.

df.select(
    pl.when(pl.all().cumcount() == 1)
      .then(1)
      .otherwise(pl.all())
    .keep_name()
)
shape: (4, 4)
┌──────────┬──────────┬──────────┬──────────┐
│ column_0 ┆ column_1 ┆ column_2 ┆ column_3 │
│ ---      ┆ ---      ┆ ---      ┆ ---      │
│ f64      ┆ f64      ┆ f64      ┆ f64      │
╞══════════╪══════════╪══════════╪══════════╡
│ 0.0      ┆ 0.0      ┆ 0.0      ┆ 0.0      │
│ 1.0      ┆ 1.0      ┆ 1.0      ┆ 1.0      │
│ 0.0      ┆ 0.0      ┆ 0.0      ┆ 0.0      │
│ 0.0      ┆ 0.0      ┆ 0.0      ┆ 0.0      │
└──────────┴──────────┴──────────┴──────────┘

(From here, the result can be cast to whatever dtype is needed.)

Posted by huangapple on April 11, 2023, 00:33:22
Source link: https://go.coder-hub.com/75978877.html