Replace a row in python polars
Question
I want to replace a row in a polars DataFrame with a single value:
import numpy as np
import polars as pl
df = np.zeros(shape=(4, 4))
df = pl.DataFrame(df)
For example, I want to replace all values in the row at index 1 with 1.0.
I was looking for a straightforward solution in the documentation, but I couldn't find one.
Answer 1
Score: 4
Explicit indexing is an anti-pattern in Polars. That said, with a `with_row_count` column it is possible to build a new DataFrame with the row's values replaced, by using that extra column in a `when/then` expression (and not ultimately selecting it in the final result):
df.with_row_count().select(
    pl.when(pl.col("row_nr") == 1)
    .then(1)
    .otherwise(pl.col(c))
    .alias(c)
    for c in df.columns
)
shape: (4, 4)
┌──────────┬──────────┬──────────┬──────────┐
│ column_0 ┆ column_1 ┆ column_2 ┆ column_3 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 ┆ i32 │
╞══════════╪══════════╪══════════╪══════════╡
│ 0 ┆ 0 ┆ 0 ┆ 0 │
│ 1 ┆ 1 ┆ 1 ┆ 1 │
│ 0 ┆ 0 ┆ 0 ┆ 0 │
│ 0 ┆ 0 ┆ 0 ┆ 0 │
└──────────┴──────────┴──────────┴──────────┘
EDIT: Two improvements:
- There's also a fairly recent `cumcount` that effectively acts as a row count expression in the base case. This keeps the whole query lazy.
- `pl.all` can be used to get rid of the generator comprehension above, combined with a `keep_name` to avoid duplicate column errors.
df.select(
    pl.when(pl.all().cumcount() == 1)
    .then(1)
    .otherwise(pl.all())
    .keep_name()
)
shape: (4, 4)
┌──────────┬──────────┬──────────┬──────────┐
│ column_0 ┆ column_1 ┆ column_2 ┆ column_3 │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 │
╞══════════╪══════════╪══════════╪══════════╡
│ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │
│ 1.0 ┆ 1.0 ┆ 1.0 ┆ 1.0 │
│ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │
└──────────┴──────────┴──────────┴──────────┘
(From here, the result can be cast to whatever dtype is needed.)