How to create a new column based on the common start word between two series in a Polars DataFrame?
Question
I have a Polars DataFrame consisting of two series, 'foo' and 'bar', which contain lists of integers. I want to create a new column that assigns a value of 1 if the start word (first element) of the 'foo' series is equal to the start word of the 'bar' series, and 0 otherwise. I'm using Polars, which seems a bit more complex than pandas.
Here is the example DataFrame I'm working with:
import polars as pl

dff = pl.DataFrame({"foo": [[1, 3, 3, 3], [3, 5, 3, 4], [4, 7, 5, 3]], "bar": [[3, 345, 3, 4], [3, 4, 334, 2], [4, 52, 4, 2]]})
The DataFrame looks like this:
shape: (3, 2)
┌─────────────┬───────────────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═════════════╪═══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] │
│ [3, 5, … 4] ┆ [3, 4, … 2] │
│ [4, 7, … 3] ┆ [4, 52, … 2] │
└─────────────┴───────────────┘
I would like to create a new column named 'common_start' that reflects whether the first element of 'foo' matches the first element of 'bar'. The expected values for the 'common_start' column in this case would be [0, 1, 1]:
shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo ┆ bar ┆ common_start │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ list[i64] ┆ i64 │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ 0 │
│ [3, 5, … 4] ┆ [3, 4, … 2] ┆ 1 │
│ [4, 7, … 3] ┆ [4, 52, … 2] ┆ 1 │
└─────────────┴───────────────┴──────────────┘
How can I achieve this? Any guidance or examples using Polars would be greatly appreciated.
Answer 1
Score: 1
The .list namespace contains the functionality for Polars lists.
You can use .get(0) or .first() to access the first item.
dff.with_columns(
    common_start=pl.col('foo').list.first() == pl.col('bar').list.first()
)
shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo ┆ bar ┆ common_start │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ list[i64] ┆ bool │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ false │
│ [3, 5, … 4] ┆ [3, 4, … 2] ┆ true │
│ [4, 7, … 3] ┆ [4, 52, … 2] ┆ true │
└─────────────┴───────────────┴──────────────┘
A common way to generate 0/1 is to cast a bool to an int:
dff.with_columns(
    common_start=(pl.col('foo').list.first() == pl.col('bar').list.first()).cast(int)
)
shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo ┆ bar ┆ common_start │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ list[i64] ┆ i64 │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ 0 │
│ [3, 5, … 4] ┆ [3, 4, … 2] ┆ 1 │
│ [4, 7, … 3] ┆ [4, 52, … 2] ┆ 1 │
└─────────────┴───────────────┴──────────────┘
You can also use when/then/otherwise to set the values explicitly:
dff.with_columns(
    common_start=pl.when(pl.col('foo').list.first() == pl.col('bar').list.first())
    .then(1)
    .otherwise(0)
)