How to create a new column based on the common start word between two series in a Polars DataFrame?

Question

I have a Polars DataFrame consisting of two series, 'foo' and 'bar', which contain lists of integers. I want to create a new column that assigns a value of 1 if the start word (first element) of the 'foo' series is equal to the start word of the 'bar' series, and 0 otherwise. I'm using Polars, which seems a bit more complex than pandas.

Here is the example DataFrame I'm working with:

import polars as pl

dff = pl.DataFrame({"foo": [[1, 3, 3, 3], [3, 5, 3, 4], [4, 7, 5, 3]], "bar": [[3, 345, 3, 4], [3, 4, 334, 2], [4, 52, 4, 2]]})

The DataFrame looks like this:

shape: (3, 2)
┌─────────────┬───────────────┐
│ foo         ┆ bar           │
│ ---         ┆ ---           │
│ list[i64]   ┆ list[i64]     │
╞═════════════╪═══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] │
│ [3, 5, … 4] ┆ [3, 4, … 2]   │
│ [4, 7, … 3] ┆ [4, 52, … 2]  │
└─────────────┴───────────────┘

I would like to create a new column named 'common_start' that reflects whether the start word of 'foo' matches the start word of 'bar'. The expected output for the 'common_start' column in this case would be [0, 1, 1]:

shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo         ┆ bar           ┆ common_start │
│ ---         ┆ ---           ┆ ---          │
│ list[i64]   ┆ list[i64]     ┆ i64          │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ 0            │
│ [3, 5, … 4] ┆ [3, 4, … 2]   ┆ 1            │
│ [4, 7, … 3] ┆ [4, 52, … 2]  ┆ 1            │
└─────────────┴───────────────┴──────────────┘

How can I achieve this? Any guidance or examples using Polars would be greatly appreciated.

Answer 1

Score: 1

The .list namespace contains the expressions for working with Polars list columns.

You can use .list.get(0) or .list.first() to access the first item of each list.

# dff is the DataFrame from the question
dff.with_columns(
    common_start=pl.col('foo').list.first() == pl.col('bar').list.first()
)

shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo         ┆ bar           ┆ common_start │
│ ---         ┆ ---           ┆ ---          │
│ list[i64]   ┆ list[i64]     ┆ bool         │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ false        │
│ [3, 5, … 4] ┆ [3, 4, … 2]   ┆ true         │
│ [4, 7, … 3] ┆ [4, 52, … 2]  ┆ true         │
└─────────────┴───────────────┴──────────────┘
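As a self-contained sketch of the comparison above, the snippet below rebuilds the question's DataFrame and computes the flag twice, once with .list.first() and once with .list.get(0). The column names via_first and via_get are made up for illustration only, and the keyword form of with_columns assumes a reasonably recent Polars release.

```python
import polars as pl

# Example data copied from the question
dff = pl.DataFrame(
    {
        "foo": [[1, 3, 3, 3], [3, 5, 3, 4], [4, 7, 5, 3]],
        "bar": [[3, 345, 3, 4], [3, 4, 334, 2], [4, 52, 4, 2]],
    }
)

# Compare the first element of each list, row by row.
# via_first / via_get are hypothetical names chosen for this sketch.
result = dff.with_columns(
    via_first=pl.col("foo").list.first() == pl.col("bar").list.first(),
    via_get=pl.col("foo").list.get(0) == pl.col("bar").list.get(0),
)
print(result)
```

For this data both columns hold the same booleans, so either accessor should work here.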

A common way to generate 0/1 is to cast a bool to an int:

# cast(int) maps the Python int type to a 64-bit integer column
dff.with_columns(
    common_start=(pl.col('foo').list.first() == pl.col('bar').list.first()).cast(int)
)

shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo         ┆ bar           ┆ common_start │
│ ---         ┆ ---           ┆ ---          │
│ list[i64]   ┆ list[i64]     ┆ i64          │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ 0            │
│ [3, 5, … 4] ┆ [3, 4, … 2]   ┆ 1            │
│ [4, 7, … 3] ┆ [4, 52, … 2]  ┆ 1            │
└─────────────┴───────────────┴──────────────┘
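If a 64-bit column is more than a 0/1 flag needs, one possible variation is to cast to an explicit Polars dtype instead of the Python int type. The sketch below uses pl.Int8 purely as an example; the choice of dtype is an assumption, not something the question asks for.

```python
import polars as pl

# Example data copied from the question
dff = pl.DataFrame(
    {
        "foo": [[1, 3, 3, 3], [3, 5, 3, 4], [4, 7, 5, 3]],
        "bar": [[3, 345, 3, 4], [3, 4, 334, 2], [4, 52, 4, 2]],
    }
)

# Build the boolean expression once, then cast it to a small integer type
same_start = pl.col("foo").list.first() == pl.col("bar").list.first()
result = dff.with_columns(common_start=same_start.cast(pl.Int8))
print(result)
```

Storing the expression in a variable first also keeps the with_columns call short when the same comparison is reused.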

You can also use when/then/otherwise to choose the output values explicitly:

dff.with_columns(
    common_start=pl.when(pl.col('foo').list.first() == pl.col('bar').list.first())
    .then(1)
    .otherwise(0)
)
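The when/then/otherwise form is mostly useful when the outputs are not simply 0 and 1. As a hedged illustration, the sketch below writes text labels instead; the labels "same" and "different" are invented for this example, and they are wrapped in pl.lit so that Polars treats them as literal values rather than column names.

```python
import polars as pl

# Example data copied from the question
dff = pl.DataFrame(
    {
        "foo": [[1, 3, 3, 3], [3, 5, 3, 4], [4, 7, 5, 3]],
        "bar": [[3, 345, 3, 4], [3, 4, 334, 2], [4, 52, 4, 2]],
    }
)

same_start = pl.col("foo").list.first() == pl.col("bar").list.first()

# "same" / "different" are hypothetical labels; pl.lit marks them as literals
result = dff.with_columns(
    common_start=pl.when(same_start)
    .then(pl.lit("same"))
    .otherwise(pl.lit("different"))
)
print(result)
```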

  • Posted by huangapple on June 18, 2023, 18:29:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76500077.html