June 18, 2023, 18:29:20

How to create a new column based on the common start word between two series in a Polars DataFrame?

Question

I have a Polars DataFrame with two columns, 'foo' and 'bar', each containing lists of integers. I want to create a new column that is 1 if the start word (first element) of the list in 'foo' equals the start word of the list in 'bar', and 0 otherwise. I'm using Polars, which seems a bit more complex than pandas.

Here is the example DataFrame I'm working with:

import polars as pl

df = pl.DataFrame({"foo": [[1, 3, 3, 3], [3, 5, 3, 4], [4, 7, 5, 3]], "bar": [[3, 345, 3, 4], [3, 4, 334, 2], [4, 52, 4, 2]]})

The DataFrame looks like this:

shape: (3, 2)
┌─────────────┬───────────────┐
│ foo         ┆ bar           │
│ ---         ┆ ---           │
│ list[i64]   ┆ list[i64]     │
╞═════════════╪═══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] │
│ [3, 5, … 4] ┆ [3, 4, … 2]   │
│ [4, 7, … 3] ┆ [4, 52, … 2]  │
└─────────────┴───────────────┘

I would like to create a new column named 'common_start' that reflects whether the start word of 'foo' matches the start word of 'bar'. For this data, the expected values of 'common_start' are [0, 1, 1]:

shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo         ┆ bar           ┆ common_start │
│ ---         ┆ ---           ┆ ---          │
│ list[i64]   ┆ list[i64]     ┆ i64          │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ 0            │
│ [3, 5, … 4] ┆ [3, 4, … 2]   ┆ 1            │
│ [4, 7, … 3] ┆ [4, 52, … 2]  ┆ 1            │
└─────────────┴───────────────┴──────────────┘

How can I achieve this? Any guidance or examples using Polars would be greatly appreciated.

Answer 1

Score: 1

The .list namespace provides the operations for Polars list columns.

You can use .list.get(0) or .list.first() to access the first item of each list.

df.with_columns(
    common_start=pl.col('foo').list.first() == pl.col('bar').list.first()
)

shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo         ┆ bar           ┆ common_start │
│ ---         ┆ ---           ┆ ---          │
│ list[i64]   ┆ list[i64]     ┆ bool         │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ false        │
│ [3, 5, … 4] ┆ [3, 4, … 2]   ┆ true         │
│ [4, 7, … 3] ┆ [4, 52, … 2]  ┆ true         │
└─────────────┴───────────────┴──────────────┘

A common way to get 0/1 is to cast the boolean result to an integer:

df.with_columns(
    common_start=(pl.col('foo').list.first() == pl.col('bar').list.first()).cast(int)
)

shape: (3, 3)
┌─────────────┬───────────────┬──────────────┐
│ foo         ┆ bar           ┆ common_start │
│ ---         ┆ ---           ┆ ---          │
│ list[i64]   ┆ list[i64]     ┆ i64          │
╞═════════════╪═══════════════╪══════════════╡
│ [1, 3, … 3] ┆ [3, 345, … 4] ┆ 0            │
│ [3, 5, … 4] ┆ [3, 4, … 2]   ┆ 1            │
│ [4, 7, … 3] ┆ [4, 52, … 2]  ┆ 1            │
└─────────────┴───────────────┴──────────────┘

You can also use when/then/otherwise to set the output values explicitly:

df.with_columns(
    common_start=pl.when(pl.col('foo').list.first() == pl.col('bar').list.first())
    .then(1)
    .otherwise(0)
)
