在Polars数据框中,通过另一列类型为列表来筛选一个类型为列表的列。

huangapple go评论67阅读模式
英文:

In a polars dataframe, filter a column of type list by another column of type list

问题

以下是您要翻译的内容:

我有一个 [tag:polars] 数据帧如下:

示例输入:

import polars as pl
d = pl.DataFrame()
df = d.with_columns(pl.lit([1]).alias('user_id'), 
                    pl.lit([[1,2,3,4]]).alias('items'),
                    pl.lit([[3,4,5,6]]).alias('popular_items'))

┌─────────────┬─────────────┬───────────────┐
 user_id      items        popular_items 
 ---          ---          ---           
 i64          list[i64]    list[i64]     
╞═════════════╪═════════════╪═══════════════╡
 1           [1, 2, 3, 4]  [3, 4, 5, 6]  
└─────────────┴─────────────┴───────────────┘

我想要通过删除每个 user_id 中在 items 列中的任何项目来筛选 popular_items 列。

我一直在尝试让它工作,但由于各种问题一直没有成功。很可能是因为我过于复杂化了事情。

预期的输出应如下所示:

┌─────────────┬─────────────┬───────────────┬───────────┐
 user_id      items        popular_items  suggested 
 ---          ---          ---            ---       
 i64          list[i64]    list[i64]      list[i64] 
╞═════════════╪═════════════╪═══════════════╪═══════════╡
 1            [1, 2, 3, 4] [3, 4, 5, 6]   [5, 6]    
└─────────────┴─────────────┴───────────────┴───────────┘

看起来解决方案应该很简单,但在尝试了不同的方法后,似乎一直不成功。

任何帮助将不胜感激!

英文:

I have a [tag:polars] dataframe as below:

Example input:

import polars as pl
d = pl.DataFrame()
df = d.with_columns(pl.lit([1]).alias('user_id'), 
                    pl.lit([[1,2,3,4]]).alias('items'),
                    pl.lit([[3,4,5,6]]).alias('popular_items'))

┌─────────────┬─────────────┬───────────────┐
 user_id      items        popular_items 
 ---          ---          ---           
 i64          list[i64]    list[i64]     
╞═════════════╪═════════════╪═══════════════╡
 1           [1, 2, 3, 4]  [3, 4, 5, 6]  
└─────────────┴─────────────┴───────────────┘

I want to filter popular_items column by removing any items that are in items column for each user_id

I have been trying to get it to work but have been unsuccessful due to various issues. In all likelihood, I am probably overcomplicating things.

The expected output should be as follows:

┌─────────────┬─────────────┬───────────────┬───────────┐
 user_id      items        popular_items  suggested 
 ---          ---          ---            ---       
 i64          list[i64]    list[i64]      list[i64] 
╞═════════════╪═════════════╪═══════════════╪═══════════╡
 1            [1, 2, 3, 4] [3, 4, 5, 6]   [5, 6]    
└─────────────┴─────────────┴───────────────┴───────────┘

It seems like the solution should be simple, but it seems to escape me after some time now trying different things.

Any help would be greatly appreciated!

答案1

得分: 1

我明白了,以下是您提供的代码的中文翻译部分:

我不确定是否有一种简单的方法来做到这一点

您可以展开列表并使用`.is_in`来检查是否在其他列表中

(df.explode("popular_items")
.with_columns(is_in = pl.col("popular_items").is_in("items").is_not()))


形状:(4,4)
┌─────────┬─────────────┬───────────────┬───────┐
│ user_id ┆ items       ┆ popular_items ┆ is_in │
│ ---     ┆ ---         ┆ ---           ┆ ---   │
│ i64     ┆ list[i64]   ┆ i64           ┆ bool  │
╞═════════╪═════════════╪═══════════════╪═══════╡
│ 1       ┆ [1, 2, … 4] ┆ 3             ┆ false │
│ 1       ┆ [1, 2, … 4] ┆ 4             ┆ false │
│ 1       ┆ [1, 2, … 4] ┆ 5             ┆ true  │
│ 1       ┆ [1, 2, … 4] ┆ 6             ┆ true  │
└─────────┴─────────────┴───────────────┴───────┘

然后,您可以使用.filter筛选,转回列表,并进行.join操作。

df.join(
   (df.explode("popular_items")
      .filter(pl.col("popular_items").is_in("items").is_not())
      .groupby("user_id")
      .agg(suggested="popular_items")),
   on="user_id"
)

形状:(1,4)
┌─────────┬─────────────┬───────────────┬───────────┐
│ user_id ┆ items ┆ popular_items ┆ suggested │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ list[i64] ┆ list[i64] ┆ list[i64] │
╞═════════╪═════════════╪═══════════════╪═══════════╡
│ 1 ┆ [1, 2, … 4] ┆ [3, 4, … 6] ┆ [5, 6] │
└─────────┴─────────────┴───────────────┴───────────┘


也许更好的方法是展开两个列表并使用[反连接](https://pola-rs.github.io/polars-book/user-guide/transformations/joins/#anti-join)来计算“差异”。

difference = (
df.explode("popular_items")
.join(
df.explode("items"),
left_on=["user_id", "popular_items"],
right_on=["user_id", "items"],
how="anti"
)
.groupby("user_id", maintain_order=True)
.agg(suggested="popular_items")
)

df.join(difference, on="user_id")


希望这能帮助您理解这些代码的中文翻译部分。

<details>
<summary>英文:</summary>

I&#39;m not sure if there is a &quot;simple&quot; way to do this.

You can explode the list and use `.is_in` to check if it is in the other.

(df.explode("popular_items")
.with_columns(is_in = pl.col("popular_items").is_in("items").is_not()))

shape: (4, 4)
┌─────────┬─────────────┬───────────────┬───────┐
│ user_id ┆ items ┆ popular_items ┆ is_in │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ list[i64] ┆ i64 ┆ bool │
╞═════════╪═════════════╪═══════════════╪═══════╡
│ 1 ┆ [1, 2, … 4] ┆ 3 ┆ false │
│ 1 ┆ [1, 2, … 4] ┆ 4 ┆ false │
│ 1 ┆ [1, 2, … 4] ┆ 5 ┆ true │
│ 1 ┆ [1, 2, … 4] ┆ 6 ┆ true │
└─────────┴─────────────┴───────────────┴───────┘

Which you could `.filter`, turn back into a list, and `.join`.

df.join(
(df.explode("popular_items")
.filter(pl.col("popular_items").is_in("items").is_not())
.groupby("user_id")
.agg(suggested="popular_items")),
on="user_id"
)

shape: (1, 4)
┌─────────┬─────────────┬───────────────┬───────────┐
│ user_id ┆ items ┆ popular_items ┆ suggested │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ list[i64] ┆ list[i64] ┆ list[i64] │
╞═════════╪═════════════╪═══════════════╪═══════════╡
│ 1 ┆ [1, 2, … 4] ┆ [3, 4, … 6] ┆ [5, 6] │
└─────────┴─────────────┴───────────────┴───────────┘

Perhaps it&#39;s better to explode both lists and use an [Anti-join](https://pola-rs.github.io/polars-book/user-guide/transformations/joins/#anti-join) to compute the &quot;difference&quot;. 

difference = (
df.explode("popular_items")
.join(
df.explode("items"),
left_on=["user_id", "popular_items"],
right_on=["user_id", "items"],
how="anti"
)
.groupby("user_id", maintain_order=True)
.agg(suggested="popular_items")
)

df.join(difference, on="user_id")


</details>



huangapple
  • 本文由 发表于 2023年5月7日 21:29:54
  • 转载请务必保留本文链接:https://go.coder-hub.com/76194228.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定