Make a new column with list of unique values grouped by, or over, another column in Polars

Question


Before Polars 0.18.0, I was able to create a column with a list of all unique pokemon by type. I see Expr.list() has been refactored to implode(), but I'm having trouble replicating the following using the new syntax:

df.with_columns(lst_of_pokemon = pl.col('name').unique().list().over('Type 1'))

Answer 1

Score: 2


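A mapping_strategy= argument was added to .over. With mapping_strategy='join', the values computed for each group are gathered into a list, and that list is joined back onto every row of the group: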

import polars as pl

df = pl.from_repr("""
┌──────┬──────┐
│ name ┆ type │
│ ---  ┆ ---  │
│ str  ┆ i64  │
╞══════╪══════╡
│ a    ┆ 1    │
│ a    ┆ 1    │
│ a    ┆ 2    │
│ b    ┆ 2    │
│ c    ┆ 3    │
│ c    ┆ 3    │
└──────┴──────┘
""")

df.with_columns(
    lst_of_pokemon=pl.col('name').unique().over('type', mapping_strategy='join')
)

shape: (6, 3)
┌──────┬──────┬────────────────┐
│ name ┆ type ┆ lst_of_pokemon │
│ ---  ┆ ---  ┆ ---            │
│ str  ┆ i64  ┆ list[str]      │
╞══════╪══════╪════════════════╡
│ a    ┆ 1    ┆ ["a"]          │
│ a    ┆ 1    ┆ ["a"]          │
│ a    ┆ 2    ┆ ["a", "b"]     │
│ b    ┆ 2    ┆ ["a", "b"]     │
│ c    ┆ 3    ┆ ["c"]          │
│ c    ┆ 3    ┆ ["c"]          │
└──────┴──────┴────────────────┘
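
To make the semantics of the 'join' strategy concrete, here is a minimal plain-Python sketch of what the result above contains. It is not Polars code and says nothing about how Polars implements it; the helper name unique_per_group is made up for illustration, and keeping first-seen order within each list is an assumption of this sketch (Polars' unique() does not promise a particular order by default).

```python
from collections import defaultdict


def unique_per_group(rows, key, value):
    """Illustration only (hypothetical helper, not a Polars API):
    attach to every row the list of distinct `value`s in its `key` group."""
    groups = defaultdict(list)
    for row in rows:
        # Record each value once per group, in first-seen order.
        if row[value] not in groups[row[key]]:
            groups[row[key]].append(row[value])
    # "Join" step: every row receives a copy of its group's full list.
    return [{**row, "lst_of_pokemon": list(groups[row[key]])} for row in rows]


# Same data as the example frame above.
rows = [
    {"name": "a", "type": 1},
    {"name": "a", "type": 1},
    {"name": "a", "type": 2},
    {"name": "b", "type": 2},
    {"name": "c", "type": 3},
    {"name": "c", "type": 3},
]

result = unique_per_group(rows, key="type", value="name")
for row in result:
    print(row)
```

The number of rows is unchanged: each row simply gains the list for its group, which is why the two type 2 rows both carry ["a", "b"].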

Answer 2

Score: 1


Edit: the answer posted by jqurios is way more elegant

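I think the following produces the required result:

df.join(
    # Collect the unique names per "Type 1" into a list (one row per group).
    # Note: groupby was renamed to group_by in later Polars releases.
    df.groupby("Type 1", maintain_order=True).agg(pl.col("Name").unique().alias("combined_names")),
    on="Type 1"
)

With groupby and agg you build one list of unique Pokémon names per 'Type 1'; joining that result back to the original dataframe on 'Type 1' attaches the list to every row.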

Posted by huangapple on June 8, 2023, 17:59:37
When reposting, please keep the link to this article: https://go.coder-hub.com/76430697.html