Make a new column with list of unique values grouped by, or over, another column in Polars

Question

Before Polars 0.18.0, I was able to create a column containing a list of all unique Pokémon names for each type. I see that Expr.list() has been renamed to Expr.implode(), but I'm having trouble replicating the following with the new syntax:

df.with_columns(lst_of_pokemon = pl.col('name').unique().list().over('Type 1'))

Answer 1

Score: 2

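The mapping_strategy= argument was added to .over(). With mapping_strategy='join', each group's result is gathered into a list, and that list is assigned to every row of the group: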

    import polars as pl

    df = pl.from_repr("""
    ┌──────┬──────┐
    │ name ┆ type │
    │ ---  ┆ ---  │
    │ str  ┆ i64  │
    ╞══════╪══════╡
    │ a    ┆ 1    │
    │ a    ┆ 1    │
    │ a    ┆ 2    │
    │ b    ┆ 2    │
    │ c    ┆ 3    │
    │ c    ┆ 3    │
    └──────┴──────┘
    """)
    df.with_columns(lst_of_pokemon=
        pl.col('name').unique().over('type', mapping_strategy='join')
    )

    shape: (6, 3)
    ┌──────┬──────┬────────────────┐
    │ name ┆ type ┆ lst_of_pokemon │
    │ ---  ┆ ---  ┆ ---            │
    │ str  ┆ i64  ┆ list[str]      │
    ╞══════╪══════╪════════════════╡
    │ a    ┆ 1    ┆ ["a"]          │
    │ a    ┆ 1    ┆ ["a"]          │
    │ a    ┆ 2    ┆ ["a", "b"]     │
    │ b    ┆ 2    ┆ ["a", "b"]     │
    │ c    ┆ 3    ┆ ["c"]          │
    │ c    ┆ 3    ┆ ["c"]          │
    └──────┴──────┴────────────────┘

Answer 2

Score: 1

Edit: the answer posted by jqurios is much more elegant.

I think the following produces the required result:

    df.join(
        df.groupby("Type 1", maintain_order=True).agg(pl.col("Name").alias("combined_names")),
        on="Type 1"
    )

The groupby/agg step builds, for each 'Type 1', a list of all Pokémon names of that type; that result is then joined back onto the original DataFrame so every row receives its type's list.

huangapple
  • Published on June 8, 2023 at 17:59:37
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76430697.html