Make a new column with list of unique values grouped by, or over, another column in Polars
Question
Before Polars 0.18.0, I was able to create a column with a list of all unique pokemon by type. I see Expr.list() has been refactored to implode(), but I'm having trouble replicating the following using the new syntax:
df.with_columns(lst_of_pokemon = pl.col('name').unique().list().over('Type 1'))
Answer 1
Score: 2
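A mapping_strategy= argument was added to .over. With mapping_strategy='join', the values of each group are collected into a list, and that list is placed on every row of the group:
import polars as pl

df = pl.from_repr("""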
┌──────┬──────┐
│ name ┆ type │
│ --- ┆ --- │
│ str ┆ i64 │
╞══════╪══════╡
│ a ┆ 1 │
│ a ┆ 1 │
│ a ┆ 2 │
│ b ┆ 2 │
│ c ┆ 3 │
│ c ┆ 3 │
└──────┴──────┘
""")
df.with_columns(
    lst_of_pokemon=pl.col('name').unique().over('type', mapping_strategy='join')
)
shape: (6, 3)
┌──────┬──────┬────────────────┐
│ name ┆ type ┆ lst_of_pokemon │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ list[str] │
╞══════╪══════╪════════════════╡
│ a ┆ 1 ┆ ["a"] │
│ a ┆ 1 ┆ ["a"] │
│ a ┆ 2 ┆ ["a", "b"] │
│ b ┆ 2 ┆ ["a", "b"] │
│ c ┆ 3 ┆ ["c"] │
│ c ┆ 3 ┆ ["c"] │
└──────┴──────┴────────────────┘
Answer 2
Score: 1
Edit: the answer posted by jqurios is way more elegant.
I think the following produces the required result:
df.join(
    df.groupby("Type 1", maintain_order=True).agg(pl.col("Name").alias("combined_names")),
    on="Type 1",
)
The groupby and agg calls build one list of all the pokemon names per 'Type 1'; that list is then joined back onto the original dataframe.