Combining multiple groups in Polars


Question

I have a dataframe like this:

category  year  count
apple     2022  5
apple     2021  8
banana    2022  1
cold      2022  9
cold      2021  2
warm      2022  1
warm      2021  3

I need to group the rows based on a preset list of groupings ('fruit', 'temperature') and then aggregate by year. The final DataFrame would look like this:

category  year  count
fruit     2022  6
fruit     2021  8
temp      2022  10
temp      2021  5

The category values are strings. I'm looking for any solution that makes this work. The actual DataFrame is quite a bit longer, so I'm hoping to drive the aggregation from something like a dict of groupings.

Answer 1

Score: 1

I would use map_dict on the category column to standardize its values to fruit/temp, then do a groupby on category and year:

import polars as pl

md = {"apple": "fruit", "banana": "fruit", "cold": "temp", "warm": "temp"}

# Relabel each category with its group, then sum count per (category, year)
df.with_columns(pl.col("category").map_dict(md)).groupby("category", "year").sum()

shape: (4, 3)
┌──────────┬──────┬───────┐
│ category ┆ year ┆ count │
│ ---      ┆ ---  ┆ ---   │
│ str      ┆ i64  ┆ i64   │
╞══════════╪══════╪═══════╡
│ fruit    ┆ 2022 ┆ 6     │
│ temp     ┆ 2022 ┆ 10    │
│ fruit    ┆ 2021 ┆ 8     │
│ temp     ┆ 2021 ┆ 5     │
└──────────┴──────┴───────┘

You could also use a when/then chain to standardize the category column, but as the number of categories grows, map_dict gives more concise code.

huangapple
  • Posted on May 23, 2023, 01:33:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76308659.html