Combining multiple groups in Polars
Question
I have a dataframe like this:
| category | year | count |
|---|---|---|
| apple | 2022 | 5 |
| apple | 2021 | 8 |
| banana | 2022 | 1 |
| cold | 2022 | 9 |
| cold | 2021 | 2 |
| warm | 2022 | 1 |
| warm | 2021 | 3 |
I need to group the rows based on a pre-set list of groupings ('fruit', 'temperature') and then aggregate by year. The final DF would look like this:
| category | year | count |
|---|---|---|
| fruit | 2022 | 6 |
| fruit | 2021 | 8 |
| temp | 2022 | 10 |
| temp | 2021 | 5 |
The Category values are strings. I'm looking for any solution to make this work. The actual dataframe is quite a bit longer, so I'm hoping to use something like a dict with the groupings to aggregate.
Answer 1
Score: 1
I would `map_dict` the `category` column to standardize it to fruit/temp, then do a `groupby`:
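import polars as pl
md = {"apple": "fruit", "banana": "fruit", "cold": "temp", "warm": "temp"}
# df is the dataframe from the question; newer Polars renames map_dict to replace and groupby to group_by
df.with_columns(pl.col("category").map_dict(md)).groupby("category", "year").sum()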
shape: (4, 3)
┌──────────┬──────┬───────┐
│ category ┆ year ┆ count │
│ ---      ┆ ---  ┆ ---   │
│ str      ┆ i64  ┆ i64   │
╞══════════╪══════╪═══════╡
│ fruit    ┆ 2022 ┆ 6     │
│ temp     ┆ 2022 ┆ 10    │
│ fruit    ┆ 2021 ┆ 8     │
│ temp     ┆ 2021 ┆ 5     │
└──────────┴──────┴───────┘
You could also do a when/then chain to standardize the `category` column, but in more complicated examples `map_dict` will be more concise code.