Python-polars: Create row per unique value in a pl.DataFrame column, columns with another, and values with a third
Question
I have a Polars DataFrame that looks like this:
import polars as pl
d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
dp = pl.DataFrame(d)
shape: (5, 3)
┌─────┬────────┬──────┐
│ id  ┆ type   ┆ area │
│ --- ┆ ---    ┆ ---  │
│ str ┆ str    ┆ i64  │
╞═════╪════════╪══════╡
│ N/A ┆ red    ┆ 0    │
│ N/A ┆ blue   ┆ 0    │
│ 1   ┆ yellow ┆ 3    │
│ 1   ┆ green  ┆ 4    │
│ 2   ┆ yellow ┆ 5    │
└─────┴────────┴──────┘
I would like to do some sort of pivot or transpose operation so that each row is an id (excluding 'N/A') and there is a column for each type, and the value is the area. If no value is given, it should be zero. So in this case, the result should look like this:
     red  blue  yellow  green
'1'    0     0       3      4
'2'    0     0       5      0
How can I do this in Polars? I would rather avoid converting the whole thing into pandas.
Answer 1
Score: 2
In Polars, you can achieve the desired result by using the pivot operation. Here's how you can do it for your specific DataFrame:
import polars as pl
d = {
'id': ['N/A', 'N/A', '1', '1', '2'],
'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
'area': [0, 0, 3, 4, 5]
}
dp = pl.DataFrame(d)
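# Pivot first, while the 'N/A' rows are still present: they are the only rows
# that carry 'red' and 'blue', so filtering first would drop those columns.
# (Polars >= 1.0 spelling; earlier releases name the `on` argument `columns`.)
dp = dp.pivot(on='type', index='id', values='area', aggregate_function='first')
# Remove the row for id 'N/A'
dp = dp.filter(pl.col("id") != "N/A")
# Fill missing values with 0
dp = dp.fill_null(0)
print(dp)
Output:
shape: (2, 5)
┌─────┬─────┬──────┬────────┬───────┐
│ id  ┆ red ┆ blue ┆ yellow ┆ green │
│ --- ┆ --- ┆ ---  ┆ ---    ┆ ---   │
│ str ┆ i64 ┆ i64  ┆ i64    ┆ i64   │
╞═════╪═════╪══════╪════════╪═══════╡
│ 1   ┆ 0   ┆ 0    ┆ 3      ┆ 4     │
│ 2   ┆ 0   ┆ 0    ┆ 5      ┆ 0     │
└─────┴─────┴──────┴────────┴───────┘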
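One detail worth checking is the order of the filter and the pivot: in the sample data, 'red' and 'blue' occur only on the 'N/A' rows, so the set of output columns depends on which step runs first. The following is a plain-Python sketch (no Polars involved; the helper name `pivot_rows` is made up for illustration) that mimics a first-value pivot to make that effect visible.

```python
# Illustrative only: a dictionary-based stand-in for a pivot,
# not how Polars implements it.
rows = [
    ("N/A", "red", 0),
    ("N/A", "blue", 0),
    ("1", "yellow", 3),
    ("1", "green", 4),
    ("2", "yellow", 5),
]


def pivot_rows(rows, fill=0):
    """Return (columns, table): one dict per id, one key per type seen in `rows`."""
    columns = []  # types in order of first appearance
    table = {}    # id -> {type: area}
    for id_, type_, area in rows:
        if type_ not in columns:
            columns.append(type_)
        # keep the first value seen for each (id, type) pair
        table.setdefault(id_, {}).setdefault(type_, area)
    # fill the (id, type) combinations that never occurred
    filled = {
        id_: {c: cells.get(c, fill) for c in columns}
        for id_, cells in table.items()
    }
    return columns, filled


# Filter first: the 'red' and 'blue' columns disappear with the 'N/A' rows.
cols_a, table_a = pivot_rows([r for r in rows if r[0] != "N/A"])

# Pivot first, then drop the 'N/A' row: all four columns survive.
cols_b, table_b = pivot_rows(rows)
table_b.pop("N/A")

print(cols_a)  # ['yellow', 'green']
print(cols_b)  # ['red', 'blue', 'yellow', 'green']
print(table_b)
```

If the goal is one column per type found anywhere in the data, as in the question, pivoting before filtering is the order that keeps all four columns.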
Answer 2
Score: 1
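# Polars >= 1.0 spelling; earlier releases name the `on` argument `columns`
(dp.pivot(on='type', index='id', values='area', aggregate_function=None)
.filter(pl.col('id') != 'N/A')
)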
shape: (2, 5)
┌─────┬──────┬──────┬────────┬───────┐
│ id  ┆ red  ┆ blue ┆ yellow ┆ green │
│ --- ┆ ---  ┆ ---  ┆ ---    ┆ ---   │
│ str ┆ i64  ┆ i64  ┆ i64    ┆ i64   │
╞═════╪══════╪══════╪════════╪═══════╡
│ 1   ┆ null ┆ null ┆ 3      ┆ 4     │
│ 2   ┆ null ┆ null ┆ 5      ┆ null  │
└─────┴──────┴──────┴────────┴───────┘