Python-polars: Create row per unique value in a pl.DataFrame column, columns with another, and values with a third

huangapple go评论65阅读模式
英文:

Python-polars: Create row per unique value in a pl.DataFrame column, columns with another, and values with a third

问题

我有一个类似这样的 Polars DataFrame

    d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
    dp = pl.DataFrame(d)
    shape: (5, 3)
    ┌─────┬────────┬──────┐
     id   type    area 
     ---  ---     ---  
     str  str     i64  
    ╞═════╪════════╪══════╡
     N/A  red     0    
     N/A  blue    0    
     1    yellow  3    
     1    green   4    
     2    yellow  5    
    └─────┴────────┴──────┘

我想要进行一些类似于旋转或转置的操作以便每一行都是一个 id不包括 'N/A'),并且每种类型都有一列其值为 area如果没有给出值应该为零在这种情况下结果应该如下所示

          red   blue  yellow  green
    '1'    0      0     3      4
    '2'    0      0     5      0

在 Polars 中我该如何实现这个操作我宁愿避免将整个 DataFrame 转换成 pandas
英文:

I have a Polars DataFrame that looks like this:

d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
dp = pl.DataFrame(d)
shape: (5, 3)
┌─────┬────────┬──────┐
│ id  ┆ type   ┆ area │
│ --- ┆ ---    ┆ ---  │
│ str ┆ str    ┆ i64  │
╞═════╪════════╪══════╡
│ N/A ┆ red    ┆ 0    │
│ N/A ┆ blue   ┆ 0    │
│ 1   ┆ yellow ┆ 3    │
│ 1   ┆ green  ┆ 4    │
│ 2   ┆ yellow ┆ 5    │
└─────┴────────┴──────┘

I would like to do some sort of pivot or transpose operation so that each row is an id (excluding 'N/A') and there is a column for each type, and the value is the area. If no value is given, it should be zero. So in this case, the result should look like this:

      red   blue  yellow  green
'1'    0      0     3      4
'2'    0      0     5      0

How can I do this in Polars? I would rather avoid converting the whole thing into pandas.

答案1

得分: 2

在Polars中,您可以使用pivot操作来实现所需的结果。以下是如何在特定的DataFrame中执行此操作的示例代码:

import polars as pl

d = {
    'id': ['N/A', 'N/A', '1', '1', '2'],
    'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
    'area': [0, 0, 3, 4, 5]
}

dp = pl.DataFrame(d)

# 移除'id'列中包含'N/A'的行
dp = dp.filter(pl.col("id") != "N/A")

# 执行pivot操作
dp = dp.pivot('id', 'type', 'area', aggfn='first')

# 用0填充缺失值
dp = dp.fill_null(0)

print(dp)

输出结果如下:

shape: (2, 4)
┌─────┬──────┬───────┬──────┐
│ id  ┆ blue ┆ green ┆ red  │
│ --- ┆ ---  ┆ ---   ┆ ---  │
│ str ┆ i64  ┆ i64   ┆ i64  │
╞═════╪══════╪═══════╪══════╡
│ 1   ┆ 0    ┆ 4     ┆ 0    │
│ 2   ┆ 0    ┆ 0     ┆ 0    │
└─────┴──────┴───────┴──────┘

请注意,这段代码演示了如何在Polars中使用pivot操作将数据透视,并在需要时填充缺失值为0。

英文:

In Polars, you can achieve the desired result by using the pivot operation. Here's how you can do it for your specific DataFrame:

import polars as pl

d = {
    'id': ['N/A', 'N/A', '1', '1', '2'],
    'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
    'area': [0, 0, 3, 4, 5]
}

dp = pl.DataFrame(d)

# Remove rows with 'N/A' in the 'id' column
dp = dp.filter(pl.col("id") != "N/A")

# Perform the pivot operation
dp = dp.pivot('id', 'type', 'area', aggfn='first')

# Fill missing values with 0
dp = dp.fill_null(0)

print(dp)

Output:

shape: (2, 4)
┌─────┬──────┬───────┬──────┐
│ id  ┆ blue ┆ green ┆ red  │
│ --- ┆ ---  ┆ ---   ┆ ---  │
│ str ┆ i64  ┆ i64   ┆ i64  │
╞═════╪══════╪═══════╪══════╡
│ 1   ┆ 0    ┆ 4     ┆ 0    │
│ 2   ┆ 0    ┆ 0     ┆ 0    │
└─────┴──────┴───────┴──────┘

答案2

得分: 1

(df.pivot('area', 'id', 'type', None)
   .filter(pl.col('id') != 'N/A')
)
英文:
(df.pivot('area', 'id', 'type', None)
   .filter(pl.col('id') != 'N/A')
)
shape: (2, 5)
┌─────┬──────┬──────┬────────┬───────┐
│ id  ┆ red  ┆ blue ┆ yellow ┆ green │
│ --- ┆ ---  ┆ ---  ┆ ---    ┆ ---   │
│ str ┆ i64  ┆ i64  ┆ i64    ┆ i64   │
╞═════╪══════╪══════╪════════╪═══════╡
│ 1   ┆ null ┆ null ┆ 3      ┆ 4     │
│ 2   ┆ null ┆ null ┆ 5      ┆ null  │
└─────┴──────┴──────┴────────┴───────┘

huangapple
  • 本文由 发表于 2023年6月29日 04:01:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/76576359.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定