Python-polars: Create row per unique value in a pl.DataFrame column, columns with another, and values with a third

huangapple go评论109阅读模式
英文:

Python-polars: Create row per unique value in a pl.DataFrame column, columns with another, and values with a third

问题

  1. 我有一个类似这样的 Polars DataFrame
  2. d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
  3. dp = pl.DataFrame(d)
  4. shape: (5, 3)
  5. ┌─────┬────────┬──────┐
  6. id type area
  7. --- --- ---
  8. str str i64
  9. ╞═════╪════════╪══════╡
  10. N/A red 0
  11. N/A blue 0
  12. 1 yellow 3
  13. 1 green 4
  14. 2 yellow 5
  15. └─────┴────────┴──────┘
  16. 我想要进行一些类似于旋转或转置的操作以便每一行都是一个 id不包括 'N/A'),并且每种类型都有一列其值为 area如果没有给出值应该为零在这种情况下结果应该如下所示
  17. red blue yellow green
  18. '1' 0 0 3 4
  19. '2' 0 0 5 0
  20. Polars 中我该如何实现这个操作我宁愿避免将整个 DataFrame 转换成 pandas
英文:

I have a Polars DataFrame that looks like this:

  1. d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
  2. dp = pl.DataFrame(d)
  3. shape: (5, 3)
  4. ┌─────┬────────┬──────┐
  5. id type area
  6. --- --- ---
  7. str str i64
  8. ╞═════╪════════╪══════╡
  9. N/A red 0
  10. N/A blue 0
  11. 1 yellow 3
  12. 1 green 4
  13. 2 yellow 5
  14. └─────┴────────┴──────┘

I would like to do some sort of pivot or transpose operation so that each row is an id (excluding 'N/A') and there is a column for each type, and the value is the area. If no value is given, it should be zero. So in this case, the result should look like this:

  1. red blue yellow green
  2. '1' 0 0 3 4
  3. '2' 0 0 5 0

How can I do this in Polars? I would rather avoid converting the whole thing into pandas.

答案1

得分: 2

在Polars中,您可以使用pivot操作来实现所需的结果。以下是如何在特定的DataFrame中执行此操作的示例代码:

  1. import polars as pl
  2. d = {
  3. 'id': ['N/A', 'N/A', '1', '1', '2'],
  4. 'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
  5. 'area': [0, 0, 3, 4, 5]
  6. }
  7. dp = pl.DataFrame(d)
  8. # 移除'id'列中包含'N/A'的行
  9. dp = dp.filter(pl.col("id") != "N/A")
  10. # 执行pivot操作
  11. dp = dp.pivot('id', 'type', 'area', aggfn='first')
  12. # 用0填充缺失值
  13. dp = dp.fill_null(0)
  14. print(dp)

输出结果如下:

  1. shape: (2, 4)
  2. ┌─────┬──────┬───────┬──────┐
  3. id blue green red
  4. --- --- --- ---
  5. str i64 i64 i64
  6. ╞═════╪══════╪═══════╪══════╡
  7. 1 0 4 0
  8. 2 0 0 0
  9. └─────┴──────┴───────┴──────┘

请注意,这段代码演示了如何在Polars中使用pivot操作将数据透视,并在需要时填充缺失值为0。

英文:

In Polars, you can achieve the desired result by using the pivot operation. Here's how you can do it for your specific DataFrame:

  1. import polars as pl
  2. d = {
  3. 'id': ['N/A', 'N/A', '1', '1', '2'],
  4. 'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
  5. 'area': [0, 0, 3, 4, 5]
  6. }
  7. dp = pl.DataFrame(d)
  8. # Remove rows with 'N/A' in the 'id' column
  9. dp = dp.filter(pl.col("id") != "N/A")
  10. # Perform the pivot operation
  11. dp = dp.pivot('id', 'type', 'area', aggfn='first')
  12. # Fill missing values with 0
  13. dp = dp.fill_null(0)
  14. print(dp)

Output:

  1. shape: (2, 4)
  2. ┌─────┬──────┬───────┬──────┐
  3. id blue green red
  4. --- --- --- ---
  5. str i64 i64 i64
  6. ╞═════╪══════╪═══════╪══════╡
  7. 1 0 4 0
  8. 2 0 0 0
  9. └─────┴──────┴───────┴──────┘

答案2

得分: 1

  1. (df.pivot('area', 'id', 'type', None)
  2. .filter(pl.col('id') != 'N/A')
  3. )
英文:
  1. (df.pivot('area', 'id', 'type', None)
  2. .filter(pl.col('id') != 'N/A')
  3. )
  1. shape: (2, 5)
  2. ┌─────┬──────┬──────┬────────┬───────┐
  3. id red blue yellow green
  4. --- --- --- --- ---
  5. str i64 i64 i64 i64
  6. ╞═════╪══════╪══════╪════════╪═══════╡
  7. 1 null null 3 4
  8. 2 null null 5 null
  9. └─────┴──────┴──────┴────────┴───────┘

huangapple
  • 本文由 发表于 2023年6月29日 04:01:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/76576359.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定