June 29, 2023 04:01:17


Python-polars: Create row per unique value in a pl.DataFrame column, columns with another, and values with a third

Question

I have a Polars DataFrame that looks like this:
    import polars as pl
    d = {'id': ['N/A', 'N/A', '1', '1', '2'], 'type': ['red', 'blue', 'yellow', 'green', 'yellow'], 'area': [0, 0, 3, 4, 5]}
    dp = pl.DataFrame(d)
    print(dp)

    shape: (5, 3)
    ┌─────┬────────┬──────┐
    │ id  ┆ type   ┆ area │
    │ --- ┆ ---    ┆ ---  │
    │ str ┆ str    ┆ i64  │
    ╞═════╪════════╪══════╡
    │ N/A ┆ red    ┆ 0    │
    │ N/A ┆ blue   ┆ 0    │
    │ 1   ┆ yellow ┆ 3    │
    │ 1   ┆ green  ┆ 4    │
    │ 2   ┆ yellow ┆ 5    │
    └─────┴────────┴──────┘
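I would like to do some sort of pivot or transpose operation so that each row is an id (excluding 'N/A'), there is a column for each type, and the value is the area. If no value is given, it should be zero. So in this case, the result should look like this: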
          red   blue  yellow  green
    '1'    0      0     3      4
    '2'    0      0     5      0
How can I do this in Polars? I would rather avoid converting the whole thing into pandas.

Answer 1

Score: 2

In Polars, you can achieve the desired result with the pivot operation. Here's how you can do it for your specific DataFrame:

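import polars as pl

d = {
    'id': ['N/A', 'N/A', '1', '1', '2'],
    'type': ['red', 'blue', 'yellow', 'green', 'yellow'],
    'area': [0, 0, 3, 4, 5]
}
dp = pl.DataFrame(d)
# Perform the pivot first, so every type (including red and blue,
# which only occur on 'N/A' rows) becomes a column.
# Note: polars >= 1.0 renamed the `columns` parameter to `on`.
dp = dp.pivot(values='area', index='id', columns='type', aggregate_function='first')
# Remove the row whose 'id' is 'N/A'
dp = dp.filter(pl.col('id') != 'N/A')
# Fill missing values with 0
dp = dp.fill_null(0)
print(dp)

Output:

shape: (2, 5)
┌─────┬─────┬──────┬────────┬───────┐
│ id  ┆ red ┆ blue ┆ yellow ┆ green │
│ --- ┆ --- ┆ ---  ┆ ---    ┆ ---   │
│ str ┆ i64 ┆ i64  ┆ i64    ┆ i64   │
╞═════╪═════╪══════╪════════╪═══════╡
│ 1   ┆ 0   ┆ 0    ┆ 3      ┆ 4     │
│ 2   ┆ 0   ┆ 0    ┆ 5      ┆ 0     │
└─────┴─────┴──────┴────────┴───────┘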

This code uses pivot to reshape the data into one row per id and one column per type, then fills the missing id/type combinations with 0.

Answer 2

Score: 1

(dp.pivot(values='area', index='id', columns='type', aggregate_function=None)
   .filter(pl.col('id') != 'N/A')
)

shape: (2, 5)
┌─────┬──────┬──────┬────────┬───────┐
│ id  ┆ red  ┆ blue ┆ yellow ┆ green │
│ --- ┆ ---  ┆ ---  ┆ ---    ┆ ---   │
│ str ┆ i64  ┆ i64  ┆ i64    ┆ i64   │
╞═════╪══════╪══════╪════════╪═══════╡
│ 1   ┆ null ┆ null ┆ 3      ┆ 4     │
│ 2   ┆ null ┆ null ┆ 5      ┆ null  │
└─────┴──────┴──────┴────────┴───────┘