June 27, 2023, 20:38:15

How to create new columns based on a grouping method for one column in Polars?

Question

I have some data structured as shown in the table below, and I would like to restructure the dataframe.

Short piece of the initial data:

id	time	value
2050	02-01	20
2051	02-01	25
2050	02-02	21
2051	02-02	22
2051	02-03	23

I would like the restructured dataframe to have a timestamp column followed by one column for each externallogid (the id column). I have already done this with pandas, but since the file is quite large and must be processed multiple times, I would like to do it in Polars for speed.

Expected output:

time	2050	2051
02-01	20	25
02-02	21	22
02-03	nan	23

I have tried the groupby function, as well as join/hstack/concat, but I run into problems when I also try to use LazyFrames.

Thanks

To produce the data:

import polars as pl

lf = pl.DataFrame({'id': [2050, 2051, 2050, 2051, 2051],
                   'time': ['2023-05-01',
                            '2023-05-01',
                            '2023-05-02',
                            '2023-05-02',
                            '2023-05-03'],
                   'value': [20, 25, 21, 22, 23]})
lf = lf.with_columns(pl.col("time").str.to_datetime("%Y-%m-%d"))

Answer 1

Score: 1

You should pivot:

In [29]: lf.pivot(columns='id', values='value', index='time', aggregate_function=None)
Out[29]:
shape: (3, 3)
┌─────────────────────┬──────┬──────┐
│ time                ┆ 2050 ┆ 2051 │
│ ---                 ┆ ---  ┆ ---  │
│ datetime[μs]        ┆ i64  ┆ i64  │
╞═════════════════════╪══════╪══════╡
│ 2023-05-01 00:00:00 ┆ 20   ┆ 25   │
│ 2023-05-02 00:00:00 ┆ 21   ┆ 22   │
│ 2023-05-03 00:00:00 ┆ null ┆ 23   │
└─────────────────────┴──────┴──────┘
