How to create new columns based on a grouping method for one column in Polars?

Question

I have some data structured as shown below, and I would like to restructure the dataframe.

A short sample of the initial data:

id time value
2050 02-01 20
2051 02-01 25
2050 02-02 21
2051 02-02 22
2051 02-03 23

I would like the restructured dataframe to have a timestamp column followed by one column for each externallogid (the id column in the sample data). I have already done this with pandas, but since the file is quite large and must be processed multiple times, I would like to do it in Polars for speed.

Expected output:

time 2050 2051
02-01 20 25
02-02 21 22
02-03 nan 23

I have tried the groupby function together with join/hstack/concat, but I run into problems when also trying to use LazyFrames.

Thanks

To produce the data:

import polars as pl

lf = pl.DataFrame({'id': [2050, 2051, 2050, 2051, 2051],
                   'time': ['2023-05-01',
                            '2023-05-01',
                            '2023-05-02',
                            '2023-05-02',
                            '2023-05-03'],
                   'value': [20, 25, 21, 22, 23]})
# Parse the date strings into a datetime column (with_columns, not the older with_column)
lf = lf.with_columns(pl.col("time").str.to_datetime("%Y-%m-%d"))

Answer 1

Score: 1

You should pivot, using time as the index, id for the new column names, and value for the cell contents:

In [29]: lf.pivot(columns='id', values='value', index='time', aggregate_function=None)
Out[29]:
shape: (3, 3)
┌─────────────────────┬──────┬──────┐
│ time                ┆ 2050 ┆ 2051 │
│ ---                 ┆ ---  ┆ ---  │
│ datetime[μs]        ┆ i64  ┆ i64  │
╞═════════════════════╪══════╪══════╡
│ 2023-05-01 00:00:00 ┆ 20   ┆ 25   │
│ 2023-05-02 00:00:00 ┆ 21   ┆ 22   │
│ 2023-05-03 00:00:00 ┆ null ┆ 23   │
└─────────────────────┴──────┴──────┘

Posted by huangapple on 2023-06-27 20:38:15
Source: https://go.coder-hub.com/76564954.html
