Pivoting a dask dataframe using multiple columns as index

Question

I have a Dask DataFrame in the following format:

date      hour  device  param  value
20190701  21    dev_01  att_1  0.000000
20190718  22    dev_01  att_2  20.000000
20190718  22    dev_01  att_3  18.611111
20190701  21    dev_01  att_4  18.706083
20190718  22    dev_01  att_5  23.333333

I am trying to pivot the dataframe using the dask.dataframe.DataFrame.pivot_table() API. However, I want to use 'date', 'hour' and 'device' as the index (i.e., in the pivoted table each row would be uniquely identified by its date, hour and device identifier):

ddf.pivot_table(index=['date', 'hour', 'device'], columns='param', values='value')

However, it's failing with the following error:

'index' must be the name of an existing column

As I understand from the API documentation, the 'index' parameter accepts the name of a single column (not a list), hence this error.

Is there an alternative way to pivot a Dask dataframe using multiple columns as the index?

Answer 1

Score: 2

As mentioned in the docstring, 'index' and 'columns' must each be the name of a single column, and the 'columns' column must be of categorical dtype. So to accomplish what you want, you would have to combine your three columns into a single column and pivot on that.

This is doable using normal pandas syntax, but it will likely require a full pass through the data to get the categories.

huangapple
  • Posted on January 3, 2020, 20:36:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59578760.html