Pivoting a dask dataframe using multiple columns as index
Question
I have a Dask DataFrame in the following format:
date      hour  device  param  value
20190701  21    dev_01  att_1  0.000000
20190718  22    dev_01  att_2  20.000000
20190718  22    dev_01  att_3  18.611111
20190701  21    dev_01  att_4  18.706083
20190718  22    dev_01  att_5  23.333333
I am trying to pivot the dataframe using Dask's DataFrame.pivot_table() API. However, I want to use 'date', 'hour' and 'device' together as the index (i.e., in the pivoted table each row would be uniquely identified by its date, hour and device identifier):
ddf.pivot_table(index=['date', 'hour', 'device'], columns='param', values='value')
However, it fails with the following error:
'index' must be the name of an existing column
As I understand from the API documentation, the 'index' parameter accepts the name of a single column (not a list), hence this error.
Is there an alternative way to pivot a Dask DataFrame using multiple columns as the index?
Answer 1
Score: 2
As mentioned in the docstring, 'index' must be the name of a single column, and the column passed as 'columns' must be of categorical dtype with known categories. So to accomplish what you want, you would have to combine your three columns into a single key column and make sure 'param' is categorical.
This is doable using normal Pandas syntax, but will likely require a full pass through the data to get the categories.