How to use categorical data type with pyarrow dtypes?
Question
I'm working with Arrow dtypes in pandas, and my DataFrame has a column that should be categorical, but I can't figure out how to convert it to the pyarrow data type for categorical data (dictionary).
According to the Arrow documentation on pandas conversion (https://arrow.apache.org/docs/python/pandas.html#pandas-arrow-conversion), the Arrow data type I should be using is dictionary.
Usually, if you want pandas to use a pyarrow dtype you just add [pyarrow] to the name of the pyarrow type, for example dtype='string[pyarrow]'. I tried using dtype='dictionary[pyarrow]', but that yields the error:
data type 'dictionary[pyarrow]' not understood
I also tried 'categorical[pyarrow]', 'category[pyarrow]', pyarrow.dictionary, and pyarrow.dictionary(pyarrow.int16(), pyarrow.string()), but none of them worked either.
How can I use the dictionary dtype on a pandas Series?
pd.Series(['Chocolate', 'Candy', 'Waffles'], dtype='what_to_put_here????')
Answer 1
Score: 4
I believe pd.ArrowDtype is required:
dtype=pd.ArrowDtype(pa.dictionary(pa.int16(), pa.string()))