Make a categorical column which has categories ['a', 'b', 'c'] in Polars
Question
How do I make a Categorical column in polars which has:
- elements: ['a', 'b', 'a', 'a']
- categories: ['a', 'b', 'c']
In pandas, I would do:
In [31]: pd.Series(pd.Categorical(['a', 'b', 'a', 'a'], categories=['a', 'b', 'c']))
Out[31]:
0    a
1    b
2    a
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
I have no idea how to do this in polars; the docs for Categorical look completely empty:
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Categorical.html
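To make the target concrete, here is a minimal plain-Python sketch of what the pandas snippet above produces: integer codes for the elements, measured against a fixed category list that may contain unused entries. The helper name `encode_with_categories` is hypothetical and only illustrates the idea; it is not pandas or Polars code.

```python
def encode_with_categories(elements, categories):
    """Return integer codes for `elements` against a fixed category list."""
    # Position in the category list becomes the integer code.
    index = {cat: code for code, cat in enumerate(categories)}
    # -1 marks a value outside the category list (pandas also uses -1 there).
    return [index.get(value, -1) for value in elements]


elements = ['a', 'b', 'a', 'a']
categories = ['a', 'b', 'c']

codes = encode_with_categories(elements, categories)
# Categories that are declared but never used by any element.
unused = [cat for code, cat in enumerate(categories) if code not in codes]

print(codes)   # [0, 1, 0, 0]
print(unused)  # ['c']
```

The point of the question is that 'c' stays in the category list even though no element uses it.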
Answer 1
Score: 1
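You can use the StringCache:
import polars as pl
with pl.StringCache():
    # This Series only seeds the cache with the categories, in the desired order
    pl.Series(['a', 'b', 'c'], dtype=pl.Categorical())
    # This is the Series to keep; the extra 'z' makes the mapping visible
    s = pl.Series(['a', 'b', 'a', 'a', 'z'], dtype=pl.Categorical())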
Everything in the StringCache context shares the same index/value mapping, so the first line initializes the mapping with the categories you want. The second line is the Series you want to keep. I added an extra 'z' so that we can see:
s.to_physical()
shape: (5,)
Series: '' [u32]
[
    0
    1
    0
    0
    3
]
Note that the s series skips 2, as it doesn't have a 'c' value in it.
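The shared index/value mapping described above can be pictured with a small plain-Python model. This is only a conceptual sketch, not how Polars implements its string cache; the class name `ToyStringCache` and its `encode` method are made up for illustration.

```python
class ToyStringCache:
    """Toy model of a shared string -> integer code mapping (not Polars code)."""

    def __init__(self):
        self.codes = {}

    def encode(self, values):
        # An unseen string gets the next free integer; a known string reuses its code.
        return [self.codes.setdefault(v, len(self.codes)) for v in values]


cache = ToyStringCache()

# Seeding step: fixes 'a' -> 0, 'b' -> 1, 'c' -> 2 in the shared mapping.
cache.encode(['a', 'b', 'c'])

# The Series we keep: 'z' is new, so it takes the next free code, 3.
physical = cache.encode(['a', 'b', 'a', 'a', 'z'])

print(physical)  # [0, 1, 0, 0, 3]
```

Code 2 is reserved for 'c' by the seeding step, which is why it never appears in the second encoding.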
Answer 2
Score: 0
Digging through the parameters of Series, I found the dtype parameter, and the documentation links to a Categorical type in its "Other" section.
Is this what you need?
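import polars as pl
elements = ['a', 'b', 'a', 'a']
categories = ['a', 'b', 'c']
# Untested suggestion: whether pl.Categorical accepts a list of categories
# may depend on the Polars version, so check the docs for your release.
categorical_col = pl.Series(elements, dtype=pl.Categorical(categories))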