Make a categorical column which has categories ['a', 'b', 'c'] in Polars

Question

How do I make a Categorical column which has:

  • elements: ['a', 'b', 'a', 'a']
  • categories: ['a', 'b', 'c']

in Polars?

In pandas, I would do:

In [31]: pd.Series(pd.Categorical(['a', 'b', 'a', 'a'], categories=['a', 'b', 'c']))
Out[31]:
0    a
1    b
2    a
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

I have no idea how to do this in Polars, and the docs for Categorical look completely empty:
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Categorical.html

Answer 1

Score: 1

You can use a StringCache:

import polars as pl

with pl.StringCache():
    # Seed the cache so that 'a', 'b', 'c' are registered first, in this order.
    pl.Series(['a', 'b', 'c'], dtype=pl.Categorical())
    s = pl.Series(['a', 'b', 'a', 'a', 'z'], dtype=pl.Categorical())

Everything created inside the StringCache context shares the same index/value mapping, so the first line initializes the mapping with the categories you want. The second line is the Series you want to keep. I added an extra 'z' to make the mapping visible:

s.to_physical()
shape: (5,)
Series: '' [u32]
[
    0
    1
    0
    0
    3
]

Note that the physical values of s skip 2 because s contains no 'c' value; the new value 'z' gets the next free index, 3.

Answer 2

Score: 0

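While digging through the parameters of Series, I found the dtype parameter, and the documentation links to a DataTypes page that lists Categorical in its "Other" section.

Is this what you need? Note that pl.Categorical itself does not take a list of categories, so the snippet below uses pl.Enum, which does (available in recent Polars versions):

import polars as pl

elements = ['a', 'b', 'a', 'a']
categories = ['a', 'b', 'c']
# pl.Enum declares the full set of categories up front, including the unused 'c'.
categorical_col = pl.Series(elements, dtype=pl.Enum(categories))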

Posted by huangapple on July 4, 2023, 22:16:05
Source: https://go.coder-hub.com/76613542.html
