In polars, is there a way to remove character accents from string columns?
Question
I want to remove character accents from a text column, e.g. convert "Piña" to "Pina".
This is how I would do it in pandas:
    (names
     .str.normalize('NFKD')
     .str.encode('ascii', errors='ignore')
     .str.decode('utf-8'))
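For reference, here is a minimal sketch of what those three steps do to a single value, using only the standard library. The variable names are illustrative and not part of the original post.

```python
from unicodedata import normalize

name = "Piña"

# Step 1: NFKD splits "ñ" into a plain "n" followed by a combining tilde (U+0303).
decomposed = normalize("NFKD", name)

# Step 2: encoding to ASCII with errors="ignore" silently drops the combining mark.
ascii_bytes = decomposed.encode("ascii", errors="ignore")

# Step 3: decode the remaining bytes back into a str.
result = ascii_bytes.decode("ascii")

# Caveat: letters with no decomposition (such as "ø") are removed entirely.
lossy = normalize("NFKD", "Søren").encode("ascii", errors="ignore").decode("ascii")
```

Because the encode step discards anything it cannot represent, this approach is lossy for characters that are not a base letter plus combining marks.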
Polars has str.decode and str.encode, but they don't seem to be what I'm looking for.
Thanks!
Answer 1
Score: 3
To expand on @jqurious's comment you can do one of two things:
- apply/lambda
like this:
    from unicodedata import normalize

    import polars as pl

    # Normalize each value to NFKD, then drop every byte that is not ASCII.
    df.with_columns(
        a=pl.col('a')
        .apply(lambda x: normalize('NFKD', x)
               .encode('ascii', errors='ignore')
               .decode('utf-8')))
- define function/map
like this:
    from unicodedata import normalize

    import polars as pl

    def custnorm(in_series):
        # Build a new Series of the same length instead of mutating the input.
        # Null values are passed through unchanged.
        return pl.Series(in_series.name, [
            None if x is None else
            normalize('NFKD', x).encode('ascii', errors='ignore').decode('utf-8')
            for x in in_series])

then inside the df you can do

    df.with_columns(a=pl.col('a').map(custnorm))
The difference between apply and map is that apply tells Polars to do the looping, calling the function once per value, whereas map passes the whole column to the function as a Series, and the function must then return a Series of the same size. In Polars 0.19 and later these methods are named map_elements and map_batches, respectively.