Python Polars find the length of a string in a dataframe

Question

I am trying to count the number of letters in each string of a Polars column.
I could probably just use an apply method and call len(Name) on each value.
However, I was wondering whether there is a Polars-specific method?

import polars as pl

mydf = pl.DataFrame(
    {"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
     "Name": ["John", "Joe", "James"]})

print(mydf)

┌────────────┬───────┐
│ start_date ┆ Name  │
│ ---        ┆ ---   │
│ str        ┆ str   │
╞════════════╪═══════╡
│ 2020-01-02 ┆ John  │
│ 2020-01-03 ┆ Joe   │
│ 2020-01-04 ┆ James │
└────────────┴───────┘


In the end, John would have 4, Joe would have 3, and James would have 5.

I thought something like the code below might work, based on the pandas equivalent:

# Assume mydf is a pandas DataFrame
mydf['count'] = mydf['Name'].str.len()

# Polars equivalent: raises an error
mydf = mydf.with_columns(
    pl.col('Name').str.len().alias('count')
)

Answer 1

Score: 2

You can use either of the following:

  • .str.lengths(), which counts the number of bytes in the UTF-8 string (faster)
  • .str.n_chars(), which counts the number of characters

For ASCII-only names like these, both return the same values.

mydf.with_columns([
    pl.col("Name").str.lengths().alias("len")
])

┌────────────┬───────┬─────┐
│ start_date ┆ Name  ┆ len │
│ ---        ┆ ---   ┆ --- │
│ str        ┆ str   ┆ u32 │
╞════════════╪═══════╪═════╡
│ 2020-01-02 ┆ John  ┆ 4   │
│ 2020-01-03 ┆ Joe   ┆ 3   │
│ 2020-01-04 ┆ James ┆ 5   │
└────────────┴───────┴─────┘
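
The two methods only disagree once a string contains non-ASCII characters. As a rough illustration in plain Python rather than Polars, the sketch below compares UTF-8 byte length with character count; the helper name utf8_lengths and the sample name "José" are made up for this example and are not part of the original question. Also note that more recent Polars releases renamed these methods to .str.len_bytes() and .str.len_chars(), so the older names may warn or fail depending on the installed version.

```python
def utf8_lengths(names):
    """Return (name, byte_length, char_count) for each string.

    byte_length mirrors what a byte-counting method reports;
    char_count mirrors what a character-counting method reports.
    """
    return [(name, len(name.encode("utf-8")), len(name)) for name in names]


# "Jos\u00e9" is "José" with a precomposed e-acute (2 bytes in UTF-8).
rows = utf8_lengths(["John", "Joe", "Jos\u00e9"])

for name, n_bytes, n_chars in rows:
    print(f"{name}: {n_bytes} bytes, {n_chars} characters")
```

If the column may hold accented or non-Latin names, the character-counting method is probably the one that matches "number of letters"; the byte-counting method is the cheaper choice when the data is known to be ASCII.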

Posted by huangapple on 2023-02-08 18:04:19
Source: https://go.coder-hub.com/75384141.html