Python Polars: find the length of a string in a DataFrame

Question

I am trying to count the number of letters in each string of a Polars column.
I could probably just use an apply method and call len() on each Name.
However, I was wondering whether there is a Polars-specific method.

    import polars as pl

    mydf = pl.DataFrame(
        {
            "start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
            "Name": ["John", "Joe", "James"],
        }
    )
    print(mydf)

    ┌────────────┬───────┐
    │ start_date ┆ Name  │
    │ ---        ┆ ---   │
    │ str        ┆ str   │
    ╞════════════╪═══════╡
    │ 2020-01-02 ┆ John  │
    │ 2020-01-03 ┆ Joe   │
    │ 2020-01-04 ┆ James │
    └────────────┴───────┘

In the end, John would have 4, Joe would have 3, and James would have 5.

I thought something like the following might work, based on the pandas equivalent:

    # Assume that mydf is a pandas DataFrame
    mydf['count'] = mydf['Name'].str.len()

    # Attempted Polars equivalent: raises an error
    mydf = mydf.with_columns(
        pl.col('Name').str.len().alias('count')
    )

Answer 1

Score: 2

You can use either of the following:

  • .str.lengths(), which counts the number of bytes in the UTF-8 string (faster)
  • .str.n_chars(), which counts the number of characters

    mydf.with_columns([
        pl.col("Name").str.lengths().alias("len")
    ])

    ┌────────────┬───────┬─────┐
    │ start_date ┆ Name  ┆ len │
    │ ---        ┆ ---   ┆ --- │
    │ str        ┆ str   ┆ u32 │
    ╞════════════╪═══════╪═════╡
    │ 2020-01-02 ┆ John  ┆ 4   │
    │ 2020-01-03 ┆ Joe   ┆ 3   │
    │ 2020-01-04 ┆ James ┆ 5   │
    └────────────┴───────┴─────┘

huangapple
  • Posted on February 8, 2023 at 18:04:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75384141.html