How to remove any non-Persian character from a string in Python?

Question

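  1. Use Unicode, not ASCII, to identify Persian characters.
  2. Persian characters have code points in Unicode; ASCII contains none of them.
  3. Target Persian rather than Arabic, keeping in mind that Unicode encodes both in the same Arabic script block.
  4. To find the Unicode range of the Persian alphabet, consult the Unicode code charts or other references on Unicode ranges. Persian letters generally fall within U+0600 to U+06FF.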
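
One way to explore the range U+0600 to U+06FF for yourself is to print the code point and official Unicode name of each character in a sample string. The sketch below is only an illustration: `describe` is a hypothetical helper name, not something from the original post, and it relies on the standard-library `unicodedata` module.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for every character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in text
    ]

# Sample string taken from the question.
for ch, code_point, name in describe("00سلامabc"):
    print(code_point, name)
```

The Persian characters show up with names that begin with "ARABIC LETTER" and code points between U+0600 and U+06FF, because Unicode places Persian letters, including Persian-specific ones such as پ (U+067E) and گ (U+06AF), in the Arabic block.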

I want to remove every non-Persian character from a string in Python.
For example, given a string like this:

00سلامabc

I want to keep only the Persian characters, so that the result becomes:

سلام

I know that I can extract just the Persian characters from a string with a regex.
But I have four questions:

  1. Which type of characters should I consider: ASCII or Unicode?
  2. Is there a Persian range in ASCII or Unicode?
  3. Which language should I use: Arabic or Persian?
  4. How do I find the range of the alphabet?

Answer 1

Score: 2


You could use a regular expression to find all the Persian characters and join them back together:

import re

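def persian_only(s):
    # Keep only runs of characters in the Unicode Arabic block (U+0600-U+06FF),
    # which holds the Persian letters, then join the runs into one string.
    return "".join(re.findall(r"[\u0600-\u06FF]+", s))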

>>> persian_only("00سلامabc")
'سلام'

https://trinket.io/python3/cc31b7b436
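
The range U+0600 to U+06FF also covers Arabic-script digits, punctuation, and diacritics, so depending on what "Persian character" should mean, a stricter or looser filter may be wanted. The following is a minimal sketch under stated assumptions, not part of the original answer: the names `strip_non_persian` and `persian_letters_only` are hypothetical, and treating the zero-width non-joiner (U+200C) as a Persian character is an assumption based on its use inside Persian words.

```python
import re

# Assumption: "Persian" means the Unicode Arabic block (U+0600-U+06FF),
# the same range used in the answer above, plus the zero-width
# non-joiner (U+200C).
PERSIAN_CLASS = "\u0600-\u06FF\u200C"

def strip_non_persian(text, keep_whitespace=False):
    """Delete every character outside the assumed Persian set."""
    allowed = PERSIAN_CLASS + (r"\s" if keep_whitespace else "")
    return re.sub(f"[^{allowed}]+", "", text)

def persian_letters_only(text):
    """Keep only alphabetic characters from the Arabic block.

    Digits, punctuation, and combining diacritics in the block are dropped,
    because str.isalpha() is False for them.
    """
    return "".join(
        ch for ch in text
        if "\u0600" <= ch <= "\u06FF" and ch.isalpha()
    )

print(strip_non_persian("00سلامabc"))
```

Passing `keep_whitespace=True` preserves spaces between words, which the `findall`-and-join approach would otherwise discard.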

huangapple
  • Posted on February 14, 2023, 19:54:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75447466.html