How to remove any non-Persian character from a string in Python?
Question
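- You should use Unicode, not ASCII, to identify Persian characters.
- Persian characters fall in Unicode ranges. ASCII covers only U+0000 to U+007F and contains no Persian letters.
- Target the Persian alphabet rather than Arabic as a language. Persian is written in the Arabic script, so its letters are encoded in Unicode's Arabic block.
- To find the Unicode range of the Persian letters, consult the Unicode code charts or other references on Unicode blocks. Persian letters generally fall in the range U+0600 to U+06FF (the Arabic block).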
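One way to check which range a given letter falls in is to print its code point and official Unicode name. The following is a minimal sketch using the standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of the original post.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character.

    `describe` is a hypothetical helper name used only for this example.
    """
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# Print the code point and name of each letter in the sample word.
for ch, code, name in describe("سلام"):
    print(code, name)
```

Each letter of the sample word reports a code point between U+0600 and U+06FF and a name beginning with "ARABIC LETTER", which is consistent with the range given above.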
I want to remove every non-Persian character from a string in Python.
For example, if I have a string like this:
00سلامabc
I want to keep only the Persian characters, so that the result becomes:
سلام
I know that I can extract just the Persian characters from a string with a regex.
But I have four questions:
- Which type of characters should I consider: ASCII or Unicode?
- Is there a Persian range in ASCII or Unicode?
- Which language should I use: Arabic or Persian?
- How do I find the range of the alphabet?
Answer 1
Score: 2
https://trinket.io/python3/cc31b7b436
You could use a regular expression to find all the Persian characters and join them back together:
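import re

def persian_only(s):
    # Collect every run of characters in the Arabic block (U+0600-U+06FF),
    # which contains the Persian letters, and join the runs together.
    return "".join(re.findall(r"[\u0600-\u06FF]+", s))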
>>> persian_only("00سلامabc")
سلام
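Note that U+0600 to U+06FF is the whole Arabic block, so it also matches Arabic-only letters, Arabic-script digits, and punctuation such as the Arabic comma. Persian text also uses the zero-width non-joiner (U+200C) inside some words, and that character lies outside this range. As a rough sketch, the same idea can be written as a deletion with `re.sub` that also keeps U+200C. The names `NON_PERSIAN` and `strip_non_persian` are hypothetical, and treating U+200C as part of the allowed set is an assumption, not something the answer above specifies.

```python
import re

# Assumption: "Persian" here means anything in the Arabic block
# (U+0600-U+06FF) plus the zero-width non-joiner U+200C.
# NON_PERSIAN and strip_non_persian are hypothetical names.
NON_PERSIAN = re.compile(r"[^\u0600-\u06FF\u200C]+")

def strip_non_persian(text):
    """Delete every run of characters outside the assumed Persian set."""
    return NON_PERSIAN.sub("", text)

print(strip_non_persian("00سلامabc"))
```

Compiling the pattern once avoids rebuilding it on every call. Whether spaces, digits, or punctuation should survive depends on the use case, and they can be added to the character class if needed.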