Go regexp for printable characters
Question
I have a Go server application that processes and saves a user-specified name. I really don't care what the name is; if they want it to be in hieroglyphs or emojis that's fine, as long as most clients can display it. Based on this question for C# I was hoping to use
^[^\p{Cc}\p{Cn}\p{Cs}]{1,50}$
that is, 1-50 characters that are not control characters (Cc), unassigned characters (Cn), or partial UTF-16 characters, i.e. surrogates (Cs). But Go does not support Cn. Basically I can't find a reasonable regexp that will match any printable Unicode string but not "", for example.
I want to use regex because the clients are not written in Go and I want to be able to precisely match the server validation. It's not clear how to match functions like isPrint in other languages.
Is there any way to do this other than hard-coding the unassigned Unicode ranges into my application and separately checking for those?
Answer 1
Score: 1
You probably want to use just these Unicode character classes:
- L (Letter)
- M (Mark)
- N (Number)
- P (Punctuation)
- S (Symbol)
That would give you this [positive] regular expression:
^[\pL\pM\pN\pP\pS]+$
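As a rough sketch of how this could look in Go, the positive class can be combined with the 1-50 length bound from the question. The names `printableName` and `IsValidName` are made up for illustration; only the pattern comes from the text above.

```go
package main

import (
	"fmt"
	"regexp"
)

// printableName is a hypothetical name for the positive pattern above,
// with the {1,50} bound from the question in place of "+".
var printableName = regexp.MustCompile(`^[\pL\pM\pN\pP\pS]{1,50}$`)

// IsValidName (hypothetical helper) reports whether name is 1 to 50
// code points, each a letter, mark, number, punctuation or symbol.
func IsValidName(name string) bool {
	return printableName.MatchString(name)
}

func main() {
	samples := []string{
		"Gopher",    // letters: accepted
		"名前",        // non-Latin letters: accepted
		"😀",         // emoji is a symbol (So): accepted
		"",          // empty: rejected by the lower bound
		"a\tb",      // tab is a control character: rejected
		"two words", // space is a separator (Zs): rejected
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, IsValidName(s))
	}
}
```

Because the class is an allow-list, code points outside these categories, including unassigned ones, do not match. Note that spaces are rejected too, so names such as "two words" would need an extra `\p{Zs}` or a literal space in the class if they should be allowed.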
Alternatively, test for those Unicode character classes which you don't want:
- Z (Separator)
- C (Other)
Again as a regular expression, this time using a negated character class:
^[^\pZ\pC]+$
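A minimal sketch of the negated form in Go follows, with hypothetical names (`noSeparatorOrOther`, `HasNoSeparatorOrOther`). It only demonstrates cases whose categories are not in doubt. Whether unassigned code points are rejected depends on whether the engine's `\pC` covers Cn, which the question says Go's regexp does not support, so that case is deliberately left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// noSeparatorOrOther is a hypothetical name for the negated pattern above.
var noSeparatorOrOther = regexp.MustCompile(`^[^\pZ\pC]+$`)

// HasNoSeparatorOrOther (hypothetical helper) reports whether s is non-empty
// and contains no code point from the Z (Separator) or C (Other) classes
// as the regexp engine defines them.
func HasNoSeparatorOrOther(s string) bool {
	return noSeparatorOrOther.MatchString(s)
}

func main() {
	samples := []string{
		"Gopher",          // accepted
		"two words",       // space (Zs): rejected
		"line\nbreak",     // newline (Cc): rejected
		"zero\u200bwidth", // zero width space (Cf): rejected
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, HasNoSeparatorOrOther(s))
	}
}
```

If unassigned code points must be rejected regardless of how the engine defines `\pC`, the allow-list form is the safer of the two.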