Go regexp for printable characters

Question

I have a Go server application that processes and saves a user-specified name. I really don't care what the name is; if they want it to be in hieroglyphs or emojis that's fine, as long as most clients can display it. Based on this question for C# I was hoping to use

^[^\p{Cc}\p{Cn}\p{Cs}]{1,50}$

that is, 1-50 characters that are not control characters, unassigned characters, or partial UTF-16 characters. But Go does not support Cn, and I can't find a reasonable regexp that matches any printable Unicode string but rejects, for example, "퟿͸".

I want to use regex because the clients are not written in Go and I want to be able to precisely match the server validation. It's not clear how to match functions like isPrint in other languages.

Is there any way to do this other than hard-coding the unassigned unicode ranges into my application and separately checking for those?

Answer 1

Score: 1

You probably want to allow just these Unicode character classes:

  • L (Letter)
  • M (Mark)
  • N (Number)
  • P (Punctuation)
  • S (Symbol)

That would give you this [positive] regular expression:

^[\pL\pM\pN\pP\pS]+$
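
As a rough sketch of how the positive class might be wired into the server, the snippet below swaps the answer's `+` for the question's `{1,50}` bound. The helper name `validName` is hypothetical, and the `utf8.ValidString` guard is an extra assumption that is not part of the answer. The escaped string `"\ud7ff\u0378"` stands in for the question's example of unassigned code points.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// namePattern is the positive class from the answer, with the question's
// 1-50 length bound in place of "+". The bound counts code points.
var namePattern = regexp.MustCompile(`^[\pL\pM\pN\pP\pS]{1,50}$`)

// validName is a hypothetical helper, not part of any library.
func validName(s string) bool {
	// Extra guard (an assumption, not in the answer): reject malformed UTF-8 first.
	return utf8.ValidString(s) && namePattern.MatchString(s)
}

func main() {
	samples := []string{"Alice", "名前", "😀", "", "a\x00b", "\ud7ff\u0378", "two words"}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, validName(s))
	}
}
```

Because the class lists only assigned categories, unassigned code points fall outside it without any mention of Cn. Note that Z (Separator) is not in the list, so a name containing a space is rejected too.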

Alternatively, test for those Unicode character classes which you don't want:

  • Z (Separator)
  • C (Other)

This gives a regular expression with a negated character class:

^[^\pZ\pC]+$
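
A small sketch for trying the negated class on a few inputs, using the pattern exactly as written above. The variable name `notSepOrOther` is made up for illustration. Which code points `\pC` covers, unassigned ones in particular, may depend on the Unicode tables shipped with your Go version, so it is worth running the question's example string through this on the version you deploy.

```go
package main

import (
	"fmt"
	"regexp"
)

// notSepOrOther is the negated class from the answer, unchanged.
var notSepOrOther = regexp.MustCompile(`^[^\pZ\pC]+$`)

func main() {
	// A plain name, a name with a space (Z), a name with a tab (C), and an empty string.
	inputs := []string{"Alice", "a b", "a\tb", ""}
	for _, s := range inputs {
		fmt.Printf("%q -> %v\n", s, notSepOrOther.MatchString(s))
	}
}
```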

huangapple
  • Posted on 2022-06-03 22:18:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/72490989.html