SQL regexp that accepts Chinese characters and ASCII but rejects special characters

Question

I need a SQL regexp pattern that satisfies the following criteria.

1. Accepts Chinese characters
2. Accepts a-z, A-Z, 0-9, and spaces
3. Rejects only special characters

I've tried the following.

SELECT regexp_like((val)::TEXT, ('^[ !-’~¡-ÿ]*$')::TEXT);

or

SELECT regexp_like((val)::TEXT, ('^[^[:ascii:]]+$')::TEXT);

Both of the queries above also accept special characters, which they should not.

SELECT (('#$$')::TEXT ~ ('^[a-zA-Z0-9]*$'));

This query rejects special characters, but it also fails to accept Chinese characters.

Answer 1

Score: 0

You can use the Unicode code points of the Chinese characters:

SELECT (('#$$')::TEXT ~ ('^[a-zA-Z0-9\x4e00-\x9fff\x3400-\x4dbf]*$'));
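To see how this pattern behaves on more than one input, it can be run against a few sample strings at once. This is a minimal sketch that assumes PostgreSQL (which the `::TEXT` casts and the `~` operator suggest) with a UTF-8 database encoding; the sample strings are illustrative and do not come from the question. Note that the character class as written does not list the space character, so a space would need to be added inside the brackets if requirement 2 should hold.

```sql
-- Assumption: PostgreSQL, UTF-8 database encoding, and
-- standard_conforming_strings = on (the default), so the backslashes
-- reach the regex engine unchanged.
-- The sample strings below are hypothetical test inputs.
SELECT s.val,
       s.val ~ '^[a-zA-Z0-9\x4e00-\x9fff\x3400-\x4dbf]*$' AS accepted
FROM (VALUES
        ('abc123'),     -- ASCII letters and digits
        ('中文'),        -- Chinese characters
        ('abc 中文'),    -- contains a space, which the class does not list
        ('#$$')         -- special characters
     ) AS s(val);
```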

Answer 2

Score: 0

According to Wikipedia's article on CJK Unified Ideographs, Chinese characters fall within the Unicode range U+4E00 through U+9FFF.

Additionally, there are Extensions A through H.
CJK Unified Ideographs Extension A, U+3400 through U+4DBF.
CJK Unified Ideographs Extension B, U+20000 through U+2A6DF.
CJK Unified Ideographs Extension C, U+2A700 through U+2B73F.
CJK Unified Ideographs Extension D, U+2B740 through U+2B81F.
CJK Unified Ideographs Extension E, U+2B820 through U+2CEAF.
CJK Unified Ideographs Extension F, U+2CEB0 through U+2EBEF.
CJK Unified Ideographs Extension G, U+30000 through U+3134F.
CJK Unified Ideographs Extension H, U+31350 through U+323AF.
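If the extension blocks matter for your data, they can be appended to the character class in the same way. The sketch below assumes PostgreSQL with a UTF-8 database encoding and covers only the main block plus Extensions A and B; the remaining ranges from the list above would be added following the same pattern. The sample value is hypothetical.

```sql
-- Sketch only. Assumption: PostgreSQL with a UTF-8 database encoding.
-- \uXXXX takes exactly four hex digits; \UXXXXXXXX takes exactly eight,
-- which is what code points above U+FFFF require.
-- Ranges used, in order:
--   \u4e00-\u9fff          CJK Unified Ideographs
--   \u3400-\u4dbf          Extension A
--   \U00020000-\U0002a6df  Extension B
SELECT t.val,
       t.val ~ '^[a-zA-Z0-9 \u4e00-\u9fff\u3400-\u4dbf\U00020000-\U0002a6df]*$' AS accepted
FROM (VALUES ('汉字 abc 123')) AS t(val);  -- hypothetical sample value
```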

So, you can add a Unicode range to your character class, as follows.

> "1.Accepts chinese characters"

[\u4e00-\u9fff]

> "2.Accepts - a-z,A-Z,0-9,spaces"

The (?i) flag turns on case-insensitive mode.

(?i)[a-z\d \u4e00-\u9fff]

> "3.Rejects only special characters."

I imagine the values you provided are the characters you wish to reject.
For the range you provided, ! through ’, you need to skip over the digit characters, 0 through 9, and the uppercase letters, A through Z.

So the class needs to be changed to the following.

[^!-/:-@\[-`~¡-ÿ]

You can then combine this character class with the previous one using the character class intersection syntax, &&.

So, the complete pattern would be the following.

(?i)^[a-z\d \u4e00-\u9fff&&[^!-/:-@\[-`~¡-ÿ]]*$
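The `&&` intersection operator is a feature of Java-style regex engines; as far as I know, PostgreSQL's regex engine does not implement it. Because the allow-list on the left side contains none of the characters in the negated class, the intersection does not change what is matched, so the allow-list can be used on its own. The sketch below assumes PostgreSQL with a UTF-8 database encoding and uses hypothetical sample values.

```sql
-- Sketch only. Assumption: PostgreSQL with a UTF-8 database encoding.
-- ~* is PostgreSQL's case-insensitive regex match operator, used here
-- in place of the (?i) flag.
-- The allow-list rejects anything it does not list, so no intersection
-- with a negated class is needed.
SELECT t.val,
       t.val ~* '^[a-z0-9 \u4e00-\u9fff]*$' AS accepted
FROM (VALUES
        ('Hello 世界 42'),  -- letters, Chinese characters, digits, spaces
        ('#$$')            -- special characters
     ) AS t(val);
```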

huangapple
  • Posted on June 15, 2023, 03:50:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76477093.html