SQL regexp that accepts Chinese characters and ASCII letters, digits, and spaces, and rejects special characters
Question
I need a SQL regexp pattern that satisfies the following criteria:
1. Accepts Chinese characters
2. Accepts a-z, A-Z, 0-9, and spaces
3. Rejects special characters
I've tried the following:
Select regexp_like((val)::TEXT , ('^[ !-’~¡-ÿ]*$')::TEXT)
Or
regexp_like((val)::TEXT ,('^[^[:ascii:]]+$')::text);
Both of the queries above also accept special characters that should be rejected.
SELECT (('#$$')::TEXT ~ ('^[a-zA-Z0-9]*$'));
This query rejects special characters as required, but it fails to accept Chinese characters.
Answer 1
Score: 0
You can use the Unicode values of the Chinese characters:
SELECT (('#$$')::TEXT ~ ('^[a-zA-Z0-9\x4e00-\x9fff\x3400-\x4dbf]*$'));
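As a quick way to sanity-check the allow-list idea outside the database, here is a minimal sketch using Python's re module (an assumption, not the SQL engine from the question; escape syntax differs between engines, so Python's \uXXXX form is used here). It also adds a space to the class, since the question asks for spaces to be accepted. The names ALLOWED and is_allowed are hypothetical.

```python
import re

# Allow-list: ASCII letters, digits, space, CJK Unified Ideographs
# (U+4E00-U+9FFF) and Extension A (U+3400-U+4DBF).
# Assumption: Python's re module, which understands \uXXXX in patterns.
ALLOWED = re.compile(r"[a-zA-Z0-9 \u3400-\u4dbf\u4e00-\u9fff]*")


def is_allowed(value: str) -> bool:
    # fullmatch plays the role of the ^ and $ anchors in the SQL pattern.
    return ALLOWED.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["Hello 世界 123", "中文", "#$$", "abc!"]:
        print(repr(sample), is_allowed(sample))
```

Because the class lists only what is permitted, anything else (punctuation, symbols, accented Latin letters) is rejected without having to be enumerated.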
Answer 2
Score: 0
According to Wikipedia, Chinese characters fall within the Unicode range U+4E00 through U+9FFF (Wikipedia – CJK Unified Ideographs).
Additionally, there are Extensions A through H.
CJK Unified Ideographs Extension A, U+3400 through U+4DBF.
CJK Unified Ideographs Extension B, U+20000 through U+2A6DF.
CJK Unified Ideographs Extension C, U+2A700 through U+2B73F.
CJK Unified Ideographs Extension D, U+2B740 through U+2B81F.
CJK Unified Ideographs Extension E, U+2B820 through U+2CEAF.
CJK Unified Ideographs Extension F, U+2CEB0 through U+2EBEF.
CJK Unified Ideographs Extension G, U+30000 through U+3134F.
CJK Unified Ideographs Extension H, U+31350 through U+323AF.
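If the extension blocks also need to be accepted, the character class can be generated from the ranges listed above instead of typed by hand. The following is a minimal sketch in Python (an assumption; the question itself is about SQL, and other engines spell code points above U+FFFF differently). The names CJK_RANGES, cjk_class_body, and is_allowed are hypothetical.

```python
import re

# (start, end) code points copied from the ranges listed above.
CJK_RANGES = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # Extension A
    (0x20000, 0x2A6DF),  # Extension B
    (0x2A700, 0x2B73F),  # Extension C
    (0x2B740, 0x2B81F),  # Extension D
    (0x2B820, 0x2CEAF),  # Extension E
    (0x2CEB0, 0x2EBEF),  # Extension F
    (0x30000, 0x3134F),  # Extension G
    (0x31350, 0x323AF),  # Extension H
]


def cjk_class_body() -> str:
    # \UXXXXXXXX is the 8-digit code point escape in Python's re module.
    return "".join(f"\\U{lo:08X}-\\U{hi:08X}" for lo, hi in CJK_RANGES)


# ASCII letters, digits, and space, plus every CJK range above.
PATTERN = re.compile(f"[a-zA-Z0-9 {cjk_class_body()}]*")


def is_allowed(value: str) -> bool:
    # fullmatch requires the whole string to consist of allowed characters.
    return PATTERN.fullmatch(value) is not None
```

Keeping the ranges in a list makes it easy to drop blocks you do not need, or to add one if a later Unicode version introduces another extension.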
So, you can add a Unicode range to your character class, as follows.
> "1.Accepts chinese characters"
[\u4e00-\u9fff]
> "2.Accepts - a-z,A-Z,0-9,spaces"
The (?i) flag turns on case-insensitive mode.
(?i)[a-z\d \u4e00-\u9fff]
> "3.Rejects only special characters."
I assume the values you provided are the characters you wish to reject.
For the provided range, ! through ’, you need to skip over the digits, 0 through 9, and the uppercase letters, A through Z.
So the class needs to be changed to the following.
[^!-/:-@\[-`~¡-ÿ]
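You can then combine this character class with the previous one using the character class intersection syntax, &&. Note that && is a Java regex feature; the queries in the question look like PostgreSQL, whose regex engine does not support it. Since the class from step 2 is already an allow-list, it rejects these characters on its own, so the intersection is optional.
With intersection, the complete pattern would be the following.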
(?i)^[a-z\d \u4e00-\u9fff&&[^!-/:-@\[-`~¡-ÿ]]*$
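For engines without the && syntax, the same "allowed and not rejected" logic can be emulated with a negative lookahead in front of the allowed class. The sketch below uses Python's re module (an assumption; Python does not support && either), with the hypothetical names ALLOW, REJECT, allow_only, and intersected. It also compares the result against the plain allow-list.

```python
import re

ALLOW = r"[a-z\d \u4e00-\u9fff]"   # class from step 2
REJECT = r"[!-/:-@\[-`~¡-ÿ]"       # characters to reject, from step 3

# IGNORECASE stands in for (?i); ASCII keeps \d and case folding ASCII-only.
FLAGS = re.IGNORECASE | re.ASCII

# Plain allow-list, no intersection.
allow_only = re.compile(f"{ALLOW}*", FLAGS)

# Emulated intersection: each character must not be in REJECT
# (negative lookahead) and must be in ALLOW.
intersected = re.compile(f"(?:(?!{REJECT}){ALLOW})*", FLAGS)


def check(value: str) -> tuple[bool, bool]:
    # fullmatch plays the role of the ^ and $ anchors.
    return (
        allow_only.fullmatch(value) is not None,
        intersected.fullmatch(value) is not None,
    )


if __name__ == "__main__":
    for sample in ["Hello 世界 123", "#$$", "A_b", "naïve"]:
        print(repr(sample), check(sample))
```

On these samples the two patterns give the same answer, which is what you would expect: nothing in the reject class is part of the allow-list to begin with.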