Regex match symbol and ignore exact string

Question

Currently I have a regex like so:

(?<!&)#(?!8203;)

This captures most '#' characters in my case.

For instance, given the input he#ll#o, there would be 2 matches as expected.

Again, given the input he#ll#o&#8203;, there would be 2 matches as expected.

However, given the input &#&#&# or just #8203;#8203;#8203;, it will fail to find matches.
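
To make the failure concrete, here is a minimal reproduction sketch. Python's `re` module is an assumption, since the question does not name a regex flavor, and `count_original` is a hypothetical helper name used only for illustration.

```python
import re

# The regex from the question: a '#' not preceded by '&'
# and not followed by '8203;'.
ORIGINAL = re.compile(r"(?<!&)#(?!8203;)")


def count_original(text):
    """Count the '#' characters the original regex matches in text."""
    return len(ORIGINAL.findall(text))


for sample in ["he#ll#o", "he#ll#o&#8203;", "&#&#&#", "#8203;#8203;#8203;"]:
    print(repr(sample), count_original(sample))
```

The two lookarounds are checked independently, so a '#' is rejected when either half of '&#8203;' is present, not only when both are.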

How do I modify the existing regular expression to ignore exactly '&#8203;', given that the text before it may not be the end of a previous word or whitespace?

Answer 1

Score: 2

You can adjust the lookarounds to

#(?<!&#(?=8203;))

See the regex demo.

Details:

  • # - a # char
  • (?<!&#(?=8203;)) - a negative lookbehind that fails the match if the text immediately to the left is the &# char sequence and that sequence is immediately followed by the 8203; char sequence.
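
The pattern described above can be tried with a short sketch. It assumes Python's `re` module, which accepts a lookahead nested inside a fixed-width lookbehind; other engines may differ. The helper name `hash_positions` and the sample strings are illustrative only.

```python
import re

# Match '#', then look back: reject the match only when the two characters
# ending at that '#' are '&#' and '8203;' follows immediately after it.
PATTERN = re.compile(r"#(?<!&#(?=8203;))")


def hash_positions(text):
    """Return the index of every '#' that is not part of '&#8203;'."""
    return [m.start() for m in PATTERN.finditer(text)]


samples = ["he#ll#o", "he#ll#o&#8203;", "&#&#&#", "#8203;#8203;#8203;", "&#8203;"]
for sample in samples:
    print(repr(sample), hash_positions(sample))
```

Under these assumptions the five samples give [2, 5], [2, 5], [1, 3, 5], [0, 6, 12] and [], so only a '#' inside a complete '&#8203;' is skipped.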

An equivalent regex is

(?<!&(?=#8203;))#

See this regex demo. I'd use #(?<!&#(?=8203;)), since its lookbehind check runs only after a # char has been found, and searching for a literal char is cheaper than evaluating the lookbehind at every position in the string (which is what the second regex does).
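
As a rough check that the two variants agree, the sketch below compares their match positions on a few inputs. It again assumes Python's `re` module; `match_starts` and the sample strings are hypothetical. It checks equivalence only and does not measure the performance difference.

```python
import re

# Variant 1: find '#' first, then run the lookbehind.
HASH_FIRST = re.compile(r"#(?<!&#(?=8203;))")
# Variant 2: run the lookbehind at each position, then match '#'.
LOOKBEHIND_FIRST = re.compile(r"(?<!&(?=#8203;))#")


def match_starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


SAMPLES = ["he#ll#o", "he#ll#o&#8203;", "&#&#&#", "#8203;#8203;#8203;", "a&#8203;b#"]
for text in SAMPLES:
    first = match_starts(HASH_FIRST, text)
    second = match_starts(LOOKBEHIND_FIRST, text)
    print(repr(text), first, first == second)
```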

huangapple
  • Posted on May 11, 2023 at 01:35:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76221196.html