Regex match symbol and ignore exact string
Question
Currently I have a regex like so:
(?<!&)#(?!8203;)
This captures most '#' characters in my case.
For instance, given the input he#ll#o, there are 2 matches, as expected.
Likewise, given the input he#ll#o&#8203;, there are 2 matches, as expected.
However, given the input &#&#&# or just #8203;#8203;#8203;, it fails to find any matches.
How do I modify the existing regular expression so that it ignores only the exact string '&#8203;', given that the text before it is not necessarily the end of a word or whitespace?
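To make the problem concrete, here is a small sketch that counts the matches of the regex above on the four inputs mentioned. The question does not name a regex engine, so the use of Python's `re` module is an assumption, and `count_matches` is a hypothetical helper introduced only for this illustration.

```python
import re

# The regex from the question: a '#' that is not preceded by '&'
# and not followed by '8203;'. The two checks are independent,
# so either one alone is enough to reject a '#'.
ORIGINAL = re.compile(r"(?<!&)#(?!8203;)")


def count_matches(pattern: re.Pattern, text: str) -> int:
    """Return the number of non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))


if __name__ == "__main__":
    for sample in ["he#ll#o", "he#ll#o&#8203;", "&#&#&#", "#8203;#8203;#8203;"]:
        print(f"{sample!r}: {count_matches(ORIGINAL, sample)} match(es)")
```

The first two inputs give 2 matches each, while the last two give none, because every '#' in them trips one of the two independent lookarounds even though the full '&#8203;' sequence never appears.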
Answer 1
Score: 2
You can adjust the lookarounds to
#(?<!&#(?=8203;))
See the regex demo.
Details:
# - a # char
(?<!&#(?=8203;)) - a negative lookbehind that fails the match if, immediately on the left, there is a &# char sequence that is immediately followed by an 8203; char sequence.
An equivalent regex looks like
(?<!&(?=#8203;))#
See this regex demo. I'd use #(?<!&#(?=8203;)), since its lookbehind check is only triggered once a # char is found: searching for a literal char is cheaper than evaluating the lookbehind pattern at every position in the string, which is what the second regex does.
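As a quick check of both patterns, the sketch below lists the positions of the matched # chars on the inputs from the question. It assumes Python's `re` module, which is not named in the original post; support for a lookahead nested inside a lookbehind varies between regex flavors, so other engines may behave differently. `hash_positions` is a hypothetical helper used only here.

```python
import re

# Preferred form: match '#', then look back to reject it only when it is
# the '#' of an exact '&#8203;' sequence.
ADJUSTED = re.compile(r"#(?<!&#(?=8203;))")

# Equivalent form: the lookbehind is evaluated before '#' is consumed.
ALTERNATIVE = re.compile(r"(?<!&(?=#8203;))#")


def hash_positions(pattern: re.Pattern, text: str) -> list[int]:
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


if __name__ == "__main__":
    samples = ["he#ll#o", "he#ll#o&#8203;", "&#&#&#", "#8203;#8203;#8203;"]
    for sample in samples:
        print(sample, hash_positions(ADJUSTED, sample), hash_positions(ALTERNATIVE, sample))
```

Both patterns report the same positions on these inputs: the # inside &#8203; is skipped, while a # preceded only by & or followed only by 8203; is still matched.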