Regex (Python): Matching Integers not Preceded by Character

Question

Based on some string of numbers:

(30123:424302) 123 #4324:#34123

How can I obtain only the numbers that are NOT immediately preceded by "#"? I have found how to get those numbers preceded by "#" (\#+\d+) but I need the opposite. Can I group all \d+ and then inverse match based on the pattern I have somehow?

To clarify, I need 30123, 424302, and 123 in the above example.

Answer 1

Score: 4

You may try this regex with a negative lookbehind + word boundary:

(?<!#)\b\d+

RegEx Details:

  • (?<!#): A negative lookbehind that fails the match when # appears immediately before the current position
  • \b: Word boundary
  • \d+: Match one or more digits
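
A minimal sketch of this pattern applied to the sample string from the question. The variable names and the second test string are illustrative assumptions, not part of the original answer. The second call shows a side effect of \b: because letters and underscores count as word characters, digits glued to them are skipped as well.

```python
import re

text = "(30123:424302) 123 #4324:#34123"

# The lookbehind rejects a '#' directly before the match;
# \b pins the match to the start of a run of word characters.
pattern = re.compile(r"(?<!#)\b\d+")

numbers = pattern.findall(text)
print(numbers)  # ['30123', '424302', '123']

# Hypothetical extra input: digits that follow a letter or underscore
# have no word boundary in front of them, so only the standalone 7 matches.
glued = pattern.findall("abc123 x_42 7")
print(glued)  # ['7']
```

If digits directly after letters should also be returned, the \b approach is probably not the right fit.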

Answer 2

Score: 1

You need

(?<![#\d])\d+

Pattern details

  • (?<![#\d]) - a negative lookbehind that fails the match if there is a digit or a # char immediately before the current position
  • \d+ - one or more digits.

See the Python demo:

import re
text = "(30123:424302) 123 #4324:#34123"
print(re.findall(r"(?<![#\d])\d+", text))
# => ['30123', '424302', '123']
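
To see why the lookbehind also excludes digits, it may help to compare it with a lookbehind that only excludes #. This is a small illustrative sketch; the variable names are assumptions. Without \d in the character class, the regex engine simply steps one character past the # and matches the rest of the number.

```python
import re

text = "(30123:424302) 123 #4324:#34123"

# Only '#' is excluded: the match can start at the second digit after '#'.
without_digit = re.findall(r"(?<!#)\d+", text)
print(without_digit)  # ['30123', '424302', '123', '324', '4123']

# '#' and digits are excluded: a match can only start at the first digit
# of a number, and only if that number is not preceded by '#'.
with_digit = re.findall(r"(?<![#\d])\d+", text)
print(with_digit)  # ['30123', '424302', '123']
```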

If you want to "reverse" the match the way you originally had in mind, match what you do not want in one alternative, then match and capture what you do want in the other. After collecting the matches, remove all empty values from the resulting list:

import re
text = "(30123:424302) 123 #4324:#34123"
print(list(filter(None, re.findall(r"#\d+|(\d+)", text))))

As you can see, #\d+ consumed all digits after the # (i.e. in the undesired context) and (\d+) fetched the right values.
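
A short sketch of why the filtering step is needed. With a single capture group, re.findall returns that group's text for every match, which is an empty string whenever the #\d+ branch matched. The re.finditer variant is an alternative phrasing and not part of the original answer; there, a group that did not participate in the match is None.

```python
import re

text = "(30123:424302) 123 #4324:#34123"
pattern = re.compile(r"#\d+|(\d+)")

# Unfiltered result: one entry per match, empty where '#\d+' matched.
raw = pattern.findall(text)
print(raw)  # ['30123', '424302', '123', '', '']

# Same idea with finditer: keep only matches where group 1 took part.
wanted = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(wanted)  # ['30123', '424302', '123']
```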

huangapple
  • Posted on April 20, 2023, 04:19:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76058513.html