Regex (Python): Matching Integers not Preceded by Character
Question
Based on some string of numbers:
(30123:424302) 123 #4324:#34123
How can I obtain only the numbers that are NOT immediately preceded by "#"? I have found how to get the numbers preceded by "#" (\#+\d+), but I need the opposite. Can I group all \d+ and then somehow inverse match based on the pattern I have?
To clarify, I need 30123, 424302, and 123 in the above example.
Answer 1
Score: 4
You may try this regex with a negative lookbehind and a word boundary:
(?<!#)\b\d+
RegEx details:
- (?<!#): a negative lookbehind that fails the match when # appears at the preceding position
- \b: a word boundary
- \d+: matches one or more digits
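As a quick check of the pattern above, here is a minimal Python sketch run against the string from the question. The variable names are illustrative only. It also runs the same pattern without \b, to show why the word boundary matters: without it, the regex engine simply skips the first digit after # and matches the rest of the number.

```python
import re

text = "(30123:424302) 123 #4324:#34123"

# Negative lookbehind plus word boundary: a match may only start
# at the first digit of a number that is not preceded by '#'.
with_boundary = re.findall(r"(?<!#)\b\d+", text)
print(with_boundary)     # ['30123', '424302', '123']

# Without \b the match can start in the middle of "#4324",
# because the digit '3' is preceded by '4', not by '#'.
without_boundary = re.findall(r"(?<!#)\d+", text)
print(without_boundary)  # ['30123', '424302', '123', '324', '4123']
```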
Answer 2
Score: 1
You need
(?<![#\d])\d+
See the regex demo.
Pattern details
- (?<![#\d]): a negative lookbehind that fails the match if there is a digit or a # char immediately before the current position
- \d+: one or more digits.
See the Python demo:
import re
text = "(30123:424302) 123 #4324:#34123"
print(re.findall(r"(?<![#\d])\d+", text))
# => ['30123', '424302', '123']
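Both (?<!#)\b\d+ and (?<![#\d])\d+ give the same result on the string from the question, but they are not equivalent in general. The sketch below uses a made-up sample string (not from the question) in which digits directly follow a letter or an underscore. Since \b requires a transition between a word and a non-word character, the word-boundary version skips those numbers, while the [#\d] lookbehind accepts them.

```python
import re

# Hypothetical input: digits glued to a letter, to '#', and to '_'.
sample = "x123 #45 _67"

# \b does not match between two word characters ('x' and '1', '_' and '6'),
# so nothing is found here.
boundary_matches = re.findall(r"(?<!#)\b\d+", sample)
print(boundary_matches)    # []

# The lookbehind only rejects '#' or a digit before the match,
# so "123" and "67" are both returned.
lookbehind_matches = re.findall(r"(?<![#\d])\d+", sample)
print(lookbehind_matches)  # ['123', '67']
```

Which behavior is preferable depends on whether numbers attached to letters should count.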
If you need to "reverse" the match the way you originally had in mind, you can match what you do not want, then match and capture what you do want, and after collecting the matches, remove all empty values from the resulting list:
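import re
text = "(30123:424302) 123 #4324:#34123"
# re.findall returns only group 1, which is empty for the "#\d+" branch
print(list(filter(None, re.findall(r"#\d+|(\d+)", text))))
# => ['30123', '424302', '123']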
See this Python demo.
As you can see, #\d+ consumed all digits after the # (i.e. in the undesired context) and (\d+) fetched the right values.
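To make the "match what you do not want, capture what you do" idea more visible, here is a small sketch that prints the raw re.findall result before filtering and then builds the same list with re.finditer. The finditer variant is just one alternative way to write it, not part of the original answer, and the variable names are illustrative.

```python
import re

text = "(30123:424302) 123 #4324:#34123"
pattern = re.compile(r"#\d+|(\d+)")

# With one capturing group, findall returns group 1 for every match;
# matches made by the "#\d+" branch show up as empty strings.
raw = pattern.findall(text)
print(raw)   # ['30123', '424302', '123', '', '']

# With finditer, group 1 is None when the "#\d+" branch matched,
# so those matches can be skipped directly.
kept = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(kept)  # ['30123', '424302', '123']
```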