Regex (?J) mode modifier in Python Regex or equivalent ability for named capture group from different patterns
Question
I am trying to capture from two different pattern sequences using a named capture group. This SO question solves the problem in PCRE using the mode modifier (?J), and this SO question solves a related problem in Python that I haven't succeeded at applying to my use case.
Example test strings:
abc-CAPTUREME-xyz-abcdef
abc-xyz-CAPTUREME-abcdef
Desired output:
CAPTUREME
CAPTUREME
CAPTUREME appears on either the left or right of the xyz sequence. My initial failed attempt at a regex looked like this:
r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'
But in Python regexes that yields an error ((?P<cap> A subpattern name must be unique), and Python doesn't support the (?J) modifier that was used in the first answer above to solve the problem.
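For reference, the failure can be reproduced with a minimal sketch like the one below. It assumes only the standard library re module. The name compile_error is illustrative, and the wording of the message Python prints may differ from the one quoted above.

```python
import re

# The failed attempt: the same group name appears in both alternatives.
pattern = r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'

try:
    re.compile(pattern)
    compile_error = None
except re.error as exc:
    # The standard library re module rejects duplicate group names.
    compile_error = exc

print(compile_error)
```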
With a single capture group I can capture CAPTUREME-xyz or xyz-CAPTUREME, but I can't reproduce the example in the second Stack Overflow article linked above using lookarounds. Every attempt to replicate it simply fails to match my string, and there are too many differences for me to piece together what's happening.
r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef'
Answer 1
Score: 1
Looking at the second article, you could write the pattern as:
(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))
Explanation
(?P<cap>              Named group cap
(?<=abc-xyz-)\w+      Match 1+ word characters, asserting abc-xyz- to the left
|                     Or
\w+(?=-xyz-abcdef)    Match 1+ word characters, asserting -xyz-abcdef to the right
)                     Close group cap
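As a quick check, the lookaround pattern can be run against the two test strings from the question. This is a minimal sketch; the names LOOKAROUND and capture_with_lookarounds are illustrative and not part of the original answer.

```python
import re

# Lookaround version: group "cap" holds only the wanted word, while the
# surrounding context is asserted but not consumed.
LOOKAROUND = re.compile(r'(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))')

def capture_with_lookarounds(text):
    """Return the text of group 'cap', or None when nothing matches."""
    # search() is needed here: the match starts in the middle of the string.
    match = LOOKAROUND.search(text)
    return match.group('cap') if match else None

samples = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']
results = [capture_with_lookarounds(s) for s in samples]
print(results)  # ['CAPTUREME', 'CAPTUREME']
```

Note that the first alternative only asserts what comes before the word, so a string such as abc-xyz-FOO without the trailing -abcdef would also match. If that matters for your input, the pattern needs an additional lookahead.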
Another option in Python could be using a conditional and a capture group:
abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef
Explanation
abc-(xyz-)?     Match abc- and optionally capture xyz- in group 1
(?P<cap>\w+)    Named group cap, match 1+ word characters
-               Match literally
(?(1)|xyz-)     If group 1 is not present, match xyz-
abcdef          Match literally
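The conditional pattern can be tried the same way. Again this is only a sketch: CONDITIONAL and capture_with_conditional are illustrative names, and fullmatch() is used on the assumption that each input is exactly one of the test strings.

```python
import re

# Conditional version: (?(1)|xyz-) requires "xyz-" on the right only when
# the optional group 1 did not already consume it on the left.
CONDITIONAL = re.compile(r'abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef')

def capture_with_conditional(text):
    """Return the text of group 'cap', or None when the string does not match."""
    match = CONDITIONAL.fullmatch(text)
    return match.group('cap') if match else None

samples = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']
results = [capture_with_conditional(s) for s in samples]
print(results)  # ['CAPTUREME', 'CAPTUREME']
```

Unlike the lookaround version, this pattern consumes the whole string, so an input with no xyz- on either side (for example abc-CAPTUREME-abcdef) is rejected.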