Regex (?J) mode modifier in Python regex, or an equivalent way to use one named capture group in different patterns

Question

I am trying to capture from two different pattern sequences using a named capture group. This SO question solves the problem in PCRE using the mode modifier (?J), and this SO question solves a related problem in Python that I haven't succeeded at applying to my use case.

Example test strings:

abc-CAPTUREME-xyz-abcdef
abc-xyz-CAPTUREME-abcdef

Desired output:

CAPTUREME
CAPTUREME

CAPTUREME appears on either the left or right of the xyz sequence. My initial failed attempt at a regex looked like this:

r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'

But in Python that yields an error ("(?P<cap> A subpattern name must be unique"), and Python doesn't support the (?J) modifier that the first answer above used to solve the problem.
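
To make the failure concrete, here is a minimal sketch that checks both patterns against Python's built-in re module. The helper name compiles is made up for illustration, and the exact error text depends on the Python version, so the sketch prints it rather than assuming it.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        # The message wording varies between Python versions.
        print(f"{pattern!r} rejected: {exc}")
        return False

# The pattern from the question: the name "cap" is defined twice.
duplicate_names = r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'
# The PCRE-style (?J) prefix is not a flag that re understands.
with_j_flag = '(?J)' + duplicate_names

compiles(duplicate_names)
compiles(with_j_flag)
```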

With a single capture group I can capture CAPTUREME-xyz or xyz-CAPTUREME, but I can't reproduce the lookaround example from the second Stack Overflow question linked above. Every attempt to adapt it simply fails to match my string, and there are too many differences for me to piece together what's happening.

r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef'

https://regex101.com/r/NeWrDe/1
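
For reference, a short sketch of what the single-group attempt captures in Python. It assumes the two test strings from the question and uses re.fullmatch; the variable names are illustrative only.

```python
import re

# The single named group wraps the xyz part together with the wanted word.
single_group = re.compile(r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef')

samples = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']
captured = [single_group.fullmatch(s).group('cap') for s in samples]

print(captured)  # ['CAPTUREME-xyz', 'xyz-CAPTUREME']
```

The group matches in both cases, but the xyz part is still attached to the captured text, which is the problem the question is trying to solve.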

Answer 1

Score: 1

Looking at the second article, you could write the pattern as:

(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))

Explanation

  • (?P<cap> Named group cap
    • (?<=abc-xyz-)\w+ Match 1+ word characters, asserting abc-xyz- to the left
    • | Or
    • \w+(?=-xyz-abcdef) Match 1+ word characters, asserting -xyz-abcdef to the right
  • ) Close group cap
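
A minimal sketch of the lookaround pattern in Python, assuming the two test strings from the question. Because the surrounding text is only asserted and not consumed, re.search is used instead of a full-string match. The lookbehind works in Python here because abc-xyz- has a fixed width.

```python
import re

# Only the word itself is consumed; the context is checked by lookarounds.
lookaround = re.compile(r'(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))')

samples = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']
lookaround_caps = [lookaround.search(s).group('cap') for s in samples]

print(lookaround_caps)  # ['CAPTUREME', 'CAPTUREME']
```

Note that the first alternative does not check for -abcdef after the word; add a lookahead such as (?=-abcdef) there if the trailing part must be verified as well.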

Another option in Python could be using a conditional and a capture group:

abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef

Explanation

  • abc-(xyz-)? Match abc- and optionally capture xyz- in group 1
  • (?P<cap>\w+) Named group cap, match 1+ word characters
  • - Match literally
  • (?(1)|xyz-) Conditional: if group 1 matched, match nothing; otherwise match xyz-
  • abcdef Match literally
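
The conditional version can be tried in Python the same way. This is a sketch assuming the question's test strings plus one made-up negative case without any xyz part; since the pattern describes the whole string, re.fullmatch is used.

```python
import re

# (?(1)|xyz-) requires xyz- on the right only when group 1 did not match it on the left.
conditional = re.compile(r'abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef')

samples = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']
conditional_caps = [conditional.fullmatch(s).group('cap') for s in samples]

print(conditional_caps)  # ['CAPTUREME', 'CAPTUREME']

# Hypothetical negative case: no xyz on either side, so there is no match.
print(conditional.fullmatch('abc-CAPTUREME-abcdef'))  # None
```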

huangapple
  • Posted on 2023-03-04 07:47:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75632754.html