March 4, 2023 07:47:15

Regex (?J) mode modifier in Python regex, or equivalent ability for a named capture group from different patterns

Question


I am trying to capture from two different pattern sequences using a named capture group. This SO question solves the problem in PCRE using the mode modifier (?J), and this SO question solves a related problem in Python that I haven't succeeded at applying to my use case.

Example test strings:

abc-CAPTUREME-xyz-abcdef
abc-xyz-CAPTUREME-abcdef

Desired output:

CAPTUREME
CAPTUREME

CAPTUREME appears on either the left or right of the xyz sequence. My initial failed attempt at a regex looked like this:

r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'

But Python rejects that pattern with an error at the second (?P<cap>: "A subpattern name must be unique". Python also doesn't support the (?J) modifier that the first linked answer used to solve the problem.
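A minimal sketch of both points, assuming Python 3's standard re module. Compiling the original pattern raises re.error. A common stdlib workaround (not taken from either linked answer) is to give each branch its own name, here the hypothetical names right and left, and return whichever group actually took part in the match.

```python
import re

# The original attempt: the name "cap" is defined in both alternation branches.
DUPLICATE_NAMES = r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'

try:
    re.compile(DUPLICATE_NAMES)
    compile_error = None
except re.error as exc:
    # Python's re module refuses to define the same group name twice.
    compile_error = exc
    print("re.error:", exc)

# Workaround (an assumption, not from the answers): one name per branch.
# "right" and "left" are hypothetical names chosen for illustration.
TWO_NAMES = re.compile(r'abc-(?:xyz-(?P<right>\w+)|(?P<left>\w+)-xyz)-abcdef')


def capture(text):
    """Return the word beside xyz, or None if the line doesn't match."""
    m = TWO_NAMES.fullmatch(text)
    if m is None:
        return None
    # A group that did not participate in the match returns None.
    return m.group('right') or m.group('left')


for line in ('abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef'):
    print(capture(line))  # prints CAPTUREME for both lines
```

The cost of this approach is that the caller has to know about both names; the answer below avoids that by keeping a single named group.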

With a single capture group I can capture CAPTUREME-xyz or xyz-CAPTUREME, but I can't reproduce the lookaround example from the second Stack Overflow question linked above. Every attempt to adapt it fails to match my strings, and there are too many differences for me to piece together what's happening.

r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef'

https://regex101.com/r/NeWrDe/1
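For reference, here is a quick check of what that single group actually holds, using Python's re module and the two test strings above (a sketch; fullmatch is used so the whole line must match). Because cap wraps the entire alternative, it also swallows the xyz part.

```python
import re

# Single named group wrapping both alternatives (the attempt above).
SINGLE_GROUP = re.compile(r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef')

for line in ('abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef'):
    m = SINGLE_GROUP.fullmatch(line)
    # "cap" spans the whole branch, including the xyz piece.
    print(m.group('cap'))
# prints:
# CAPTUREME-xyz
# xyz-CAPTUREME
```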

Answer 1

Score: 1


Looking at the second article, you could write the pattern as:

(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))

Explanation

(?P<cap> Named group cap
- (?<=abc-xyz-)\w+ Match 1+ word characters, asserting abc-xyz- to the left
- | Or
- \w+(?=-xyz-abcdef) Match 1+ word characters, asserting -xyz-abcdef to the right
) Close group cap

Regex demo
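A minimal sketch of using that pattern from Python, assuming each test string is processed on its own. The surrounding text is only asserted, not consumed, so the match is just the wanted word; that means re.search is needed rather than re.fullmatch. Python requires lookbehinds to be fixed-width, which the literal abc-xyz- satisfies.

```python
import re

# Lookaround version: context is asserted by (?<=...) / (?=...), not consumed.
LOOKAROUND = re.compile(r'(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))')

for line in ('abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef'):
    # search() scans for the first position where either branch succeeds.
    m = LOOKAROUND.search(line)
    print(m.group('cap'))  # prints CAPTUREME for both lines
```

Since the pattern is unanchored, search will also find such a word in the middle of a longer line, which may or may not be what you want.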


Another option in Python could be using a conditional and a capture group:

abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef

Explanation

abc-(xyz-)? Match abc- and optionally capture xyz- in group 1
(?P<cap>\w+) Named group cap, match 1+ word characters
- Match literally
(?(1)|xyz-) If group 1 matched, match nothing; otherwise match xyz-
abcdef Match literally

Regex demo
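The same idea as a Python sketch, assuming one test string per call. Python's re supports the (?(1)yes|no) conditional, and here re.fullmatch works because the pattern consumes the entire line.

```python
import re

# Conditional version: if the optional "xyz-" (group 1) matched before the
# word, the conditional matches nothing; otherwise it requires "xyz-" after it.
CONDITIONAL = re.compile(r'abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef')

for line in ('abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef'):
    m = CONDITIONAL.fullmatch(line)
    print(m.group('cap'))  # prints CAPTUREME for both lines
```

A line with no xyz at all, such as abc-CAPTUREME-abcdef, does not match, because the conditional then insists on xyz-.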
