Match multiple regex groups starting after a specific word/pattern within the text
Question
I'm trying to match all instances of a percentage (e.g. 20%) AFTER a specific pattern (or in this case a word):
Lorem ipsum dolor 10% sit amet, consectetur adipiscing elit. Morbi et
feugiat Discount vitae 15% urna. Sed 20% et lorem in dapibus.
Mauris arcu dui, vestibulum eget eros eu, eleifend luctus risus.
I want to match 15% and 20%, but not 10%. It should decide this by checking that each percentage it matches occurs after the word "Discount".
This is the pattern I came up with but it seems to match all percentages:
(?<=Discount)*(\d+%)+
This would be using the C# / .NET regex engine.
Answer 1
Score: 3
In the pattern (?<=Discount)*(\d+%)+ you are optionally repeating a lookbehind assertion that only asserts the word "Discount" directly to the left of the current position. Because 0 repetitions also suffice, the assertion never has to succeed, so you will match all occurrences of (\d+%)+.
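A quick C# check of this explanation, as a sketch assuming C# 9+ top-level statements and treating the sample as a single line (the breaks in the question look like display wrapping). It runs the pattern from the question against the sample text; the variable names are illustrative only.

```csharp
// Sketch: reproduce the behaviour of the pattern from the question.
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question, assumed to be a single line.
string text = "Lorem ipsum dolor 10% sit amet, consectetur adipiscing elit. Morbi et " +
              "feugiat Discount vitae 15% urna. Sed 20% et lorem in dapibus. " +
              "Mauris arcu dui, vestibulum eget eros eu, eleifend luctus risus.";

// (?<=Discount)* may repeat the lookbehind 0 times, so it never restricts the match.
string[] original = Regex.Matches(text, @"(?<=Discount)*(\d+%)+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", original)); // 10%, 15%, 20%
```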
If you only want the values, you don't need a capture group. Note also that (\d+%)+ repeats the group 1+ times, each time matching 1+ digits and a %, so a run such as 10%20% would come back as a single match.
To get only the values, you could write the pattern like this, using word boundaries to prevent partial word matches:
(?<=\bDiscount\b.*)\b\d+%
The pattern matches:
(?<=              # positive lookbehind assertion
  \bDiscount\b.*  # match the word "Discount" followed by 0+ characters other than newlines
                  # (there are other characters between "Discount" and the \d+% part)
)                 # close the lookbehind
\b                # a word boundary
\d+%              # match 1+ digits followed by %
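A minimal C# sketch of the lookbehind approach, assuming C# 9+ top-level statements and that the sample text is a single line (. does not cross newlines, so "Discount" must be on the same line as the percentages). The values are read from Match.Value, so no capture group is needed; the variable names are illustrative.

```csharp
// Sketch: variable-length lookbehind, supported by the .NET regex engine.
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Lorem ipsum dolor 10% sit amet, consectetur adipiscing elit. Morbi et " +
              "feugiat Discount vitae 15% urna. Sed 20% et lorem in dapibus. " +
              "Mauris arcu dui, vestibulum eget eros eu, eleifend luctus risus.";

// The lookbehind may contain .*, so it can look arbitrarily far back
// (within the line) for the whole word "Discount".
var afterDiscount = new Regex(@"(?<=\bDiscount\b.*)\b\d+%");

// Match.Value already holds the percentage, no group required.
string[] values = afterDiscount.Matches(text)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", values)); // 15%, 20%
```

If the real input can have "Discount" and the percentages on different lines, passing RegexOptions.Singleline lets . match newlines as well.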
In .NET you could also use a repeated capture group and read every value it captured through the Group.Captures property:
\bDiscount\b(?:.*?(\b\d+%))+
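A minimal C# sketch of the Group.Captures approach, again assuming C# 9+ top-level statements and a single-line sample. There is one overall match starting at "Discount"; group 1 sits inside the repeated (?:...)+, and .NET keeps every iteration in Captures, while Group.Value only holds the last one. Variable names are illustrative.

```csharp
// Sketch: one match, many captures of the same group (a .NET feature).
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Lorem ipsum dolor 10% sit amet, consectetur adipiscing elit. Morbi et " +
              "feugiat Discount vitae 15% urna. Sed 20% et lorem in dapibus. " +
              "Mauris arcu dui, vestibulum eget eros eu, eleifend luctus risus.";

Match m = Regex.Match(text, @"\bDiscount\b(?:.*?(\b\d+%))+");

// Every value group 1 captured across the repetitions of (?:...)+.
string[] captured = m.Success
    ? m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
    : Array.Empty<string>();

Console.WriteLine(m.Groups[1].Value);           // 20% (last capture only)
Console.WriteLine(string.Join(", ", captured)); // 15%, 20%
```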
Answer 2
Score: 1
Rather than using a (variable-length) positive lookbehind, as Bird #4 has done, you could use a (variable-length) negative lookahead:
\b\d+%(?!.*\bDiscount\b)
The regular expression can be broken down as follows.
\b # match a word boundary
\d+% # match one or more (+) digits (`\d`) followed by '%'
(?! # begin a negative lookahead
.* # match zero or more (*) characters other than line terminators
\b # match a word boundary
Discount # match 'Discount'
\b # match a word boundary
) # end the negative lookahead
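A minimal C# sketch of the negative lookahead, placed next to the lookbehind pattern from the other answer for comparison. It assumes C# 9+ top-level statements and a single-line sample; with the three-line sample as literally shown, 10% would not be excluded, because .* cannot reach "Discount" on the next line. The helper Find and the second input string are hypothetical, added only for illustration.

```csharp
// Sketch: negative lookahead vs. the variable-length lookbehind.
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Lorem ipsum dolor 10% sit amet, consectetur adipiscing elit. Morbi et " +
              "feugiat Discount vitae 15% urna. Sed 20% et lorem in dapibus. " +
              "Mauris arcu dui, vestibulum eget eros eu, eleifend luctus risus.";

// A percentage is kept only if no whole word "Discount" follows it on the line.
const string Lookahead = @"\b\d+%(?!.*\bDiscount\b)";
// Lookbehind version from Answer 1.
const string Lookbehind = @"(?<=\bDiscount\b.*)\b\d+%";

// Hypothetical helper: return the matched values for a pattern.
string[] Find(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", Find(text, Lookahead))); // 15%, 20%

// The two patterns disagree when "Discount" never occurs.
string noKeyword = "Save 5% today";
Console.WriteLine(Find(noKeyword, Lookahead).Length);  // 1
Console.WriteLine(Find(noKeyword, Lookbehind).Length); // 0
```

The lookahead only checks that no "Discount" follows, so if the text contains no "Discount" at all, every percentage matches, whereas the lookbehind version matches none.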
Note that C# (.NET) is one of the relatively few languages whose regex engine supports variable-length (positive and negative) lookbehinds. Most mainstream languages have regex engines that support variable-length (positive and negative) lookaheads but not variable-length lookbehinds. That includes PHP, Perl, Python (standard regex engine), R, Ruby and Java. The upshot is that the lookahead solution is preferable if the code might later be ported from C# to a different language.
I cannot say whether a negative lookahead would tend to be more efficient than a positive lookbehind here.