Regex to find named capturing groups with Go programming language

Question

I'm looking for a regex to find named capturing groups in (other) regex strings.

Example: I want to find (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6)\. .+) in the following regex:

/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6)\. .+)

I tried the following regex to find the named capturing groups:

// Optional parenthesized sub-groups inside a named group.
var subGroups string = `(\(.+\))*?`
var prefixedSubGroups string = `.+` + subGroups
var postfixedSubGroups string = subGroups + `.+`
var surroundedSubGroups string = `.+` + subGroups + `.+`

// regexp.MustCompile returns a *regexp.Regexp.
var capturingGroupNameRegex *regexp.Regexp = regexp.MustCompile(
	`(?U)` +
	`\(\?P<.+>` +
	`(` + prefixedSubGroups + `|` + postfixedSubGroups + `|` + surroundedSubGroups + `)` +
	`\)`)

The (?U) flag makes greedy quantifiers (+ and *) non-greedy, and non-greedy quantifiers (such as *?) greedy. Details are in the Go regexp documentation.
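To make the effect of the flag concrete, here is a minimal sketch that runs three small patterns against the same input. The helper name firstMatch and the sample input are made up for illustration and are not part of the question's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it compiles pattern and returns
// the leftmost match in input, or "" if there is none.
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	input := "aXbXb"

	// Default: + is greedy and extends to the last possible b.
	fmt.Println(firstMatch(`a.+b`, input)) // aXbXb

	// With (?U), the same + stops at the first possible b.
	fmt.Println(firstMatch(`(?U)a.+b`, input)) // aXb

	// With (?U), the normally non-greedy +? becomes greedy again.
	fmt.Println(firstMatch(`(?U)a.+?b`, input)) // aXbXb
}
```

Swapping greediness only changes which of the possible matches is preferred. It does not give the pattern any way to count opening and closing parentheses.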

But it doesn't work, because the parentheses are not matched correctly.

Answer 1

Score: 7

Matching arbitrarily nested parentheses correctly is not possible with regular expressions because arbitrary (recursive) nesting cannot be described by a regular language.
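As a rough illustration of that limit, a regex can be written for any fixed maximum nesting depth, but each extra level needs a longer pattern, so no single pattern covers every depth. The sketch below builds such a bounded pattern; balancedUpTo and matchesBalanced are hypothetical helper names chosen for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// balancedUpTo returns a pattern for text whose parentheses are balanced
// and nested at most depth levels deep. The pattern grows with depth.
func balancedUpTo(depth int) string {
	p := `[^()]*` // depth 0: no parentheses at all
	for i := 0; i < depth; i++ {
		// One more level: plain characters or a parenthesized
		// chunk of the previous level, repeated.
		p = `(?:[^()]|\(` + p + `\))*`
	}
	return p
}

// matchesBalanced reports whether all of s fits the bounded pattern.
func matchesBalanced(depth int, s string) bool {
	re := regexp.MustCompile("^" + balancedUpTo(depth) + "$")
	return re.MatchString(s)
}

func main() {
	fmt.Println(matchesBalanced(2, "a(b(c))"))    // true: nesting depth 2
	fmt.Println(matchesBalanced(2, "a(b(c(d)))")) // false: depth 3 exceeds the bound
	fmt.Println(matchesBalanced(2, "a(b"))        // false: unbalanced
}
```

This only works when an upper bound on the nesting depth is known in advance, which is not the case for arbitrary regex strings.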

Some modern regex flavors do support recursion (Perl, PCRE) or balanced matching (.NET), but Go is not one of them (the docs explicitly say that Perl's (?R) construct is not supported by the RE2 library that Go's regexp package appears to be based on). You need to build a recursive descent parser, not a regex.

huangapple
  • Posted on November 11, 2012, 18:22:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/13330274.html