Regex with nested repetition

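Question

I'm trying to create a regex in Go that matches up to 50 words separated by whitespace, where each word consists of 1-32 "a"s. I'm using the following regex:

regexp.Compile(`^(a{1,32}\s?){1,50}$`)

and I am getting the following error:

error parsing regexp: invalid repeat count: `{1,50}`

I've noticed that it does work with up to 31 repetitions, like so:

r, err := regexp.Compile(`^(a{1,32}\s?){1,31}$`)

See https://go.dev/play/p/RLnroX9-57_m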

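Answer 1

Score: 3

Go's regexp engine limits nested repetition: the combination of the top-level repetition and any inner repetitions must not exceed 1000 copies of the innermost repeated part. This is documented in the RE2 syntax spec.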

In your case, up to 31 outer repetitions work because inner 32 * outer 31 = 992. Both 32 * 32 = 1024 and 32 * 50 = 1600 exceed the limit, so those patterns are rejected.
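To see the threshold for yourself, here is a minimal sketch that compiles the question's pattern with different outer counts. The helper name `compiles` is only for illustration and is not part of the `regexp` package.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether the question's pattern parses when the outer
// repetition count is set to the given value. (Illustrative helper.)
func compiles(outer int) bool {
	pattern := fmt.Sprintf(`^(a{1,32}\s?){1,%d}$`, outer)
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Inner count is fixed at 32; only the outer count varies.
	for _, outer := range []int{31, 32, 50} {
		fmt.Printf("outer=%d inner*outer=%d compiles=%v\n",
			outer, 32*outer, compiles(outer))
	}
}
```

Only the first case (product 992) should compile; the other two should fail with the "invalid repeat count" error from the question.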

A workaround is to split the expression into multiple parts, each of which stays under the limit: ^(a{1,32}\s?){1,31}(a{1,32}\s?){0,19}$
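As a quick check that the split pattern still caps the input at 31 + 19 = 50 words, here is a small sketch. The `words` helper and the variable name `splitPattern` are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitPattern is the workaround pattern: at most 31 + 19 = 50 words.
var splitPattern = regexp.MustCompile(`^(a{1,32}\s?){1,31}(a{1,32}\s?){0,19}$`)

// words builds n space-separated words of 32 "a"s each, the longest
// word the pattern allows. (Illustrative helper.)
func words(n int) string {
	word := strings.Repeat("a", 32)
	return strings.TrimSpace(strings.Repeat(word+" ", n))
}

func main() {
	fmt.Println(splitPattern.MatchString(words(1)))  // one word
	fmt.Println(splitPattern.MatchString(words(50))) // exactly at the cap
	fmt.Println(splitPattern.MatchString(words(51))) // one word too many
}
```

With full-length words each one needs its own group iteration, so 1 and 50 words should match and 51 should not.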

huangapple
  • Posted on January 26, 2023 at 14:30:28