Regex to match repeated characters

Question

I am trying to create a regex that matches a string if it contains 3 or more repeated characters in a row (e.g. aaaaaa, testtttttt, otttttter).

I have tried the following:

regexp.Compile("[A-Za-z0-9]{3,}")
regexp.Compile("(.){3,}")
regexp.Compile("(.)\{3,}")

These match any 3 characters in a row, rather than the same character repeated 3 times. Where am I going wrong?

Answer 1

Score: 8

What you're asking for cannot be done with true regular expressions; what you need are (irregular) backreferences. While many regexp engines implement them, the RE2 engine used by Go does not. RE2 is a fast regexp engine that guarantees linear-time string processing, and there is no known way to implement backreferences with that efficiency. (See https://swtch.com/~rsc/regexp/ for further information.)

To solve your problem you may want to look for another regexp library. I believe bindings for PCRE can be found, but I have no personal experience with them.

Another approach would be to parse the string manually without using (ir)regular expressions.

Answer 2

Score: 3

Here is an ugly solution, which you could generate automatically:

A{3,}|B{3,}|...|Z{3,}|a{3,}|b{3,}|...|z{3,}|0{3,}|1{3,}|...|9{3,}

Answer 3

Score: 3

Due to the problems stated, I eventually settled on the following non-regex solution:

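norm := "this it a ttttt"
repeatCount := 1 // length of the current run of identical characters
thresh := 3      // run length that counts as a repeat
lastChar := ""
for _, r := range norm {
	c := string(r)
	if c == lastChar {
		repeatCount++
		if repeatCount == thresh {
			// Found thresh identical characters in a row; stop scanning.
			break
		}
	} else {
		repeatCount = 1
	}
	lastChar = c
}
// For thresh > 1, repeatCount == thresh here only if the loop found such a run.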
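The same scan can be wrapped in a reusable function. The following is a sketch rather than the answerer's code: the name hasRun and the choice to compare runes directly instead of one-character strings are assumptions made for this example.

```go
package main

import "fmt"

// hasRun (hypothetical helper) reports whether s contains the same rune
// at least n times in a row. Values of n below 1 are treated as
// trivially satisfied.
func hasRun(s string, n int) bool {
	if n <= 0 {
		return true
	}
	count := 0 // length of the current run; 0 means no rune seen yet
	var last rune
	for _, r := range s {
		if count > 0 && r == last {
			count++
		} else {
			count = 1
		}
		if count >= n {
			return true
		}
		last = r
	}
	return false
}

func main() {
	for _, s := range []string{"aaaaaa", "testtttttt", "otttttter", "this it a ttttt", "test"} {
		fmt.Printf("%q: %v\n", s, hasRun(s, 3))
	}
}
```

Iterating with range walks the string rune by rune, so multi-byte characters are counted as single characters, and the scan stays linear in the length of the input.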

huangapple
  • Posted on March 2, 2016 at 08:37:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/35736368.html