Why are numbers removed in a regex.ReplaceAllString()?

Question

This Go Playground example demonstrates my predicament.

Ultimately, I'm trying to split an unruly string into words. To me, "2015" is a word and so is "$100", but if the input is "One. 2wo, (three)" I want [One 2wo three]. Because Go doesn't allow a Unicode-aware regex, I thought I'd first remove all "junk characters" and then use strings.Fields().

The problem is that all digits are stripped:

reg := regexp.MustCompile(`[\[\](){}"?!,-:;,']`)
fmt.Println(reg.ReplaceAllString("one 1 zer0", ""))
// outputs "one  zer" when I'd expect "one 1 zer0" :(

Answer 1

Score: 4

Inside a character class, ,-: is a range: it matches every character from , (0x2C) to : (0x3A). That range happens to contain all ASCII digits (0x30 to 0x39; see ascii(7)). Put the - at the end of the class instead, where it is taken literally:

reg := regexp.MustCompile(`[\[\](){}"?!,:;,'-]`)

huangapple
  • Posted on March 24, 2015, 22:57:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/29235887.html