2016年2月4日 12:50:23go评论84阅读模式

英文:

Golang regexp Boundary with Latin Character

问题

我有一个关于 Golang 正则表达式的小问题。
似乎 \b 边界选项不起作用
当我输入类似这样的拉丁字符时。

我期望 é 应该被视为普通字符..
但它被视为边界词之一。

package main

import (
	"fmt"
	"regexp"
)

func main() {	
	r, _ := regexp.Compile(`\b(vis)\b`)
	fmt.Println(r.MatchString("re vis e"))
	fmt.Println(r.MatchString("revise"))
	fmt.Println(r.MatchString("r&#233;vis&#233;"))
}

结果是:

true 
false 
true

请给我任何建议，如何将 r.MatchString("révisé") 处理为 false？

谢谢

英文:

I have a small tricky issue about golang regex.
seems \b boundering option doesn't work
when I put latein chars like this.

I expected that é should be treated as a regular char..
but it's treated as one of boundering wards.

package main

import (
	&quot;fmt&quot;
	&quot;regexp&quot;
)

func main() {	
	r, _ := regexp.Compile(`\b(vis)\b`)
	fmt.Println(r.MatchString(&quot;re vis e&quot;))
	fmt.Println(r.MatchString(&quot;revise&quot;))
	fmt.Println(r.MatchString(&quot;r&#233;vis&#233;&quot;))
}

result was:

true 
false 
true

Please give me any suggestion how to deal with r.MatchString("révisé") as false ?

Thank you

答案1

得分: 6

问题在于\b只适用于ASCII字符周围的边界，正如文档中所述：

> 在ASCII单词边界上（\w在一侧，\W、\A或\z在另一侧）

而é不是ASCII字符。但是，你可以通过组合其他正则表达式快捷方式来创建自己的\b替代方案。这里有一个简单的解决方案，可以解决问题中提到的情况，但你可能希望添加更全面的匹配：

package main

import (
    "fmt"
    "regexp"
)

func main() {   
    r, _ := regexp.Compile(`(?:\A|\s)(vis)(?:\s|\z)`)
    fmt.Println(r.MatchString("vis")) // 添加了这个案例
    fmt.Println(r.MatchString("re vis e"))
    fmt.Println(r.MatchString("revise"))
    fmt.Println(r.MatchString("révisé"))
}

运行此代码得到的结果是：

true
true
false
false

这个解决方案的作用是将\b替换为(?:\A|\z|\s)，意思是“一个非捕获组，其中包含以下之一：字符串的开头、字符串的结尾或空白字符”。你可能希望在这里添加其他可能性，比如标点符号。

英文:

The issue is that \b is only for boundaries around ASCII characters, as stated in the docs:

> at ASCII word boundary (\w on one side and \W, \A, or \z on the other)

And é is not ASCII. But, you can make your own \b replacement by combining other regex shortcuts. Here is a simple solution that solves the case given in the question, though you may want to add more thorough matching:

package main

import (
    &quot;fmt&quot;
    &quot;regexp&quot;
)

func main() {   
    r, _ := regexp.Compile(`(?:\A|\s)(vis)(?:\s|\z)`)
    fmt.Println(r.MatchString(&quot;vis&quot;)) // added this case
    fmt.Println(r.MatchString(&quot;re vis e&quot;))
    fmt.Println(r.MatchString(&quot;revise&quot;))
    fmt.Println(r.MatchString(&quot;r&#233;vis&#233;&quot;))
}

Running this gives:

true
true
false
false

What this solution does is essentially replace \b with (?:\A|\z|\s), which means "a non-capturing group with one of the following: start of string, end of string or whitespace". You may want to add other possibilities here, like punctuation.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Golang正则表达式中的边界与拉丁字符

问题

答案1

如何执行需要用户输入的命令

如何在go中使用LevelDB？

可以把一个频道保持开放吗？

在Golang中对长时间运行的MSSQL事务的错误连接响应

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论