2015-05-27 03:51:16 · go

regexp.FindSubmatch with hex character codes

Question

I cannot get regexp.FindSubmatch to work in certain simple cases. For example, the following code works properly:

assigned := regexp.MustCompile(`\x7f`)
group := assigned.FindSubmatch([]byte{0x7f})
fmt.Println(group)

(In the playground, it prints [[127]].)

But if I change the byte to 0x80, it does not work. Why?

Answer 1

Score: 2

As mentioned in the package documentation:

> All characters are UTF-8-encoded code points.

So the regular expression \x80 does not match the byte value 0x80, but rather the UTF-8 representation of the character U+0080. This is evident if we change your test program to:

func main() {
	assigned := regexp.MustCompile(`\x80`)
	group := assigned.FindSubmatch([]byte{1, 2, 3, 0xc2, 0x80})
	fmt.Println(group)
}

We now get a match, printed as [[194 128]]: the two-byte sequence 0xc2 0x80, which is the UTF-8 encoding of that character.
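
To put the three cases side by side, here is a minimal sketch. The helper name matches is made up for this illustration and is not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper: it reports whether pattern
// finds a match anywhere in the raw input bytes.
func matches(pattern string, input []byte) bool {
	return regexp.MustCompile(pattern).FindSubmatch(input) != nil
}

func main() {
	// U+007F is encoded in UTF-8 as the single byte 0x7f, so this matches.
	fmt.Println(matches(`\x7f`, []byte{0x7f})) // true

	// A lone 0x80 byte is not valid UTF-8 and is not the code point U+0080.
	fmt.Println(matches(`\x80`, []byte{0x80})) // false

	// U+0080 is encoded in UTF-8 as the two bytes 0xc2 0x80.
	fmt.Println(matches(`\x80`, []byte{0xc2, 0x80})) // true
}
```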

There is no way to switch the regexp package into a binary mode, so you will either need to convert your inputs to valid UTF-8, or use a different package to match your data.
