2017-04-18 08:27:01, go


Escaping of hex values in string literals

Question


I am attempting to escape a specific hex value in a golang string. The function call looks something like this:

Insert(0, "\x00\x00\x00\rIHDR\x00\x00\x000\x00\x00\x000\b\x03")
Insert(25, "\x00\x00\x00\x06PLTE")
Insert(43, "\x00\x00\x00\x02tRNS")
Insert(57, "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6") // problem line
Insert(2432, "\x00\x00\x00\x00IEND")

The problem arises when the language interprets the "\xDA" hex escape. Instead of correctly escaping to a Ú value, it is escaped to � (the Replacement Character).

I confirmed that this is what was occurring with the following playground example:

fmt.Println("\xDA")
i := 218
h := fmt.Sprintf("%x", i)
fmt.Printf("Hex conf of '%d' is '%s'\n", i, h)
fmt.Println(string(i))

This snippet, when run, prints

�
Hex conf of '218' is 'da'
Ú

Am I missing something here? The fact that "\xDA" is being escaped to a value of 65533 is throwing off my entire program, which relies on CRC32 and some other checksums. This does not occur in the JavaScript version of this program (which itself is a translation from James compface program, written in C).

Here is the playground link: https://play.golang.org/p/c-XMK68maX

Answer 1

Score: 8


Go strings are just a series of bytes, but when an encoding is needed, it's assumed to be UTF-8. The single byte \xda isn't a valid UTF-8 sequence, so when the string is decoded for display it shows up as unicode.ReplacementChar, "�":

    ReplacementChar = '\uFFFD'     // Represents invalid code points.

If you want the rune U+00DA in a string literal, use a Unicode escape: \u00DA, or its UTF-8 encoding: \xc3\x9a, or the character itself: Ú.

https://play.golang.org/p/EJZIqCI_Gr
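
As a quick check of the three spellings, here is a minimal sketch that prints each one with its raw bytes. The helper name spellings is made up for this illustration; the expectation is that all three are the same two-byte string.

```go
package main

import "fmt"

// spellings returns the three ways of writing U+00DA in a string literal.
// The function name is hypothetical and exists only for this illustration.
func spellings() []string {
	return []string{"\u00DA", "\xc3\x9a", "Ú"}
}

func main() {
	for _, s := range spellings() {
		// %q shows the text form, "% x" the raw bytes, len the byte count.
		fmt.Printf("%q  bytes=% x  len=%d\n", s, s, len(s))
	}
}
```

Each line should report the bytes c3 9a and a length of 2, which is different from the single byte da that "\xDA" puts in the string.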

If you actually want a single byte value of \xda in your string, that is what you have and the printed character is inconsequential.
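
To see that the byte really is there, a small sketch along these lines may help. It assumes the checksum is computed over the raw bytes with the standard hash/crc32 package; the names chunk and byteAt are invented for the example and are not from the question's program.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// chunk is the problem literal from the question.
const chunk = "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6"

// byteAt returns the raw byte at index i. Indexing a string yields bytes
// and never decodes UTF-8. (Hypothetical helper for this illustration.)
func byteAt(s string, i int) byte { return s[i] }

func main() {
	fmt.Println(len(chunk))               // length in bytes, not characters
	fmt.Printf("%#x\n", byteAt(chunk, 9)) // the byte written as \xDA
	fmt.Printf("% x\n", chunk)            // hex dump of every byte

	// Assumption: the checksum is taken over the bytes, so how the
	// string looks when printed has no effect on it.
	fmt.Printf("%08x\n", crc32.ChecksumIEEE([]byte(chunk)))
}
```

The string is 15 bytes long and the byte at index 9 is 0xda, regardless of what a terminal or browser draws for it.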

Answer 2

Score: 3


Your input looks like ISO-8859-1 (Latin-1). Convert it to UTF-8. For example,

package main

import (
	"fmt"
	"unicode/utf8"
)

// ISO88591ToString maps ISO-8859-1 (Latin-1) to string (UTF-8).
func ISO88591ToString(iso string) string {
	var utf []rune
	for i := 0; i < len(iso); i++ {
		r := iso[i]
		if utf == nil {
			if r < utf8.RuneSelf {
				continue
			}
			utf = make([]rune, len(iso))
			for j, r := range iso[:i] {
				utf[j] = rune(r)
			}
		}
		utf[i] = rune(r)
	}
	if utf == nil {
		return string(iso)
	}
	return string(utf)
}

func main() {
	l1 := "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6"
	fmt.Printf("%q\n", l1)
	s := ISO88591ToString(l1)
	fmt.Printf("%q\n", s)
}

Output:

"\x00\x00\t;IDATx\xda\x010\t\xcf\xf6"
"\x00\x00\t;IDATxÚ\x010\tÏö"
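
For comparison, the same byte-to-code-point mapping can be sketched more compactly without the ASCII fast path. This is an illustrative variant, not the answer's code, and the name latin1ToUTF8 is made up here.

```go
package main

import "fmt"

// latin1ToUTF8 treats every byte of in as one ISO-8859-1 code point and
// returns the UTF-8 encoding of the result. (Hypothetical helper name.)
func latin1ToUTF8(in string) string {
	runes := make([]rune, len(in))
	for i := 0; i < len(in); i++ {
		// A Latin-1 byte value equals its Unicode code point.
		runes[i] = rune(in[i])
	}
	return string(runes)
}

func main() {
	in := "x\xDA\x01"
	out := latin1ToUTF8(in)
	fmt.Printf("%q (%d bytes) -> %q (%d bytes)\n", in, len(in), out, len(out))
}
```

Note that the converted string is one byte longer for every byte at or above 0x80, so a checksum computed over it would differ from one computed over the original bytes.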

Answer 3

Score: 1


Go treats string contents as UTF-8 whenever it needs to decode them, and \xDA isn't a valid UTF-8 sequence by itself, meaning that printing it as part of a string will yield the Unicode replacement character U+FFFD instead of what you wanted (Ú, or U+00DA).

You seem to be working with raw bytes, however, so you should consider whether you want the rune represented by \u00DA, which is encoded in UTF-8 as the 2-byte sequence \xC3\x9A, or whether you require the single byte \xDA. The former will print Ú as you want, with the caveat that it requires 2 bytes. The latter will not print as you expect, yet it will correctly be interpreted as 1 byte rather than 2 bytes.

Here's an illustrative example you can run on the Playground:

package main

import "fmt"

func main() {
	// A string of bytes that are each invalid UTF-8 on their own.
	dataString := "\xCF\xDA\xF6"

	// Ranging over a string decodes runes, so every invalid byte
	// comes out as U+FFFD rather than its own value.
	for _, c := range dataString {
		fmt.Printf("%X ", c)
	}
	fmt.Println()

	// Convert the string's bytes to a byte slice.
	data := []byte(dataString)

	// Now it should print CF, DA, F6.
	for _, b := range data {
		fmt.Printf("%X ", b)
	}
	fmt.Println()
}
