Escaping of hex values in string literals

Question

I am trying to embed specific hex byte values in a Go string literal. The function calls look something like this:

Insert(0, "\x00\x00\x00\rIHDR\x00\x00\x000\x00\x00\x000\b\x03")
Insert(25, "\x00\x00\x00\x06PLTE")
Insert(43, "\x00\x00\x00\x02tRNS")
Insert(57, "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6") // problem line
Insert(2432, "\x00\x00\x00\x00IEND")

The problem arises when the language interprets the "\xDA" hex escape. Instead of producing Ú, it produces � (the replacement character).

I confirmed that this is what was occurring with the following playground example:

fmt.Println("\xDA")
i := 218
h := fmt.Sprintf("%x", i)
fmt.Printf("Hex conf of '%d' is '%s'\n", i, h)
fmt.Println(string(rune(i)))

This snippet, when run, prints

�
Hex conf of '218' is 'da'
Ú

Am I missing something here? The fact that "\xDA" ends up as the value 65533 is throwing off my entire program, which relies on CRC32 and some other checksums. This does not occur in the JavaScript version of this program (which itself is a translation of the James compface program, written in C).

Here is the playground link: https://play.golang.org/p/c-XMK68maX

Answer 1

Score: 8

Go strings are just a series of bytes, but whenever an encoding is needed, UTF-8 is assumed. The byte \xda is not a valid UTF-8 sequence on its own, so when the string is decoded (for example by a range loop or a []rune conversion) or displayed, it shows up as unicode.ReplacementChar, "�":

    ReplacementChar = '\uFFFD'     // Represents invalid code points.

If you want the rune U+00DA in a string literal, use a Unicode escape (\u00DA), its UTF-8 encoding (\xc3\x9a), or the character itself (Ú).

https://play.golang.org/p/EJZIqCI_Gr
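
As a quick check of the three spellings above, here is a minimal sketch. The helper name spellings is made up for illustration; it only returns the three literals so they can be compared.

```go
package main

import "fmt"

// spellings is a hypothetical helper that returns three string literals
// which all denote the single rune U+00DA.
func spellings() (escaped, encoded, literal string) {
	return "\u00DA", "\xc3\x9a", "Ú"
}

func main() {
	a, b, c := spellings()
	fmt.Println(a == b, b == c)      // the three literals are the same string
	fmt.Printf("% x\n", a)           // the two UTF-8 bytes: c3 9a
	fmt.Println(len(a), len("\xDA")) // 2 bytes versus the single raw byte
}
```

All three literals compile to the same two bytes, whereas "\xDA" stays a single byte.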

If you actually want the single byte \xda in your string, that is exactly what you already have, and the character that gets printed is inconsequential.
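
To illustrate that the raw byte really is there, here is a small sketch that indexes the string and feeds its bytes to a checksum. chunkCRC is a hypothetical helper, and using crc32.ChecksumIEEE over the chunk type plus data is an assumption about how your program computes its PNG checksums.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// chunkCRC is a hypothetical helper: it checksums the raw bytes of a string.
// No UTF-8 decoding happens here, so "\xDA" contributes exactly one byte.
func chunkCRC(chunk string) uint32 {
	return crc32.ChecksumIEEE([]byte(chunk))
}

func main() {
	raw := "IDATx\xDA"
	fmt.Println(len(raw))       // 6: the escape is one byte
	fmt.Printf("%#x\n", raw[5]) // 0xda: indexing yields the raw byte

	// An empty IEND chunk has the well-known CRC ae426082.
	fmt.Printf("%08x\n", chunkCRC("IEND"))
}
```

Trouble only starts if the string is converted to runes (for example with a range loop) before the bytes reach the checksum.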

Answer 2

Score: 3

Your input looks like ISO-8859-1 (Latin-1). Convert it to UTF-8. For example,

package main

import (
	"fmt"
	"unicode/utf8"
)

// ISO88591ToString maps ISO-8859-1 (Latin-1) to string (UTF-8).
func ISO88591ToString(iso string) string {
	var utf []rune
	for i := 0; i < len(iso); i++ {
		r := iso[i]
		if utf == nil {
			// Pure ASCII so far: nothing to convert yet.
			if r < utf8.RuneSelf {
				continue
			}
			// First non-ASCII byte: allocate and copy the ASCII prefix.
			utf = make([]rune, len(iso))
			for j, r := range iso[:i] {
				utf[j] = rune(r)
			}
		}
		utf[i] = rune(r)
	}
	if utf == nil {
		return string(iso)
	}
	return string(utf)
}

func main() {
	l1 := "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6"
	fmt.Printf("%q\n", l1)
	s := ISO88591ToString(l1)
	fmt.Printf("%q\n", s)
}

Output:

"\x00\x00\t;IDATx\xda\x010\t\xcf\xf6"
"\x00\x00\t;IDATxÚ\x010\tÏö"
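
One effect of this conversion that may matter for checksums: every byte at or above 0x80 becomes two bytes in the result. A minimal sketch, where latin1ToUTF8 is a hypothetical, simplified stand-in for the function above:

```go
package main

import "fmt"

// latin1ToUTF8 is a hypothetical, simplified stand-in for ISO88591ToString:
// each input byte becomes the rune with the same numeric value.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	raw := []byte("x\xDA\x01")
	converted := latin1ToUTF8(raw)
	fmt.Println(len(raw), len(converted)) // 3 4: the byte 0xDA became two bytes
	fmt.Printf("%q\n", converted)
}
```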

Answer 3

Score: 1

Strings in Go are treated as UTF-8 whenever they are decoded, and \xDA isn't a valid UTF-8 sequence by itself, meaning that printing it as part of a string will yield the Unicode replacement character U+FFFD instead of what you wanted (Ú, or U+00DA).

You seem to be working with raw bytes, however, so you should consider whether you want the rune represented by \u00DA, which is encoded in UTF-8 as the 2-byte sequence \xC3\x9A, or whether you require the single byte \xDA. The former will print Ú as you want, with the caveat that it requires 2 bytes. The latter will not print as you expect, yet \xDA will correctly be interpreted as 1 byte rather than 2 bytes.

Here's an illustrative example you can run on the Playground:

package main

import "fmt"

func main() {
	// A string of raw bytes, none of which is valid UTF-8 on its own.
	dataString := "\xCF\xDA\xF6"

	// Doesn't print what you think it should: range decodes runes,
	// so each invalid byte comes out as FFFD.
	for _, c := range dataString {
		fmt.Printf("%X ", c)
	}
	fmt.Println()

	// Convert the string's bytes to a byte slice.
	data := []byte(dataString)

	// Now it should print CF, DA, F6.
	for _, b := range data {
		fmt.Printf("%X ", b)
	}
	fmt.Println()
}
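
If it helps to check which of the two cases a given string falls into, the standard unicode/utf8 package can tell you. This is only a sketch, and describe is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports whether s is valid UTF-8,
// plus the first rune decoded from s and how many bytes that rune consumed.
func describe(s string) (valid bool, first rune, size int) {
	first, size = utf8.DecodeRuneInString(s)
	return utf8.ValidString(s), first, size
}

func main() {
	for _, s := range []string{"\xDA", "\u00DA"} {
		valid, r, size := describe(s)
		fmt.Printf("%q valid=%t first=%U size=%d len=%d\n", s, valid, r, size, len(s))
	}
}
```

For the raw byte, decoding reports utf8.RuneError (U+FFFD) with a size of 1; for the \u00DA form it reports U+00DA with a size of 2.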

huangapple
  • Posted on April 18, 2017 at 08:27:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/43461679.html