Escaping of hex values in string literals
Question
I am attempting to escape a specific hex value in a Go string. The function calls look something like this:
Insert(0, "\x00\x00\x00\rIHDR\x00\x00\x000\x00\x00\x000\b\x03")
Insert(25, "\x00\x00\x00\x06PLTE")
Insert(43, "\x00\x00\x00\x02tRNS")
Insert(57, "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6") // problem line
Insert(2432, "\x00\x00\x00\x00IEND")
The problem arises when the language interprets the "\xDA" hex escape. Instead of producing Ú, it produces � (the replacement character).
I confirmed that this is what was occurring with the following playground example:
fmt.Println("\xDA")
i := 218
h := fmt.Sprintf("%x", i)
fmt.Printf("Hex conf of '%d' is '%s'\n", i, h)
fmt.Println(string(i))
This snippet, when run, prints
�
Hex conf of '218' is 'da'
Ú
Am I missing something here? The fact that "\xDA" is being escaped to a value of 65533 is throwing off my entire program, which relies on CRC32 and some other checksums. This does not occur in the JavaScript version of this program (which itself is a translation of the James compface program, written in C).
Here is the playground link: https://play.golang.org/p/c-XMK68maX
Answer 1
Score: 8
Go strings are just a series of bytes, but when an encoding is needed, it's assumed to be UTF-8. The value \xda isn't a valid UTF-8 sequence on its own, so when the string is decoded or displayed as text, it shows up as unicode.ReplacementChar, "�":
ReplacementChar = '\uFFFD' // Represents invalid code points.
If you want the rune value of \xda in a string literal, use a Unicode escape: \u00DA, or use the UTF-8 encoding: \xc3\x9a, or use the character itself: Ú.
https://play.golang.org/p/EJZIqCI_Gr
If you actually want a single byte value of \xda in your string, that is what you have, and the printed character is inconsequential.
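To see the difference between the two forms without relying on how a terminal renders them, the bytes can be inspected directly. This is a minimal sketch, not part of the original answer; the helper name describe is made up for illustration.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// describe (a hypothetical helper) reports the byte length and raw bytes
// of s, independent of how the string is rendered on screen.
func describe(s string) string {
	return fmt.Sprintf("len=%d bytes=% x", len(s), s)
}

func main() {
	raw := "\xDA"       // a single byte; not valid UTF-8 on its own
	encoded := "\u00DA" // the rune U+00DA, stored as two UTF-8 bytes

	fmt.Println(describe(raw))     // len=1 bytes=da
	fmt.Println(describe(encoded)) // len=2 bytes=c3 9a

	// Checksums are computed over bytes, so each form gets its own CRC32.
	fmt.Printf("%08x\n", crc32.ChecksumIEEE([]byte(raw)))
	fmt.Printf("%08x\n", crc32.ChecksumIEEE([]byte(encoded)))
}
```

If the checksums in the program are computed over the string with the \xDA escape, they should see the single byte 0xDA regardless of what the console shows.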
Answer 2
Score: 3
Your input looks like ISO-8859-1 (Latin-1). Convert it to UTF-8. For example,
package main

import (
	"fmt"
	"unicode/utf8"
)

// ISO88591ToString maps ISO-8859-1 (Latin-1) to string (UTF-8).
func ISO88591ToString(iso string) string {
	var utf []rune
	for i := 0; i < len(iso); i++ {
		r := iso[i]
		if utf == nil {
			if r < utf8.RuneSelf {
				continue
			}
			utf = make([]rune, len(iso))
			for j, r := range iso[:i] {
				utf[j] = rune(r)
			}
		}
		utf[i] = rune(r)
	}
	if utf == nil {
		return string(iso)
	}
	return string(utf)
}

func main() {
	l1 := "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6"
	fmt.Printf("%q\n", l1)
	s := ISO88591ToString(l1)
	fmt.Printf("%q\n", s)
}
Output:
"\x00\x00\t;IDATx\xda\x010\t\xcf\xf6"
"\x00\x00\t;IDATxÚ\x010\tÏö"
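One thing worth checking before adopting this conversion is that it changes the byte length of the data, which matters when the bytes feed a checksum. The sketch below uses a simplified variant of the conversion (the name latin1ToUTF8 is an assumption for illustration, not from the answer) and compares lengths before and after.

```go
package main

import "fmt"

// latin1ToUTF8 is a hypothetical, simplified variant of the conversion
// above: each input byte becomes the code point with the same value.
func latin1ToUTF8(iso string) string {
	runes := make([]rune, len(iso))
	for i := 0; i < len(iso); i++ {
		runes[i] = rune(iso[i])
	}
	return string(runes)
}

func main() {
	l1 := "\x00\x00\t;IDATx\xDA\x010\t\xCF\xF6"
	s := latin1ToUTF8(l1)

	// The three bytes >= 0x80 each become two bytes in UTF-8.
	fmt.Println(len(l1), len(s)) // 15 18
}
```

The converted string displays as expected, but it is no longer the same byte sequence as the input.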
Answer 3
Score: 1
Strings in Go are conventionally UTF-8, and \xDA isn't a valid UTF-8 sequence by itself, meaning that printing it as part of a string will yield the Unicode replacement character U+FFFD instead of what you wanted (Ú, or U+00DA).
You seem to be working with raw bytes, however, so you should consider whether you want the rune represented by \u00DA, which is encoded in UTF-8 as the 2-byte sequence \xC3\x9A, or whether you require the single byte \xDA. The former will print Ú as you want, with the caveat that it requires 2 bytes. The latter will not print as you expect, yet it will correctly be stored as 1 byte rather than 2 bytes.
Here's an illustrative example you can run on the Playground:
package main

import "fmt"

func main() {
	// A string of bytes that are not valid UTF-8 on their own.
	dataString := "\xCF\xDA\xF6"

	// Ranging over a string decodes it as UTF-8, so each invalid
	// byte comes back as U+FFFD: this prints FFFD FFFD FFFD.
	for _, c := range dataString {
		fmt.Printf("%X ", c)
	}
	fmt.Println()

	// Convert the string's bytes to a byte slice.
	data := []byte(dataString)

	// Ranging over the bytes prints CF DA F6.
	for _, b := range data {
		fmt.Printf("%X ", b)
	}
	fmt.Println()
}
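The same distinction can be checked programmatically with the standard unicode/utf8 package. The following is a small sketch along those lines; the wrapper firstRune is a name introduced here for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune (a hypothetical wrapper) decodes the first UTF-8 sequence
// in s and returns the rune together with its width in bytes.
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	for _, s := range []string{"\xDA", "\xC3\x9A"} {
		r, size := firstRune(s)
		fmt.Printf("% x valid=%t rune=%U size=%d\n", s, utf8.ValidString(s), r, size)
	}
	// Output:
	// da valid=false rune=U+FFFD size=1
	// c3 9a valid=true rune=U+00DA size=2
}
```

The lone byte decodes to the replacement character with a width of 1, while the two-byte sequence decodes to U+00DA.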