September 7, 2017, 18:16:11 | go

Unexpected code point conversion

Question

Why is one byte converted to a rune with the value 65533 instead of 132 in the following application?

I have an ASCII code conversion table (old ASCII code -> new ASCII code) that I need to implement, so I need the correct byte values (132 in this case) in the converter.

Sample program:

package main

import (
    "io/ioutil"
    "flag"
    "bytes"
    "fmt"
)

func converter(r rune) rune {
    fmt.Printf("%v ", int(r))
    return r
}

func main() {

    // parse the command line
    var infile string
    flag.StringVar(&infile, "in", "", "input file")
    flag.Parse()

    // read the whole file at once
    b, err := ioutil.ReadFile(infile)
    if err != nil {
        panic(err)
    }

    fmt.Printf("%v\n", b)

    // convert charset
    converted := bytes.Map(converter, b)

    fmt.Printf("\n%v\n", converted)
}

Sample input file (in hex):

4A 84 6C 6B 0D 0A

Sample output from the application:

[74 132 108 107 13 10]
74 65533 108 107 13 10
[74 239 191 189 108 107 13 10]

Answer 1

Score: 1

A rune is a Unicode code point, not an ASCII value, so bytes.Map interprets your bytes as UTF-8.

If we look at the function that you are using:
https://golang.org/src/bytes/bytes.go?s=9029:9081#L344

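We can see that every byte in the slice is first widened to a rune:

r := rune(s[i])

For values below 0x80 that is the final rune. For anything else, bytes.Map decodes a UTF-8 sequence starting at s[i] with utf8.DecodeRune. The byte 0x84 (132) cannot start a valid UTF-8 sequence, so decoding returns the replacement character U+FFFD (65533), which is then written back as the three bytes 239 191 189.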

In UTF-8 one character can occupy more than one byte. This is unlike ASCII, where one character always takes exactly one byte.

You can read more about UTF-8 here: https://en.wikipedia.org/wiki/UTF-8

This is why you get the unexpected result.
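
The effect can be reproduced without reading a file. The sketch below is a simplified illustration of the per-position decoding, not the library's actual code; the helper name decodeAt is made up for this example, and the input is the six bytes from the question.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAt is a hypothetical helper that mimics, in simplified form, how a
// rune is obtained at position i: values below utf8.RuneSelf (0x80) are
// used as-is, everything else goes through UTF-8 decoding.
func decodeAt(s []byte, i int) (r rune, width int) {
	r, width = rune(s[i]), 1
	if r >= utf8.RuneSelf {
		r, width = utf8.DecodeRune(s[i:])
	}
	return r, width
}

func main() {
	input := []byte{0x4A, 0x84, 0x6C, 0x6B, 0x0D, 0x0A} // sample file from the question

	for i := 0; i < len(input); {
		r, width := decodeAt(input, i)
		fmt.Printf("byte %#x -> rune %d\n", input[i], r)
		i += width
	}

	// Encoding U+FFFD back to UTF-8 yields the three bytes seen in the output.
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, utf8.RuneError)
	fmt.Println(buf[:n]) // [239 191 189]
}
```

The second line of output shows byte 0x84 turning into rune 65533, which matches what the converter printed in the question.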

To fix it, iterate over your bytes with a for range loop and save the output to a new slice.

func converter(b byte) byte {
    fmt.Printf("%v ", int(b))
    return b
}

...

converted := make([]byte, len(b))

for i, v := range b {
   // v is your byte value - convert it here
   converted[i] = converter(v)
}
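
Since the question mentions a conversion table, the byte-wise loop could be combined with a 256-entry lookup array, roughly as sketched here. The helper names newTable and convert and the mapping 132 -> 228 are assumptions made up for illustration; the real table would come from your own specification.

```go
package main

import "fmt"

// newTable builds a 256-entry lookup table that maps every byte to itself
// and then applies the given overrides (old code -> new code).
func newTable(overrides map[byte]byte) [256]byte {
	var t [256]byte
	for i := range t {
		t[i] = byte(i)
	}
	for old, repl := range overrides {
		t[old] = repl
	}
	return t
}

// convert returns a copy of b with every byte replaced via the table.
// No UTF-8 decoding takes place, so 0x84 stays a single byte.
func convert(b []byte, table *[256]byte) []byte {
	out := make([]byte, len(b))
	for i, v := range b {
		out[i] = table[v]
	}
	return out
}

func main() {
	// Hypothetical mapping for illustration only: 132 -> 228.
	table := newTable(map[byte]byte{132: 228})

	input := []byte{0x4A, 0x84, 0x6C, 0x6B, 0x0D, 0x0A}
	fmt.Println(convert(input, &table)) // [74 228 108 107 13 10]
}
```

An array indexed by the byte value keeps the lookup constant-time and guarantees the output has the same length as the input.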

Answer 2

Score: 1

Read the bytes from the file, and then you can use something along these lines. The last column of the output is each rune's numeric value, which equals the ASCII value for ASCII characters.

package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

func main() {
	//s := "Hello, 世界"
	//Assuming the following is the hex you have read in from the file..
	b, err := hex.DecodeString("48656c6c6f2c20e4b896e7958c")
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(b)
	s := string(b)
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		fmt.Printf("%d\t%c\t%d\n", i, r, r)
		i += size
	}
	anotherWay(s)

}
func anotherWay(s string) {
	fmt.Println("\nAnother way")
	for i, r := range s {
		fmt.Printf("%d\t%c\t%d\n", i, r, r)
	}
}

On the playground: https://play.golang.org/p/9WusGxWv8w
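
One caveat that may be worth checking against the question's input: ranging over a string decodes UTF-8, just like utf8.DecodeRuneInString, so a stray byte such as 0x84 would still be reported as 65533. The sketch below compares that with ranging over the raw byte slice; the helper names runeValues and byteValues are made up for illustration.

```go
package main

import "fmt"

// runeValues collects the values produced by ranging over a string,
// which decodes UTF-8 as it goes.
func runeValues(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, int(r))
	}
	return out
}

// byteValues collects the raw byte values without any decoding.
func byteValues(b []byte) []int {
	out := make([]int, 0, len(b))
	for _, v := range b {
		out = append(out, int(v))
	}
	return out
}

func main() {
	input := []byte{0x4A, 0x84, 0x6C, 0x6B, 0x0D, 0x0A} // sample file from the question

	fmt.Println(runeValues(string(input))) // [74 65533 108 107 13 10]
	fmt.Println(byteValues(input))         // [74 132 108 107 13 10]
}
```

For valid UTF-8 text both approaches agree on ASCII characters; they differ only for bytes of 0x80 and above.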
