Why do I get different results when using libiconv over iconv binary?

Question

Here is the sample string I am using, encoded in UCS-2:

abvgdđežzijklmnjoprstćuvhcčdžš1234567890*+;'

When converting from UCS-2 to ISO-8859-1//TRANSLIT with the iconv binary, file to file, I get:

abvgd?ezzijklmnjoprstcuvhccdzs1234567890*+;'

Now I want to use libiconv in a Go project. I am using github.com/qiniu/iconv as the bindings for libiconv, but with the bindings I get:

abvgd?e?zijklmnjoprst?uvhc?d??1234567890*+;'

It looks as if different transliteration rules apply when the library is used from Go.

I examined the Go bindings and everything seems to be in order; only bytes are passed around, so no "loss of information" can happen there.

Is there anything else I should be aware of when using libiconv? Is there some environment context that could trigger different transliteration behaviour?


EDIT (additional explanation about invocation):

I have two files, "ucs-2.txt" and "latin1.txt". ucs-2.txt contains the UCS-2 encoded string, and latin1.txt contains the string produced by running:

iconv -f UCS2 -t ISO-8859-1//TRANSLIT --verbose data/encoding/ucs-2.txt > data/encoding/latin1.txt

In Go, I use these lines to read the contents of these files:

var err error
ucs2, err = ioutil.ReadFile("data/encoding/ucs-2.txt")
if err != nil {
	log.Fatal(err)
}
latin1, err = ioutil.ReadFile("data/encoding/latin1.txt")
if err != nil {
	log.Fatal(err)
}

This function does the conversion:

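// convertEnc converts UCS-2 input to Latin-1, asking iconv to transliterate
// characters that Latin-1 cannot represent.
func convertEnc(content []byte) ([]byte, error) {
	// iconv.Open takes the target encoding first, then the source encoding.
	cd, err := iconv.Open("ISO-8859-1//TRANSLIT", "UCS2")
	if err != nil {
		return nil, err
	}
	defer cd.Close()
	// Fixed-size output buffer; large enough for the short sample string.
	var outbuf [255]byte
	res, _, err := cd.Conv(content, outbuf[:])
	log.Printf("result: %+q", res)
	return res, err
}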

And I am using DeepEqual to compare the result with the expected bytes:

reflect.DeepEqual(res, latin1)
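
DeepEqual only reports whether the two slices match. When they do not, it can help to see where they first diverge. The following is a minimal sketch, not part of the original code: `firstDiff` is a hypothetical helper name, and the two byte slices in `main` are simply the two outputs quoted in the question.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstDiff returns the index of the first byte at which a and b differ,
// or -1 if the slices are identical. If one slice is a prefix of the
// other, the length of the shorter slice is returned.
func firstDiff(a, b []byte) int {
	if bytes.Equal(a, b) {
		return -1
	}
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n
}

func main() {
	// The two outputs quoted in the question (all ASCII after conversion).
	fromBinary := []byte("abvgd?ezzijklmnjoprstcuvhccdzs1234567890*+;'")
	fromLibrary := []byte("abvgd?e?zijklmnjoprst?uvhc?d??1234567890*+;'")
	fmt.Println("first differing byte index:", firstDiff(fromBinary, fromLibrary))
}
```

For a plain equality check on byte slices, `bytes.Equal` does the same job as `reflect.DeepEqual` without reflection.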

Answer 1

Score: 2

The first output includes transliteration, i.e. certain characters (e.g. ž) are replaced by their not-quite-right "plain" counterpart (z) so that they can be represented in an encoding that does not support the original character (here, Latin-1 has no ž).

The second output did not transliterate anything; every character not representable in the target encoding (ž, ć, ... in Latin-1) was replaced with ?.

Thus, I suspect you ran the binary with different options than the library. I am not familiar with libiconv, but it seems that the //TRANSLIT part was omitted or is not supported by the function you used...?
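
The difference between the two outputs can be reproduced with a toy converter. This is only an illustrative sketch: the `translit` table and the `toLatin1` function are hypothetical, hand-picked to match the sample string, and are not how iconv implements //TRANSLIT.

```go
package main

import "fmt"

// translit is a hypothetical table covering only the characters that the
// question's first output shows as transliterated. It is not iconv's table.
var translit = map[rune]byte{
	'\u017e': 'z', // ž
	'\u0107': 'c', // ć
	'\u010d': 'c', // č
	'\u0161': 's', // š
}

// toLatin1 converts s to Latin-1 bytes. Runes up to U+00FF map directly.
// Other runes are looked up in the table when useTranslit is true; anything
// still unrepresentable becomes '?'.
func toLatin1(s string, useTranslit bool) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r <= 0xFF {
			out = append(out, byte(r))
			continue
		}
		if b, ok := translit[r]; useTranslit && ok {
			out = append(out, b)
			continue
		}
		out = append(out, '?')
	}
	return out
}

func main() {
	// The sample string from the question, with non-ASCII letters escaped.
	sample := "abvgd\u0111e\u017ezijklmnjoprst\u0107uvhc\u010dd\u017e\u01611234567890*+;'"
	fmt.Printf("with table:    %s\n", toLatin1(sample, true))
	fmt.Printf("without table: %s\n", toLatin1(sample, false))
}
```

With the table enabled the result matches the output of the iconv binary; with it disabled the result matches the output obtained through the Go bindings, which is consistent with transliteration not taking effect in the second case.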

Answer 2

Score: 1

Transliteration is locale dependent. Maybe your libiconv is missing the locale or has the wrong one, or the locale you are using has no transliteration configured.

Please check this bug report, as it has a few examples and a discussion of this topic.
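
One way to test the locale hypothesis is to run the iconv binary itself under different locales and compare the results with what the library returns. A C program starts in the "C" locale until it calls setlocale, so if the binary's output under LC_ALL=C matches the library's output, locale is the likely cause. The sketch below is only a diagnostic under assumptions: `iconvUnderLocale` is a hypothetical helper, "en_US.UTF-8" is an assumed locale name that may not be installed on your system, and the command-line flags are the ones from the question.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// iconvUnderLocale builds (but does not run) the iconv command from the
// question, with LC_ALL forced to the given locale.
func iconvUnderLocale(locale, path string) *exec.Cmd {
	cmd := exec.Command("iconv", "-f", "UCS2", "-t", "ISO-8859-1//TRANSLIT", path)
	// When a key is duplicated in Env, os/exec uses the last value,
	// so this overrides any LC_ALL inherited from the parent process.
	cmd.Env = append(os.Environ(), "LC_ALL="+locale)
	return cmd
}

func main() {
	path := "data/encoding/ucs-2.txt" // input file from the question
	// "en_US.UTF-8" is an assumption; substitute a locale installed locally.
	for _, locale := range []string{"C", "en_US.UTF-8"} {
		out, err := iconvUnderLocale(locale, path).Output()
		if err != nil {
			fmt.Printf("LC_ALL=%s: iconv reported: %v\n", locale, err)
		}
		fmt.Printf("LC_ALL=%s: %+q\n", locale, out)
	}
}
```

If the outputs differ between the two locales, the fix on the Go side would be to make sure the process calls setlocale before opening the converter, which requires cgo and is not shown here.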

huangapple
  • Posted on August 4, 2015, 21:32:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/31810729.html