Why do I get different results when using libiconv instead of the iconv binary?
Question
Here is the sample string that I am using encoded in UCS-2:
abvgdđežzijklmnjoprstćuvhcčdžš1234567890*+;'
When I convert it from UCS-2 to ISO-8859-1//TRANSLIT with the iconv binary, file to file, I get:
abvgd?ezzijklmnjoprstcuvhccdzs1234567890*+;'
Now I want to use libiconv in a Go project. I am using github.com/qiniu/iconv as the bindings for libiconv, but with the bindings I get:
abvgd?e?zijklmnjoprst?uvhc?d??1234567890*+;'
It looks as if different transliteration rules apply when the library is used from Go.
I examined the Go bindings and everything seems in order; only bytes are passed around, so no "loss of information" can happen there.
Is there anything else I should be aware of when using libiconv? Is there some environment context that could trigger different transliteration behaviour?
EDIT (additional explanation about invocation):
I have two files, "ucs-2.txt" and "latin1.txt". ucs-2.txt contains the UCS-2 encoded string, and latin1.txt contains the string produced by running:
iconv -f UCS2 -t ISO-8859-1//TRANSLIT --verbose data/encoding/ucs-2.txt > data/encoding/latin1.txt
In Go I use these lines to read the content of these files:
var err error
ucs2, err = ioutil.ReadFile("data/encoding/ucs-2.txt")
if err != nil {
	log.Fatal(err)
}
latin1, err = ioutil.ReadFile("data/encoding/latin1.txt")
if err != nil {
	log.Fatal(err)
}
This function does the conversion:
func convertEnc(content []byte) ([]byte, error) {
	// Open takes the target encoding first, then the source encoding.
	cd, err := iconv.Open("ISO-8859-1//TRANSLIT", "UCS2")
	if err != nil {
		return nil, err
	}
	defer cd.Close()
	// Scratch buffer handed to Conv for the converted bytes.
	var outbuf [255]byte
	res, _, err := cd.Conv(content, outbuf[:])
	log.Printf("result: %+q", res)
	return res, err
}
I compare the result with the expected bytes using DeepEqual:
reflect.DeepEqual(res, latin1)
Answer 1
Score: 2
The first output includes transliteration: certain characters (e.g. ž) are replaced by their not-quite-right "plain" counterpart (z) so that they can be represented in an encoding that does not support the original character (here, ž in Latin-1).
The second output did not transliterate anything; every character not representable in the target encoding (ž, ć, ... in Latin-1) was dropped and shows up as ?.
Thus, I suspect you ran the binary with different options than the library. I am not familiar with libiconv, but it seems that the //TRANSLIT part was omitted or is not supported by the function you used...?
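To see exactly which input characters are affected, one can list the code units that fall outside Latin-1 (anything above U+00FF). The following is a minimal sketch, not part of the original question: nonLatin1 is a hypothetical helper name, and it assumes the file is little-endian UCS-2, which should be checked against the actual byte order of ucs-2.txt.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nonLatin1 decodes UCS-2 code units (little-endian is an assumption here)
// and returns those that ISO-8859-1 cannot represent, i.e. values above 0xFF.
// A leading byte order mark (U+FEFF) is skipped. A trailing odd byte is ignored.
func nonLatin1(ucs2 []byte) []rune {
	var out []rune
	for i := 0; i+1 < len(ucs2); i += 2 {
		r := rune(binary.LittleEndian.Uint16(ucs2[i:]))
		if r == 0xFEFF {
			continue
		}
		if r > 0xFF {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// "ž" (U+017E) followed by "z" (U+007A), encoded as UCS-2LE.
	sample := []byte{0x7E, 0x01, 0x7A, 0x00}
	fmt.Printf("%q\n", nonLatin1(sample))
}
```

Every character this reports is one that the converter must either transliterate or replace, so comparing the list against both outputs shows where the two runs diverge.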
Answer 2
Score: 1
Transliteration is locale dependent. Maybe your libiconv has no locale set or has the wrong one, or the locale in use there has no transliteration configured.
Please check this bug report as it has a few examples and a discussion on this topic.