How to properly output a string in a Windows console with Go?
Question
I have an exe written in Go that prints UTF-8 encoded strings containing special characters.
Since that exe is meant to be used from a console window, its output is mangled, because the Windows console uses the ibm850 encoding (also known as code page 850).
How would you make sure the Go exe prints correctly encoded strings in a console window, i.e. prints, for instance:
éèïöîôùòèìë
instead of the following (without any translation to the right charset):
├®├¿├»├Â├«├┤├╣├▓├¿├¼├½
Answer 1
Score: 3
// Alert: This is Windows-specific, uses undocumented methods, does not
// handle stdout redirection, does not check for errors, etc.
// Use at your own risk.
// Tested with Go 1.0.2-windows-amd64.
package main

import (
	"syscall"
	"unicode/utf16"
	"unsafe"
)

var modkernel32 = syscall.NewLazyDLL("kernel32.dll")
var procWriteConsoleW = modkernel32.NewProc("WriteConsoleW")

// consolePrintString converts a UTF-8 string to UTF-16 and writes it
// straight to the console with WriteConsoleW, bypassing the code page.
func consolePrintString(strUtf8 string) {
	strUtf16 := utf16.Encode([]rune(strUtf8))
	if len(strUtf16) < 1 {
		return
	}
	// Receives the number of UTF-16 units written; a real variable is
	// passed instead of the nil pointer used originally.
	var charsWritten uint32
	syscall.Syscall6(procWriteConsoleW.Addr(), 5,
		uintptr(syscall.Stdout),
		uintptr(unsafe.Pointer(&strUtf16[0])),
		uintptr(len(strUtf16)), // length in UTF-16 units, not bytes
		uintptr(unsafe.Pointer(&charsWritten)),
		uintptr(0),
		0)
}

func main() {
	consolePrintString("Hello ☺\n")
	consolePrintString("éèïöîôùòèìë\n")
}
Answer 2
Score: 2
The online book "Network programming with Go" (CC BY-NC-SA 3.0) has a chapter on charsets (Managing character sets and encodings), in which Jan Newmarch details the conversion of one charset to another. But it seems cumbersome.
Here is a solution (I might have missed a much simpler one), using the library go-charset (from Roger Peppe).
I translate a UTF-8 string to an ibm850-encoded one, which allows me to print the following in a DOS window:
éèïöîôùòèìë
The translation function is detailed below:
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"

	"code.google.com/p/go-charset/charset"
	_ "code.google.com/p/go-charset/data" // registers the charset tables
)

// translate runs the input string through the given translator and
// returns the converted bytes as a string.
func translate(tr charset.Translator, in string) (string, error) {
	var buf bytes.Buffer
	r := charset.NewTranslatingReader(strings.NewReader(in), tr)
	_, err := io.Copy(&buf, r)
	if err != nil {
		return "", err
	}
	return buf.String(), nil
}

// Utf2dos converts a UTF-8 string to ibm850 (code page 850).
func Utf2dos(in string) string {
	dosCharset := "ibm850"
	cs := charset.Info(dosCharset)
	if cs == nil {
		// Fatalf is needed here: log.Fatal does not interpret format verbs.
		log.Fatalf("no info found for %q", dosCharset)
	}
	fromtr, err := charset.TranslatorTo(dosCharset)
	if err != nil {
		log.Fatalf("error making translator to %q: %v", dosCharset, err)
	}
	out, err := translate(fromtr, in)
	if err != nil {
		log.Fatalf("error translating to %q: %v", dosCharset, err)
	}
	return out
}

func main() {
	test := "éèïöîôùòèìë"
	fmt.Println("utf-8:\n", test)
	fmt.Println("ibm850:\n", Utf2dos(test))
}
Answer 3
Score: 2
Since 2016, you can consider golang.org/x/text, which comes with an encoding charmap package covering the ISO-8859 family as well as the Windows 1252 character set.
See "Go Quickly - Converting Character Encodings In Golang":
r := charmap.ISO8859_1.NewDecoder().Reader(f)
io.Copy(out, r)
That is an extract of an example that opens an ISO-8859-1 source text (my_isotext.txt), creates a destination file (my_utf.txt), and copies the first to the second.
To decode from ISO-8859-1 to UTF-8, it wraps the original file reader (f) with a decoder.
I just tested the following (pseudo-code for illustration):
package main

import (
	"fmt"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	// Placeholder: t stands for a string whose bytes are encoded in code page 850.
	t := "string composed of character in cp 850"
	// The decoder converts from code page 850 to UTF-8.
	d := charmap.CodePage850.NewDecoder()
	st, err := d.String(t)
	if err != nil {
		panic(err)
	}
	fmt.Println(st)
}
The result is a string readable in a Windows CMD.
See more in this Nov. 2018 Reddit thread.
Answer 4
Score: 0
It is something that Go still can't do out of the box - see http://code.google.com/p/go/issues/detail?id=3376#c6.
Alex