How to properly output a string in a Windows console with Go?

Question

I have an exe written in Go that prints UTF-8 encoded strings containing special characters.
Since that exe is meant to be used from a console window, its output is mangled, because the Windows console uses the ibm850 encoding (aka code page 850).

How would you make sure the Go exe prints correctly encoded strings in a console window, i.e. prints, for instance:

éèïöîôùòèìë

instead of (without any translation to the right charset):

├®├¿├»├Â├«├┤├╣├▓├¿├¼├½
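
The mangled output above is what you would expect when the console interprets each UTF-8 byte as a separate code page 850 character. As a minimal sketch (the helper name utf8Hex is made up for this illustration), the following prints the raw bytes Go emits for a character. é is encoded as the two bytes c3 a9, which a code page 850 console renders as ├ and ®:

```go
package main

import (
	"fmt"
	"strings"
)

// utf8Hex returns the UTF-8 bytes of s as space-separated hex pairs.
// The name is hypothetical; it only illustrates what Go writes to stdout.
func utf8Hex(s string) string {
	parts := make([]string, 0, len(s))
	for i := 0; i < len(s); i++ {
		// Indexing a string yields its raw bytes, not its runes.
		parts = append(parts, fmt.Sprintf("%02x", s[i]))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Each accented character occupies two bytes in UTF-8.
	for _, r := range "\u00e9\u00e8" { // é, è
		fmt.Printf("%c = %s\n", r, utf8Hex(string(r)))
	}
}
```

Every two-byte sequence starting with c3 therefore shows up as ├ followed by a second symbol, which matches the pattern in the mangled line.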

Answer 1

Score: 3

// Alert: This is Windows-specific, uses undocumented methods, does not
// handle stdout redirection, does not check for errors, etc.
// Use at your own risk.
// Tested with Go 1.0.2-windows-amd64.

package main

import (
	"syscall"
	"unicode/utf16"
	"unsafe"
)

var modkernel32 = syscall.NewLazyDLL("kernel32.dll")
var procWriteConsoleW = modkernel32.NewProc("WriteConsoleW")

// consolePrintString converts a UTF-8 string to UTF-16 and writes it
// straight to the console through WriteConsoleW.
func consolePrintString(strUtf8 string) {
	strUtf16 := utf16.Encode([]rune(strUtf8))
	if len(strUtf16) < 1 {
		return
	}

	// WriteConsoleW stores the number of UTF-16 units written here.
	var charsWritten uint32

	syscall.Syscall6(procWriteConsoleW.Addr(), 5,
		uintptr(syscall.Stdout),
		uintptr(unsafe.Pointer(&strUtf16[0])),
		uintptr(len(strUtf16)),
		uintptr(unsafe.Pointer(&charsWritten)),
		uintptr(0),
		0)
}

func main() {
	consolePrintString("Hello ☺\n")
	consolePrintString("éèïöîôùòèìë\n")
}
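
The key step in the code above is the UTF-8 to UTF-16 conversion, since WriteConsoleW expects UTF-16 code units. That step is portable, so it can be checked on any platform. Here is a minimal sketch (toUTF16 is a hypothetical helper name, not part of the answer) showing that a character outside the Basic Multilingual Plane becomes two code units. This is why the length passed to WriteConsoleW has to be the length of the UTF-16 slice rather than the length of the original string:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16 converts a UTF-8 Go string into UTF-16 code units.
// Hypothetical helper, named only for this illustration.
func toUTF16(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	// U+00E9 (é), U+263A (☺) and U+1F600 (an emoji outside the BMP).
	for _, s := range []string{"\u00e9", "\u263a", "\U0001F600"} {
		u := toUTF16(s)
		fmt.Printf("%q: %d UTF-8 bytes, %d UTF-16 units: %04x\n", s, len(s), len(u), u)
	}
}
```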

Answer 2

Score: 2

The online book "Network programming with Go" (CC BY-NC-SA 3.0) has a chapter on charsets (Managing character sets and encodings), in which Jan Newmarch details the conversion of one charset to another. But it seems cumbersome.

Here is a solution (I might have missed a much simpler one), using the go-charset library (from Roger Peppe).
I translate a UTF-8 string to an ibm850-encoded one, which allows me to print in a DOS window:

éèïöîôùòèìë

The translation function is detailed below:

package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"

	// Original Google Code import paths. That hosting service has since
	// shut down; the library appears to live on at
	// github.com/rogpeppe/go-charset.
	"code.google.com/p/go-charset/charset"
	_ "code.google.com/p/go-charset/data"
)

// translate runs the input string through the given translator.
func translate(tr charset.Translator, in string) (string, error) {
	var buf bytes.Buffer
	r := charset.NewTranslatingReader(strings.NewReader(in), tr)
	if _, err := io.Copy(&buf, r); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// Utf2dos converts a UTF-8 string to its ibm850 (code page 850) encoding.
func Utf2dos(in string) string {
	dosCharset := "ibm850"
	cs := charset.Info(dosCharset)
	if cs == nil {
		log.Fatalf("no info found for %q", dosCharset)
	}
	totr, err := charset.TranslatorTo(dosCharset)
	if err != nil {
		log.Fatalf("error making translator to %q: %v", dosCharset, err)
	}
	out, err := translate(totr, in)
	if err != nil {
		log.Fatalf("error translating to %q: %v", dosCharset, err)
	}
	return out
}

func main() {
	test := "éèïöîôùòèìë"
	fmt.Println("utf-8:\n", test)
	fmt.Println("ibm850:\n", Utf2dos(test))
}
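
To give an idea of what such a translator does underneath, here is a minimal standard-library sketch. The table is deliberately partial: it covers only the characters from the question, with byte values taken from the code page 850 layout, and the names cp850Table and toCP850 are invented for this illustration. A real program should rely on a complete charset library rather than a hand-written table.

```go
package main

import "fmt"

// cp850Table is a partial, illustrative mapping from Unicode code points
// to code page 850 bytes. It only covers the characters in the question.
var cp850Table = map[rune]byte{
	'\u00e9': 0x82, // é
	'\u00eb': 0x89, // ë
	'\u00e8': 0x8A, // è
	'\u00ef': 0x8B, // ï
	'\u00ee': 0x8C, // î
	'\u00ec': 0x8D, // ì
	'\u00f4': 0x93, // ô
	'\u00f6': 0x94, // ö
	'\u00f2': 0x95, // ò
	'\u00f9': 0x97, // ù
}

// toCP850 encodes s byte by byte: ASCII passes through unchanged, known
// accented characters are looked up, and anything else is an error.
func toCP850(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r < 0x80 {
			out = append(out, byte(r))
			continue
		}
		b, ok := cp850Table[r]
		if !ok {
			return nil, fmt.Errorf("no mapping for %q in this partial table", r)
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	b, err := toCP850("\u00e9\u00e8\u00ef\u00f6") // éèïö
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// One byte per character, instead of two as in UTF-8.
	fmt.Printf("% x\n", b)
}
```

Returning an error for unmapped characters is one possible design choice; a library may instead substitute a replacement character.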

Answer 3

Score: 2

Since 2016, you can consider (as of 2017) golang.org/x/text, which comes with an encoding/charmap package covering the ISO-8859 family as well as the Windows-1252 character set.

See "Go Quickly - Converting Character Encodings In Golang"

r := charmap.ISO8859_1.NewDecoder().Reader(f)
io.Copy(out, r)

That is an extract from an example that opens an ISO-8859-1 source text (my_isotext.txt), creates a destination file (my_utf.txt), and copies the first to the second.
To decode from ISO-8859-1 to UTF-8 along the way, the original file reader (f) is wrapped with a decoder.
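
For ISO-8859-1 specifically, decoding is simple enough to sketch with the standard library alone, because each ISO-8859-1 byte value equals the Unicode code point of the character it represents. The helper below (latin1ToUTF8 is a name made up for this example) shows what such a decoder conceptually does. Other charsets, such as code page 850, need a real lookup table, which is what charmap provides.

```go
package main

import "fmt"

// latin1ToUTF8 decodes ISO-8859-1 bytes into a UTF-8 Go string.
// Hypothetical helper: it relies on ISO-8859-1 byte values being
// identical to the first 256 Unicode code points.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	// Converting []rune to string produces UTF-8.
	return string(runes)
}

func main() {
	latin1 := []byte{0x63, 0x61, 0x66, 0xE9} // "café" in ISO-8859-1
	s := latin1ToUTF8(latin1)
	// Print the decoded text and its UTF-8 bytes.
	fmt.Printf("%s (% x)\n", s, s)
}
```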

I just tested (pseudo-code for illustration):

package main

import (
	"fmt"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	// t stands for text encoded in code page 850, held in a Go string.
	t := "string composed of character in cp 850"
	// NewDecoder converts CP850 to UTF-8. NewEncoder goes the other way
	// (UTF-8 to CP850), which is what a CP850 console expects.
	d := charmap.CodePage850.NewDecoder()
	st, err := d.String(t)
	if err != nil {
		panic(err)
	}
	fmt.Println(st)
}

The result is a string readable in a Windows CMD.
See more in this Nov. 2018 reddit thread.

Answer 4

Score: 0

It is something that Go still can't do out of the box - see http://code.google.com/p/go/issues/detail?id=3376#c6.

Alex

huangapple
  • Posted on August 21, 2012, 18:41:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/12053168.html