How to convert ANSI text to UTF-8

Question

How do I convert ANSI text to UTF-8 in Go?
I am trying to convert an ANSI string to a UTF-8 string.

Answer 1

Score: 6

Go strings are UTF-8 by convention. You can convert a []byte to a string using the conversion described in the spec. Note that this conversion copies the bytes unchanged; it does not transcode them, so the input must already be UTF-8:

http://golang.org/doc/go_spec.html#Conversions

Answer 2

Score: 5

Here is a newer method, using the golang.org/x/text packages.

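package main

import (
	"bytes"
	"fmt"
	"io"
	"log"

	"golang.org/x/text/encoding/traditionalchinese"
	"golang.org/x/text/transform"
)

// Decode converts Big5-encoded bytes to UTF-8.
func Decode(s []byte) ([]byte, error) {
	// Wrap the input in a reader that decodes Big5 as it is read.
	r := transform.NewReader(bytes.NewReader(s), traditionalchinese.Big5.NewDecoder())
	d, err := io.ReadAll(r) // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
	if err != nil {
		return nil, err
	}
	return d, nil
}

func main() {
	s := []byte{0xB0, 0xAA} // one Big5-encoded character
	b, err := Decode(s)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(string(b))
}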

I previously used iconv-go for this conversion. You must know which ANSI code page your text uses; in my case it is 'big5'.

package main

import (
	"fmt"
	"log"

	//iconv "github.com/djimenez/iconv-go"
	iconv "github.com/andelf/iconv-go"
)

func main() {
	ibuf := []byte{170, 76, 80, 67}
	var obuf [256]byte

	// Method 1: use Convert directly
	nR, nW, err := iconv.Convert(ibuf, obuf[:], "big5", "utf-8")
	if err != nil {
		log.Fatalln(err)
	}
	log.Println(nR, ibuf)
	log.Println(obuf[:nW])
	fmt.Println(string(obuf[:nW]))

	// Method 2: build a converter first
	cv, err := iconv.NewConverter("big5", "utf-8")
	if err != nil {
		log.Fatalln(err)
	}
	nR, nW, err = cv.Convert(ibuf, obuf[:])
	if err != nil {
		log.Fatalln(err)
	}
	log.Println(string(obuf[:nW]))
}

Answer 3

Score: 3

I've written a function that was useful for me; maybe someone else can use it too. It converts from Windows-1252 to UTF-8. It remaps the byte values that Windows-1252 treats as printable characters but that Unicode considers control characters (see http://en.wikipedia.org/wiki/Windows-1252).

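// fromWindows1252 converts a Windows-1252 encoded string to UTF-8.
// It requires the "bytes" package to be imported.
func fromWindows1252(str string) string {
	var arr = []byte(str)
	var buf bytes.Buffer
	var r rune

	for _, b := range arr {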
		switch b {
		case 0x80:
			r = 0x20AC
		case 0x82:
			r = 0x201A
		case 0x83:
			r = 0x0192
		case 0x84:
			r = 0x201E
		case 0x85:
			r = 0x2026
		case 0x86:
			r = 0x2020
		case 0x87:
			r = 0x2021
		case 0x88:
			r = 0x02C6
		case 0x89:
			r = 0x2030
		case 0x8A:
			r = 0x0160
		case 0x8B:
			r = 0x2039
		case 0x8C:
			r = 0x0152
		case 0x8E:
			r = 0x017D
		case 0x91:
			r = 0x2018
		case 0x92:
			r = 0x2019
		case 0x93:
			r = 0x201C
		case 0x94:
			r = 0x201D
		case 0x95:
			r = 0x2022
		case 0x96:
			r = 0x2013
		case 0x97:
			r = 0x2014
		case 0x98:
			r = 0x02DC
		case 0x99:
			r = 0x2122
		case 0x9A:
			r = 0x0161
		case 0x9B:
			r = 0x203A
		case 0x9C:
			r = 0x0153
		case 0x9E:
			r = 0x017E
		case 0x9F:
			r = 0x0178
		default:
			r = rune(b)
		}

		buf.WriteRune(r)
	}

	return buf.String()
}
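
The same mapping can also be written as a lookup table, which is shorter and easier to check against the Wikipedia chart. The following is a minimal sketch, not the author's code: the names cp1252High and decodeCP1252 are made up for illustration, and bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are assumed to pass through unchanged, as in the switch above.

```go
package main

import (
	"fmt"
	"strings"
)

// cp1252High maps bytes 0x80..0x9F to Unicode code points.
// A zero entry marks a byte that Windows-1252 leaves undefined.
var cp1252High = [32]rune{
	0x20AC, 0, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017D, 0,
	0, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0, 0x017E, 0x0178,
}

// decodeCP1252 (hypothetical name) converts Windows-1252 bytes to a UTF-8 string.
func decodeCP1252(in []byte) string {
	var sb strings.Builder
	for _, b := range in {
		r := rune(b) // outside 0x80..0x9F the byte equals the code point
		if b >= 0x80 && b <= 0x9F {
			if mapped := cp1252High[b-0x80]; mapped != 0 {
				r = mapped
			}
		}
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	// Curly quotes around "Hi", a space, the euro sign, then "5".
	in := []byte{0x93, 0x48, 0x69, 0x94, 0x20, 0x80, 0x35}
	fmt.Println(decodeCP1252(in))
}
```

Taking a []byte rather than a string is a design choice here: data in a legacy encoding is not really a Go string until it has been decoded.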

Answer 4

Score: 2

There is no way to do it without writing the conversion yourself or using a third-party package. You could try using this: http://code.google.com/p/go-charset

Answer 5

Score: 2

The golang.org/x/text/encoding/charmap package has functions for exactly this problem:

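import "golang.org/x/text/encoding/charmap"

// DecodeWindows1250 converts Windows-1250 bytes to a UTF-8 string.
// The decoder error is discarded here; check it in real code.
func DecodeWindows1250(enc []byte) string {
	dec := charmap.Windows1250.NewDecoder()
	out, _ := dec.Bytes(enc)
	return string(out)
}

// EncodeWindows1250 converts a UTF-8 string to Windows-1250 bytes.
// Encoder.String returns a string, so Bytes is used to get a []byte.
func EncodeWindows1250(inp string) []byte {
	enc := charmap.Windows1250.NewEncoder()
	out, _ := enc.Bytes([]byte(inp))
	return out
}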

Edit: the undefined identifier ba has been replaced with enc.

Posted by huangapple on August 3, 2011 at 21:56:00
Original link: https://go.coder-hub.com/6927611.html