Convert []string to []byte

Question

I am looking to convert a string array to a byte array in Go so I can write it to disk. What is an optimal way to encode and decode a string array ([]string) to and from a byte array ([]byte)?

I was thinking of iterating over the string array twice: a first pass to get the actual size needed for the byte array, and a second pass to write the length and the actual string ([]byte(str)) for each element.

The solution must also work the other way, converting a []byte back to a []string.
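
The two-pass idea described above could look roughly like the sketch below. It rests on assumptions that are not in the question: the helper names encodeStrings and decodeStrings are made up for illustration, and each string is prefixed with a fixed 4-byte big-endian length, so strings of 4 GiB or more are not supported.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeStrings is a hypothetical helper. The first pass computes the total
// size; the second pass writes a 4-byte length followed by each string's bytes.
func encodeStrings(strs []string) []byte {
	size := 0
	for _, s := range strs { // first pass: size of the output
		size += 4 + len(s)
	}
	buf := make([]byte, 0, size)
	var hdr [4]byte
	for _, s := range strs { // second pass: length prefix, then data
		binary.BigEndian.PutUint32(hdr[:], uint32(len(s)))
		buf = append(buf, hdr[:]...)
		buf = append(buf, s...)
	}
	return buf
}

// decodeStrings reverses encodeStrings and reports truncated input as an error.
func decodeStrings(b []byte) ([]string, error) {
	var out []string
	for len(b) > 0 {
		if len(b) < 4 {
			return nil, errors.New("truncated length prefix")
		}
		n := binary.BigEndian.Uint32(b)
		b = b[4:]
		if uint64(n) > uint64(len(b)) {
			return nil, errors.New("truncated string data")
		}
		out = append(out, string(b[:n]))
		b = b[n:]
	}
	return out, nil
}

func main() {
	enc := encodeStrings([]string{"foo", "", "bar"})
	dec, err := decodeStrings(enc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%d bytes -> %q\n", len(enc), dec)
}
```

Because there is no element count in front, an empty slice encodes to zero bytes and decodes back to a nil slice.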

Answer 1

Score: 31

Let's ignore the fact that this is Go for a second. The first thing you need is a serialization format to marshal the []string into.

There are many options here. You could build your own or use a library. I am going to assume you don't want to build your own and jump to the serialization formats Go supports.

In all examples, data is the []string and fp is the file you are reading from or writing to. Errors are ignored for brevity; check the return values of the functions to handle them.

Gob

Gob is a Go-only binary format. It should be relatively space-efficient as the number of strings increases.

enc := gob.NewEncoder(fp)
enc.Encode(data)

Reading is also simple:

var data []string
dec := gob.NewDecoder(fp)
dec.Decode(&data)

Gob is simple and to the point. However, the format is only readable with other Go code.

JSON

Next is JSON, a format used just about everywhere. It is just as easy to use.

enc := json.NewEncoder(fp)
enc.Encode(data)

And for reading:

var data []string
dec := json.NewDecoder(fp)
dec.Decode(&data)
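
If the goal is a []byte in memory rather than a file, json.Marshal and json.Unmarshal can be used instead of the encoder and decoder. The following is a small sketch with error checking; the wrapper names toJSON and fromJSON are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toJSON (hypothetical name) serializes the slice as a JSON array of strings.
func toJSON(strs []string) ([]byte, error) {
	return json.Marshal(strs)
}

// fromJSON (hypothetical name) parses a JSON array of strings back into a slice.
func fromJSON(b []byte) ([]string, error) {
	var strs []string
	if err := json.Unmarshal(b, &strs); err != nil {
		return nil, err
	}
	return strs, nil
}

func main() {
	b, err := toJSON([]string{"foo", "bar, baz"})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Printf("%s\n", b)

	strs, err := fromJSON(b)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Printf("%q\n", strs)
}
```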

XML

XML is another common format. However, it has pretty high overhead and is not as easy to use. While you could do the same thing you did for gob and JSON, proper XML requires a root tag. In this case, we are using the root tag "Strings", and each string is wrapped in an "S" tag.

type Strings struct {
    S []string
}

enc := xml.NewEncoder(fp)
enc.Encode(Strings{data})

var x Strings
dec := xml.NewDecoder(fp)
dec.Decode(&x)
data := x.S
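
To see what the wrapper type produces, the same idea can be sketched with xml.Marshal and xml.Unmarshal, which work on a []byte directly. The helper names toXML and fromXML are assumptions made for this example only.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Strings mirrors the wrapper type used above: the type name becomes the
// root tag and each element of S is written as its own <S> element.
type Strings struct {
	S []string
}

// toXML (hypothetical name) wraps the slice and marshals it.
func toXML(strs []string) ([]byte, error) {
	return xml.Marshal(Strings{S: strs})
}

// fromXML (hypothetical name) unmarshals the document and unwraps the slice.
func fromXML(b []byte) ([]string, error) {
	var x Strings
	if err := xml.Unmarshal(b, &x); err != nil {
		return nil, err
	}
	return x.S, nil
}

func main() {
	b, err := toXML([]string{"foo", "a < b"})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Printf("%s\n", b)

	strs, err := fromXML(b)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Printf("%q\n", strs)
}
```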

CSV

CSV is different from the others. You have two options: one record with n fields, or n records with one field each. The following example uses n records. It would be boring if I used one record; it would look too much like the others. CSV can ONLY hold strings.

enc := csv.NewWriter(fp)
for _, v := range data {
    enc.Write([]string{v})
}
enc.Flush()

To read:

var err error
var data []string
dec := csv.NewReader(fp)
for err == nil {        // reading ends when an error is reached (perhaps io.EOF)
    var s []string

    s, err = dec.Read()
    if len(s) > 0 {
        data = append(data, s[0])
    }
}
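
For comparison, here is a sketch of the same one-field-per-record layout going through a []byte in memory, with the errors checked and ReadAll used instead of the manual loop. The names toCSV and fromCSV are made up for illustration. Empty strings are deliberately left out of the example, because how a record holding a single empty field round-trips is something to verify separately.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// toCSV (hypothetical name) writes every string as its own one-field record.
func toCSV(strs []string) ([]byte, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	for _, s := range strs {
		if err := w.Write([]string{s}); err != nil {
			return nil, err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil { // Flush does not return the error itself
		return nil, err
	}
	return buf.Bytes(), nil
}

// fromCSV (hypothetical name) reads all records and keeps the first field of each.
func fromCSV(b []byte) ([]string, error) {
	records, err := csv.NewReader(bytes.NewReader(b)).ReadAll()
	if err != nil {
		return nil, err
	}
	strs := make([]string, 0, len(records))
	for _, rec := range records {
		if len(rec) > 0 {
			strs = append(strs, rec[0])
		}
	}
	return strs, nil
}

func main() {
	b, err := toCSV([]string{"foo", "bar, baz", `say "hi"`})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Print(string(b))

	strs, err := fromCSV(b)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Printf("%q\n", strs)
}
```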

Which format you use is a matter of preference. There are many other possible encodings that I have not mentioned. For example, there is an external library called bencode. I don't personally like bencode, but it works. It is the same encoding used by BitTorrent metadata files.

If you want to make your own encoding, encoding/binary is a good place to start. That would allow you to make the most compact file possible, but I hardly think it is worth the effort.

Answer 2

Score: 11

The gob package will do this for you: http://godoc.org/encoding/gob

An example to play with: http://play.golang.org/p/e0FEZm-qiS

The source code is reproduced below, with error checks added.

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

func main() {
	// Encode the string slice into a byte slice.
	strs := []string{"foo", "bar"}
	buf := &bytes.Buffer{}
	if err := gob.NewEncoder(buf).Encode(strs); err != nil {
		fmt.Println("encode:", err)
		return
	}
	bs := buf.Bytes()
	fmt.Printf("%q\n", bs)

	// Decode it back into a string slice.
	strs2 := []string{}
	if err := gob.NewDecoder(buf).Decode(&strs2); err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Printf("%v\n", strs2)
}

Answer 3

Score: 2

To convert []string to []byte:

str := []string{"str1", "str2"}
var x []byte

// Append the bytes of each string; element boundaries are not preserved.
for i := 0; i < len(str); i++ {
    x = append(x, str[i]...)
}

To convert []byte to string:

x := []byte{'c', 'a', 't'}
str := string(x) // converts the whole slice at once, also correct for non-ASCII bytes

Answer 4

Score: 2

To illustrate the problem of converting a []string to a []byte and then converting the []byte back to a []string, here is a simple solution:

package main

import (
	"encoding/binary"
	"fmt"
)

const maxInt32 = 1<<(32-1) - 1

func writeLen(b []byte, l int) []byte {
	if 0 > l || l > maxInt32 {
		panic("writeLen: invalid length")
	}
	var lb [4]byte
	binary.BigEndian.PutUint32(lb[:], uint32(l))
	return append(b, lb[:]...)
}

func readLen(b []byte) ([]byte, int) {
	if len(b) < 4 {
		panic("readLen: invalid length")
	}
	l := binary.BigEndian.Uint32(b)
	if l > maxInt32 {
		panic("readLen: invalid length")
	}
	return b[4:], int(l)
}

func Decode(b []byte) []string {
	b, ls := readLen(b)
	s := make([]string, ls)
	for i := range s {
		b, ls = readLen(b)
		s[i] = string(b[:ls])
		b = b[ls:]
	}
	return s
}

func Encode(s []string) []byte {
	var b []byte
	b = writeLen(b, len(s))
	for _, ss := range s {
		b = writeLen(b, len(ss))
		b = append(b, ss...)
	}
	return b
}

func codecEqual(s []string) bool {
	return fmt.Sprint(s) == fmt.Sprint(Decode(Encode(s)))
}

func main() {
	var s []string
	fmt.Println("equal", codecEqual(s))
	s = []string{"", "a", "bc"}
	e := Encode(s)
	d := Decode(e)
	fmt.Println("s", len(s), s)
	fmt.Println("e", len(e), e)
	fmt.Println("d", len(d), d)
	fmt.Println("equal", codecEqual(s))
}

Output:

equal true
s 3 [ a bc]
e 19 [0 0 0 3 0 0 0 0 0 0 0 1 97 0 0 0 2 98 99]
d 3 [ a bc]
equal true

Answer 5

Score: 2

This can be done easily using the strings package. First, you need to convert the slice of strings to a single string.

func Join(elems []string, sep string) string

You pass the slice of strings and the separator to put between the elements in the string (for example, a space or a comma).

Then you can easily convert the string to a slice of bytes with a type conversion.

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Slice of strings
	sliceStr := []string{"a", "b", "c", "d"}
	fmt.Println(sliceStr) // prints [a b c d]

	// Convert the slice of strings to a string
	str := strings.Join(sliceStr, "")
	fmt.Println(str) // prints abcd

	// Convert the string to a slice of bytes
	sliceByte := []byte(str)
	fmt.Println(sliceByte) // prints [97 98 99 100]

	// Convert the slice of bytes to a string
	str2 := string(sliceByte)
	fmt.Println(str2) // prints abcd

	// Convert the string to a slice of strings
	// (an empty separator splits after every UTF-8 character)
	sliceStr2 := strings.Split(str2, "")
	fmt.Println(sliceStr2) // prints [a b c d]
}
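
The round trip above only works because every element is a single character. For longer elements, a variant with a real separator might look like the sketch below. It assumes that the separator, here a NUL byte chosen arbitrarily, never occurs inside any element; the names encode and decode are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sep is assumed never to appear inside any of the strings.
const sep = "\x00"

// encode (hypothetical name) joins the elements with sep and converts to bytes.
func encode(strs []string) []byte {
	return []byte(strings.Join(strs, sep))
}

// decode (hypothetical name) splits the bytes on sep to recover the elements.
func decode(b []byte) []string {
	return strings.Split(string(b), sep)
}

func main() {
	b := encode([]string{"foo", "bar baz", "qux"})
	fmt.Println(len(b), "bytes")
	fmt.Printf("%q\n", decode(b))
}
```

One limitation of this design: an empty slice and a slice holding a single empty string both encode to zero bytes, and decoding zero bytes yields a slice with one empty string.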

Answer 6

Score: 1

I would suggest using PutUvarint and Uvarint to store and retrieve len(s), and using []byte(str) to pass str to some io.Writer. With the string length known from Uvarint, one can allocate buf := make([]byte, n) and pass buf to some io.Reader.

Prepend the whole thing with the length of the string array and repeat the above for each of its items. Reading the whole thing back is then a matter of reading the outer length first and repeating the item read n times.

Answer 7

Score: 1

You can do something like this:

var lines []string // the strings to convert
var ctx = []byte{}
for _, s := range lines {
	ctx = append(ctx, []byte(s)...)
}

Posted by huangapple on 2012-11-27 05:14:51
Please keep this link when reposting: https://go.coder-hub.com/13573269.html