How to write a UUID's raw 16 bytes to CSV in Go?

Question

I have the following code, which tries to save a UUID's raw 16 bytes (with 0x0A inside) in CSV format:

package main

import (
	"encoding/csv"
	"github.com/satori/go.uuid"
	"log"
	"os"
)

func main() {
	u, err := uuid.FromString("e1393c62-877a-4adc-8ffb-f1bf0a337c5f")
	if err != nil {
		log.Fatal(err)
	}
	csv_file, err := os.OpenFile("csv_wtf.csv", os.O_WRONLY|os.O_CREATE, 0644)
	if err != nil {
		log.Fatal(err)
	}
	s := string(u.Bytes())
	log.Printf("len(s)=%d", len(s))
	csv_writer := csv.NewWriter(csv_file)
	csv_writer.UseCRLF = false
	csv_writer.Write([]string{s})
	csv_writer.Flush()
	finfo, err := csv_file.Stat()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("size csv_wtf.csv = %d", finfo.Size())
	csv_file.Close()
}

This code writes the data to the CSV file, but with extra bytes added:

2017/04/16 12:37:14 len(s)=16
2017/04/16 12:37:14 size csv_wtf.csv = 29

Why does encoding/csv add extra bytes when it ranges over my string (see https://golang.org/src/encoding/csv/writer.go#L38, https://golang.org/src/encoding/csv/writer.go#L50 and https://golang.org/src/encoding/csv/writer.go#L76)?

Could somebody help me find a CSV package that doesn't do this strange conversion?

Answer 1

Score: 3

This is because the CSV format is not suitable for storing raw binary data, which is unlikely to be a valid UTF-8 sequence.

What happens is that csv_writer.Write iterates over the string with a range loop, and every time it encounters an invalid UTF-8 sequence, the rune r1 is set to 65533 (U+FFFD, the Unicode replacement character), which is encoded as 3 bytes: 0xef, 0xbf, 0xbd. Each invalid byte in the input therefore becomes three bytes in the output.

Illustrative example:

package main

import (
	"bytes"
	"fmt"
)

func main() {
	invalidString := string([]byte{0xff, 0xfe, 0xfd})
	var b bytes.Buffer
	for _, r := range invalidString {
		fmt.Printf("current rune: %v\n", r)
		b.WriteRune(r)
	}

	fmt.Printf("total data: %v\n", b.Bytes())
}

The output is:

current rune: 65533
current rune: 65533
current rune: 65533
total data: [239 191 189 239 191 189 239 191 189]

So you should either abandon CSV in favour of some other format (suitable for storing binary data), or store UUIDs in their string form.

Posted by huangapple on April 16, 2017 at 18:16:23
Source: https://go.coder-hub.com/43436121.html