Go bufio.Writer, gzip.Writer and upload to AWS S3 in memory

Question

I am attempting to write a compressed file from memory and upload it to S3.

I am serializing a large slice of `Data` structs into a `bufio.Writer` that writes to a `gzip.Writer`, line by line:

### DATA AND SERIALIZATION

```go
type Data struct {
	field_1 int
	field_2 string
}

func (d *Data) Serialize() []byte {
	return []byte(fmt.Sprintf(`%d;%s\n`, d.field_1, d.field_2))
}
```

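### CREATE FILE AS COMPRESSED BYTES

```go
var datas []*Data // assume this is filled

buffer := &bytes.Buffer{}
compressor := gzip.NewWriter(buffer)
writer := bufio.NewWriter(compressor)

// bufio batches the serialized records before passing them to gzip
// (write errors are ignored in this snippet).
for _, data := range datas {
	writer.Write(data.Serialize())
}

// Flush bufio first so every byte reaches gzip, then Close gzip so it
// writes the remaining compressed data and the gzip footer.
writer.Flush()
compressor.Close()
```
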
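### UPLOAD COMPRESSED FILE TO S3

```go
key := "file.gz"
// *bytes.Reader implements both io.Reader and io.Seeker.
payload := bytes.NewReader(buffer.Bytes())

// upload is then passed to the S3 client's PutObject call (not shown).
upload := &s3.PutObjectInput{
	Body:   payload,
	Bucket: aws.String(bucket),
	Key:    aws.String(key),
}
```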

This works, seems fast and somewhat efficient.

However, the resulting file, although recognized as a text file under Linux, does not contain the line breaks added via `\n`. I am not sure whether this is an OS-specific issue, an issue with declaring the file type somehow (e.g. using a file extension such as `file.txt.gz` or `file.csv.gz`, or adding specific header bytes), or an issue with the way I am creating these files in the first place.
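Neither the file extension nor extra header bytes affect the decompressed content: every gzip stream starts with the magic bytes `0x1f 0x8b`, the optional original file name is stored in the `Name` field of the gzip header (`gzip.Writer` and `gzip.Reader` both embed `gzip.Header`), and the S3 key's suffix is only a name. The stdlib sketch below, where `compress` and `decompress` are hypothetical helper names, round-trips a payload containing real newline bytes and shows they come back unchanged:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compress gzips payload in memory. name is optional metadata kept in the
// gzip header; it does not change the compressed payload's meaning.
func compress(payload []byte, name string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Name = name // must be set before the first Write
	if _, err := zw.Write(payload); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close writes the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress reverses compress and returns the payload and the stored name.
func decompress(data []byte) ([]byte, string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return out, zr.Name, err
}

func main() {
	payload := []byte("1;a\n2;b\n")
	data, err := compress(payload, "file.csv")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", data[:2]) // 1f 8b
	out, name, err := decompress(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, bytes.Equal(out, payload)) // file.csv true
}
```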

What would be the proper way to create a fully qualified in-memory file type as []byte (or inside an io.ReadSeeker interface in general) to upload to S3, preferably in a line-by-line fashion?


Update:

I was able to solve this by wrapping the string in a call to fmt.Sprintln:

```go
func (d *Data) Serialize() []byte {
	return []byte(fmt.Sprintln(fmt.Sprintf(`%d;%s`, d.field_1, d.field_2)))
}
```

Looking at the implementation of `fmt.Sprintln`, it simply appends the `\n` rune, so there must be a subtle difference I am not aware of.
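The difference lies in the format string rather than in `fmt.Sprintln`: a back-quoted `\n` reaches `Sprintf` as two ordinary characters (a backslash and the letter `n`), while `Sprintln` appends an actual newline byte. A small sketch, with hypothetical function names, makes the invisible characters visible using the `%q` verb:

```go
package main

import "fmt"

// rawFormat uses a raw (back-quoted) format string, so the format ends with
// a backslash and 'n' rather than a newline; Sprintf copies both characters.
func rawFormat(n int, s string) string {
	return fmt.Sprintf(`%d;%s\n`, n, s)
}

// withSprintln formats without the escape and lets fmt.Sprintln append a
// real newline byte (0x0a).
func withSprintln(n int, s string) string {
	return fmt.Sprintln(fmt.Sprintf(`%d;%s`, n, s))
}

func main() {
	fmt.Printf("%q\n", rawFormat(1, "a"))    // "1;a\\n"
	fmt.Printf("%q\n", withSprintln(1, "a")) // "1;a\n"
}
```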

Answer 1

Score: 1

Replace

```go
`%d;%s\n`
```

with

```go
"%d;%s\n"
```

`` `%d;%s\n` `` is a raw string literal, and in a raw string literal backslashes have no special meaning: the `\n` is kept as a backslash followed by the letter `n`, not as a newline. See String literals in the language spec:

> Raw string literals are character sequences between back quotes, as in `` `foo` ``. Within the quotes, any character may appear except back quote. The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes; in particular, backslashes have no special meaning and the string may contain newlines.
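Both halves of the quoted rule can be checked directly. In this sketch, where the constant names are hypothetical, a backslash inside back quotes stays a literal backslash, while an actual line break inside back quotes does become part of the string:

```go
package main

import "fmt"

// Inside back quotes the backslash is an ordinary character.
const rawEscape = `a\nb` // 4 bytes: 'a', '\\', 'n', 'b'

// A raw literal may span lines; the line break becomes part of the value.
const rawMultiline = `a
b` // 3 bytes: 'a', '\n', 'b'

func main() {
	fmt.Println(rawEscape == "a\\nb", len(rawEscape))       // true 4
	fmt.Println(rawMultiline == "a\nb", len(rawMultiline)) // true 3
}
```

So a back-quoted format would only produce a newline if the closing back quote sat on the following line; the interpreted literal `"%d;%s\n"` is the clearer fix.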

huangapple
  • This article was published on July 14, 2023 at 19:15:30
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76687154.html