July 14, 2023 19:15:30 · go

Go bufio.Writer, gzip.Writer and upload to AWS S3 in memory

Question

I am attempting to write a compressed file from memory and upload to S3.

I am serializing a large array of type Data struct into a bufio.Writer that writes to a gzip.Writer in a line-by-line fashion:

### DATA AND SERIALIZATION

type Data struct {
  field_1 int
  field_2 string
}

func (d *Data) Serialize() []byte {
  return []byte( fmt.Sprintf(`%d;%s\n`, d.field_1, d.field_2) )
}

### CREATE FILE AS COMPRESSED BYTES

var datas []*Data   // assume this is filled

buffer := &bytes.Buffer{}
compressor := gzip.NewWriter(buffer)
writer := bufio.NewWriter(compressor)

for _, data := range datas {
  writer.Write(data.Serialize())
}

writer.Flush()
compressor.Close()

### UPLOAD COMPRESSED FILE TO S3

key := "file.gz"
payload := bytes.NewReader(buffer.Bytes())

upload := &s3.PutObjectInput{
  Body:   payload,
  Bucket: aws.String(bucket),
  Key:    aws.String(key),
}

This works, seems fast and somewhat efficient.

However, the resulting file, although considered a text file under Linux, does not honor the line breaks added via \n. Not sure if this is an OS-specific issue, an issue with defining the file type by some means (e.g. using a file name ending such as file.txt.gz or file.csv.gz, or adding specific header bytes), or an issue with the way I am creating these files in the first place.

What would be the proper way to create a fully qualified in-memory file type as []byte (or inside an io.ReadSeeker interface in general) to upload to S3, preferably in a line-by-line fashion?

Update:

I was able to solve this by wrapping the string in a call to fmt.Sprintln:

func (d *Data) Serialize() []byte {
  return []byte( fmt.Sprintln(fmt.Sprintf(`%d;%s`, d.field_1, d.field_2)) )
}

When looking at the implementation of fmt.Sprintln it appends the \n rune - there must be subtle differences I am not aware of.

Answer 1

Score: 1

Replace

`%d;%s\n`

with

"%d;%s\n"

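`%d;%s\n` is a raw string literal, and in a raw string literal backslashes have no special meaning: the \n stays as the two characters \ and n instead of becoming a newline. The fmt.Sprintln workaround only worked because Sprintln appends a real newline itself. See String literals in the language spec: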

> Raw string literals are character sequences between back quotes, as in `foo`. Within the quotes, any character may appear except back quote. The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes; in particular, backslashes have no special meaning and the string may contain newlines.
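
To make the difference concrete, here is a minimal sketch that reuses the Data struct from the question and runs the same bufio.Writer into gzip.Writer pipeline entirely in memory. The names SerializeRaw, SerializeInterpreted, compressLines and countNewlines are made up for this illustration and are not part of the original code; the S3 upload is left out on purpose, since the literal is the only thing that changes.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// Data mirrors the struct from the question.
type Data struct {
	field_1 int
	field_2 string
}

// SerializeRaw (hypothetical name) uses a raw, backquoted format string:
// the record ends with a backslash followed by 'n', not a newline.
func (d *Data) SerializeRaw() []byte {
	return []byte(fmt.Sprintf(`%d;%s\n`, d.field_1, d.field_2))
}

// SerializeInterpreted (hypothetical name) uses an interpreted, double-quoted
// format string: \n is an escape sequence and becomes a single newline byte.
func (d *Data) SerializeInterpreted() []byte {
	return []byte(fmt.Sprintf("%d;%s\n", d.field_1, d.field_2))
}

// compressLines gzips all serialized records into an in-memory byte slice,
// following the same writer chain as the question.
func compressLines(datas []*Data, serialize func(*Data) []byte) ([]byte, error) {
	buffer := &bytes.Buffer{}
	compressor := gzip.NewWriter(buffer)
	writer := bufio.NewWriter(compressor)

	for _, d := range datas {
		if _, err := writer.Write(serialize(d)); err != nil {
			return nil, err
		}
	}
	// Flush the bufio layer first, then close gzip so the trailer is written.
	if err := writer.Flush(); err != nil {
		return nil, err
	}
	if err := compressor.Close(); err != nil {
		return nil, err
	}
	return buffer.Bytes(), nil
}

// countNewlines decompresses the payload and counts real newline bytes.
func countNewlines(compressed []byte) (int, error) {
	reader, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return 0, err
	}
	defer reader.Close()

	plain, err := io.ReadAll(reader)
	if err != nil {
		return 0, err
	}
	return bytes.Count(plain, []byte("\n")), nil
}

func main() {
	datas := []*Data{{1, "a"}, {2, "b"}}

	fmt.Printf("%q\n", datas[0].SerializeRaw())         // "1;a\\n"
	fmt.Printf("%q\n", datas[0].SerializeInterpreted()) // "1;a\n"

	for name, serialize := range map[string]func(*Data) []byte{
		"raw":         (*Data).SerializeRaw,
		"interpreted": (*Data).SerializeInterpreted,
	} {
		compressed, err := compressLines(datas, serialize)
		if err != nil {
			panic(err)
		}
		n, err := countNewlines(compressed)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s literal: %d newline(s) after decompression\n", name, n)
	}
}
```

With the raw literal the decompressed payload is one long line containing literal backslash-n sequences, which matches the behaviour described in the question. Nothing about gzip, the file name, or the operating system is involved.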
