Go bufio.Writer, gzip.Writer and upload to AWS S3 in memory
Question
I am attempting to write a compressed file from memory and upload it to S3. I am serializing a large array of type `Data` into a `bufio.Writer` that writes to a `gzip.Writer` in a line-by-line fashion:
### DATA AND SERIALIZATION

```go
type Data struct {
    field_1 int
    field_2 string
}

func (d *Data) Serialize() []byte {
    return []byte(fmt.Sprintf(`%d;%s\n`, d.field_1, d.field_2))
}
```
### CREATE FILE AS COMPRESSED BYTES

```go
var datas []*Data // assume this is filled

buffer := &bytes.Buffer{}
compressor := gzip.NewWriter(buffer)
writer := bufio.NewWriter(compressor)

for _, data := range datas {
    writer.Write(data.Serialize())
}

// Flush the bufio.Writer into the gzip.Writer, then close the
// gzip stream so its footer is written into the buffer.
writer.Flush()
compressor.Close()
```
### UPLOAD COMPRESSED FILE TO S3

```go
key := "file.gz"
payload := bytes.NewReader(buffer.Bytes())

upload := &s3.PutObjectInput{
    Body:   payload,
    Bucket: aws.String(bucket),
    Key:    aws.String(key),
}
```
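The snippet above only builds the `PutObjectInput`; for reference, a minimal sketch of actually sending it with the AWS SDK for Go (v1), assuming credentials and region are resolved by the default session and that `upload` is the input built above:

```go
import (
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

// uploadToS3 sends a prepared PutObjectInput using a client built from the
// default session (credentials/region resolved from the environment).
func uploadToS3(upload *s3.PutObjectInput) error {
    sess, err := session.NewSession()
    if err != nil {
        return err
    }
    _, err = s3.New(sess).PutObject(upload)
    return err
}
```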
This works, seems fast, and is somewhat efficient. However, the resulting file, although considered a text file under Linux, does not honor the line breaks added via `\n`. I am not sure whether this is an OS-specific issue, an issue with declaring the file type by some means (e.g. using a file name ending such as `file.txt.gz` or `file.csv.gz`, or adding specific header bytes), or an issue with the way I am creating these files in the first place.
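One way to check whether the newlines actually survive compression is to decompress the buffer in memory and print the result with `%q`. A small sketch, assuming the `buffer` from the snippet above and only the standard library:

```go
// imports: bytes, compress/gzip, fmt, io, log
zr, err := gzip.NewReader(bytes.NewReader(buffer.Bytes()))
if err != nil {
    log.Fatal(err)
}
defer zr.Close()

plain, err := io.ReadAll(zr)
if err != nil {
    log.Fatal(err)
}
// %q quotes the bytes, so a real newline prints as \n and a literal
// backslash followed by 'n' prints as \\n.
fmt.Printf("%q\n", plain)
```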
What would be the proper way to create a fully qualified in-memory file type as `[]byte` (or inside an `io.ReadSeeker` interface in general) to upload to S3, preferably in a line-by-line fashion?
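As an aside, `bytes.NewReader` already returns a type that satisfies `io.ReadSeeker`, which is what `PutObjectInput.Body` expects in the v1 SDK; a one-line compile-time check illustrating this:

```go
import (
    "bytes"
    "io"
)

// Compile-time check: *bytes.Reader implements io.ReadSeeker.
var _ io.ReadSeeker = (*bytes.Reader)(nil)
```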
Update:
I was able to solve this by wrapping the string in a call to `fmt.Sprintln`:

```go
func (d *Data) Serialize() []byte {
    return []byte(fmt.Sprintln(fmt.Sprintf(`%d;%s`, d.field_1, d.field_2)))
}
```
Looking at the implementation of `fmt.Sprintln`, it appends the `\n` rune - there must be subtle differences I am not aware of.
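For reference, a small self-contained check of that behaviour (the underlying difference is the raw versus interpreted string literal, which the answer below explains):

```go
package main

import "fmt"

func main() {
    raw := fmt.Sprintf(`%d;%s\n`, 1, "a")                // raw literal: backslash + 'n', no newline
    interpreted := fmt.Sprintf("%d;%s\n", 1, "a")        // interpreted literal: real newline byte
    withLn := fmt.Sprintln(fmt.Sprintf(`%d;%s`, 1, "a")) // Sprintln appends a real newline

    fmt.Printf("%q\n%q\n%q\n", raw, interpreted, withLn)
    // Output:
    // "1;a\\n"
    // "1;a\n"
    // "1;a\n"
}
```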
Answer 1
Score: 1
Replace

```go
`%d;%s\n`
```

with

```go
"%d;%s\n"
```

`` `%d;%s\n` `` is a raw string literal, and in a raw string literal backslashes have no special meaning. See String literals in the language spec:
> Raw string literals are character sequences between back quotes, as in `foo`. Within the quotes, any character may appear except back quote. The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes; in particular, backslashes have no special meaning and the string may contain newlines.
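Applied to the `Serialize` method from the question, the fix is just the change of quote style:

```go
// With an interpreted string literal, \n is an actual newline byte.
func (d *Data) Serialize() []byte {
    return []byte(fmt.Sprintf("%d;%s\n", d.field_1, d.field_2))
}
```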