How to save data streams in S3? aws-sdk-go example not working?
Question
I am trying to persist a given stream of data to an S3-compatible storage.
The size is not known before the stream ends and can vary from 5MB to ~500GB.
I tried different possibilities but did not find a better solution than implementing sharding myself. My best guess is to make a buffer of a fixed size, fill it with my stream, and write that to S3.
Is there a better solution? Maybe a way where this is transparent to me, without writing the whole stream to memory?
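For illustration, here is a minimal sketch of that fixed-size-buffer idea. chunkStream and the upload callback are hypothetical names invented for this sketch, not SDK calls; a real implementation would hand each part to S3's multipart-upload API:

package main

import (
    "fmt"
    "io"
    "strings"
)

// chunkStream fills a fixed-size buffer from src and hands every
// full (or final, partial) chunk to upload. upload is a stand-in
// for whatever would persist one part to S3.
func chunkStream(src io.Reader, partSize int, upload func(part int, data []byte) error) error {
    buf := make([]byte, partSize)
    for part := 1; ; part++ {
        n, err := io.ReadFull(src, buf)
        if n > 0 {
            if uerr := upload(part, buf[:n]); uerr != nil {
                return uerr
            }
        }
        if err == io.EOF || err == io.ErrUnexpectedEOF {
            return nil // stream ended
        }
        if err != nil {
            return err
        }
    }
}

func main() {
    src := strings.NewReader("example payload standing in for the real stream")
    _ = chunkStream(src, 16, func(part int, data []byte) error {
        fmt.Printf("part %d: %d bytes\n", part, len(data))
        return nil
    })
}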
The aws-sdk-go readme has an example program that takes data from stdin and writes it to S3: https://github.com/aws/aws-sdk-go#using-the-go-sdk
When I try to pipe data in with a pipe (|), I get the following error:
failed to upload object, SerializationError: failed to compute request body size
caused by: seek /dev/stdin: illegal seek
Am I doing something wrong or is the example not working as I expect it to?
I also tried minio-go, with PutObject() or client.PutObjectStreaming().
This works, but it consumes as much memory as the data to be stored.
- Is there a better solution?
- Is there a small example program that can pipe arbitrary data into S3?
Answer 1
Score: 10
You can use the SDK's Uploader to handle uploads of unknown size, but you'll need to make os.Stdin "unseekable" by wrapping it in an io.Reader. The reason is that although the Uploader requires only an io.Reader as the input body, under the hood it checks whether the body is also a Seeker, and if it is, it calls Seek on it. Since os.Stdin is just an *os.File, which implements the Seeker interface, by default you would get the same error you got from PutObjectWithContext.
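To see why the wrapping helps, here is a small standalone sketch, not the SDK's actual source, of the kind of type assertion the Uploader performs. The reader type is the same trivial wrapper used in the example further below:

package main

import (
    "fmt"
    "io"
    "os"
)

// reader forwards Read only, so the Seek method of the wrapped
// *os.File is hidden from type assertions.
type reader struct {
    r io.Reader
}

func (r *reader) Read(p []byte) (int, error) { return r.r.Read(p) }

func main() {
    var body io.Reader = os.Stdin
    _, seekable := body.(io.Seeker)
    fmt.Println(seekable) // true: *os.File implements io.Seeker

    body = &reader{os.Stdin}
    _, seekable = body.(io.Seeker)
    fmt.Println(seekable) // false: the wrapper exposes only Read
}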
The Uploader also lets you upload the data in chunks: you can configure the size of each chunk as well as how many chunks are uploaded concurrently.
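For example, both knobs can be set in the options function passed to NewUploader. PartSize and Concurrency are real fields of s3manager.Uploader; the values below are arbitrary, and sess is the session created in the full example that follows:

uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
    u.PartSize = 10 << 20 // size of each uploaded chunk: 10MB
    u.Concurrency = 5     // upload up to 5 chunks at the same time
})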
Here's a modified version of the linked example, stripped of the code that can remain unchanged.
package main

import (
    // ...
    "io"

    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

type reader struct {
    r io.Reader
}

func (r *reader) Read(p []byte) (int, error) {
    return r.r.Read(p)
}

func main() {
    // ... parse flags

    sess := session.Must(session.NewSession())
    uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
        u.PartSize = 20 << 20 // 20MB
        // ... more configuration
    })

    // ... context stuff

    _, err := uploader.UploadWithContext(ctx, &s3manager.UploadInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
        Body:   &reader{os.Stdin},
    })

    // ... handle error
}
As to whether this is a better solution than minio-go, I don't know; you'll have to test that yourself.