How to save data streams in S3? aws-sdk-go example not working?

Question

I am trying to persist a given stream of data to an S3-compatible storage.
The size is not known before the stream ends and can vary from 5 MB to ~500 GB.

I tried different possibilities but did not find a better solution than implementing sharding myself. My best guess is to make a buffer of a fixed size, fill it with my stream, and write it to S3.
Is there a better solution? Maybe a way where this is transparent to me, without writing the whole stream to memory?

The aws-sdk-go readme has an example program that takes data from stdin and writes it to S3: https://github.com/aws/aws-sdk-go#using-the-go-sdk

When I try to pipe data in with a pipe `|`, I get the following error:

failed to upload object, SerializationError: failed to compute request body size
caused by: seek /dev/stdin: illegal seek

Am I doing something wrong or is the example not working as I expect it to?

I also tried minio-go, with PutObject() or client.PutObjectStreaming().
This works, but consumes as much memory as the data to be stored.

  1. Is there a better solution?
  2. Is there a small example program that can pipe arbitrary data into S3?

Answer 1

Score: 10


You can use the SDK's Uploader to handle uploads of unknown size, but you'll need to make os.Stdin "unseekable" by wrapping it in an io.Reader. This is because the Uploader, while it requires only an io.Reader as the input body, under the hood checks whether the input body is also a Seeker, and if it is, it calls Seek on it. And since os.Stdin is just an *os.File, which implements the Seeker interface, by default you would get the same error you got from PutObjectWithContext.

The Uploader also uploads the data in chunks; you can configure the chunk size as well as how many chunks are uploaded concurrently.

Here's a modified version of the linked example, stripped of the code that can remain unchanged.

package main

import (
    // ...
    "io"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

type reader struct {
    r io.Reader
}

func (r *reader) Read(p []byte) (int, error) {
    return r.r.Read(p)
}

func main() {
    // ... parse flags

    sess := session.Must(session.NewSession())
    uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
        u.PartSize = 20 << 20 // 20MB
        // ... more configuration
    })

    // ... context stuff

    _, err := uploader.UploadWithContext(ctx, &s3manager.UploadInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
        Body:   &reader{os.Stdin},
    })

    // ... handle error
}

As to whether this is a better solution than minio-go, I do not know; you'll have to test that yourself.

huangapple
  • Published on 2017-04-25 03:06:20
  • Please keep this link when republishing: https://go.coder-hub.com/43595911.html