How to save data streams in S3? aws-sdk-go example not working?
Question
I am trying to persist a given stream of data to an S3-compatible storage.
The size is not known before the stream ends and can vary from 5MB to ~500GB.
I tried different possibilities but did not find a better solution than implementing sharding myself. My best guess is to make a buffer of a fixed size, fill it with my stream, and write that to S3.
Is there a better solution? Maybe a way where this is transparent to me, without writing the whole stream to memory?
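For illustration, here is a minimal sketch of that fixed-size-buffer idea. chunkStream and the upload callback are hypothetical names invented for this sketch, not SDK calls; a real implementation would hand each part to S3's multipart-upload API:

package main

import (
    "fmt"
    "io"
    "strings"
)

// chunkStream fills a fixed-size buffer from src and hands every
// full (or final, partial) chunk to upload. upload is a stand-in
// for whatever would persist one part to S3.
func chunkStream(src io.Reader, partSize int, upload func(part int, data []byte) error) error {
    buf := make([]byte, partSize)
    for part := 1; ; part++ {
        n, err := io.ReadFull(src, buf)
        if n > 0 {
            if uerr := upload(part, buf[:n]); uerr != nil {
                return uerr
            }
        }
        if err == io.EOF || err == io.ErrUnexpectedEOF {
            return nil // stream ended
        }
        if err != nil {
            return err
        }
    }
}

func main() {
    src := strings.NewReader("example payload standing in for the real stream")
    _ = chunkStream(src, 16, func(part int, data []byte) error {
        fmt.Printf("part %d: %d bytes\n", part, len(data))
        return nil
    })
}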
The aws-sdk-go readme has an example program that takes data from stdin and writes it to S3: https://github.com/aws/aws-sdk-go#using-the-go-sdk
When I try to pipe data in with a pipe (|), I get the following error:
failed to upload object, SerializationError: failed to compute request body size
caused by: seek /dev/stdin: illegal seek
Am I doing something wrong or is the example not working as I expect it to?
I also tried minio-go, with PutObject() or client.PutObjectStreaming().
This works, but it consumes as much memory as the data to be stored.
- Is there a better solution?
- Is there a small example program that can pipe arbitrary data into S3?
Answer 1
Score: 10
You can use the SDK's Uploader to handle uploads of unknown size, but you'll need to make os.Stdin "unseekable" by wrapping it in an io.Reader. The reason is that although the Uploader requires only an io.Reader as the input body, under the hood it checks whether the body is also a Seeker, and if it is, it calls Seek on it. Since os.Stdin is just an *os.File, which implements the Seeker interface, by default you would get the same error you got from PutObjectWithContext.
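To see why the wrapping helps, here is a small standalone sketch, not the SDK's actual source, of the kind of type assertion the Uploader performs. The reader type is the same trivial wrapper used in the example further below:

package main

import (
    "fmt"
    "io"
    "os"
)

// reader forwards Read only, so the Seek method of the wrapped
// *os.File is hidden from type assertions.
type reader struct {
    r io.Reader
}

func (r *reader) Read(p []byte) (int, error) { return r.r.Read(p) }

func main() {
    var body io.Reader = os.Stdin
    _, seekable := body.(io.Seeker)
    fmt.Println(seekable) // true: *os.File implements io.Seeker

    body = &reader{os.Stdin}
    _, seekable = body.(io.Seeker)
    fmt.Println(seekable) // false: the wrapper exposes only Read
}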
The Uploader also lets you upload the data in chunks: you can configure the size of each chunk as well as how many chunks are uploaded concurrently.
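For example, both knobs can be set in the options function passed to NewUploader. PartSize and Concurrency are real fields of s3manager.Uploader; the values below are arbitrary, and sess is the session created in the full example that follows:

uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
    u.PartSize = 10 << 20 // size of each uploaded chunk: 10MB
    u.Concurrency = 5     // upload up to 5 chunks at the same time
})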
Here's a modified version of the linked example, stripped of the code that can remain unchanged.
package main

import (
    // ...
    "io"

    "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

type reader struct {
    r io.Reader
}

func (r *reader) Read(p []byte) (int, error) {
    return r.r.Read(p)
}

func main() {
    // ... parse flags

    sess := session.Must(session.NewSession())
    uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
        u.PartSize = 20 << 20 // 20MB
        // ... more configuration
    })

    // ... context stuff

    _, err := uploader.UploadWithContext(ctx, &s3manager.UploadInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
        Body:   &reader{os.Stdin},
    })

    // ... handle error
}
As to whether this is a better solution than minio-go, I don't know; you'll have to test that yourself.