Golang multipart uploads with chunked `http.Get` and Goamz `multi.PutAll`

Question

I'm using the Goamz package and could use some help getting bucket.Multi to stream an HTTP GET response to S3.

I'll be downloading a 2+ GB file via chunked HTTP and I'd like to stream it directly into an S3 bucket.

It appears that I need to wrap resp.Body in something so that I can pass an implementation of s3.ReaderAtSeeker to multi.PutAll:

// set up s3
auth, _ := aws.EnvAuth()
s3Con := s3.New(auth, aws.USEast)
bucket := s3Con.Bucket("bucket-name")

// make http request to URL
resp, err := http.Get(export_url)
if err != nil {
	fmt.Printf("Get error %v\n", err)
	return
}

defer resp.Body.Close()

// set up the multipart upload
multi, err := bucket.InitMulti(s3Path, "text/plain", s3.Private, s3.Options{})
if err != nil {
	fmt.Printf("InitMulti error %v\n", err)
	return
}

// Need a type that implements s3.ReaderAtSeeker:
// type ReaderAtSeeker interface {
// 	io.ReaderAt
// 	io.ReadSeeker
// }

rs := // Question: what can I wrap `resp.Body` in?

parts, err := multi.PutAll(rs, 5120)
if err != nil {
	fmt.Printf("PutAll error %v\n", err)
	return
}

err = multi.Complete(parts)
if err != nil {
	fmt.Printf("Complete error %v\n", err)
	return
}

Currently I get the following (expected) error when trying to run my program:

./main.go:50: cannot use resp.Body (type io.ReadCloser) as type s3.ReaderAtSeeker in argument to multi.PutAll:
    io.ReadCloser does not implement s3.ReaderAtSeeker (missing ReadAt method)

Answer 1

Score: 1

<del>You haven't indicated which package you're using to access the S3 api but I'm assuming it's this one</del> https://github.com/mitchellh/goamz/.

Since your file is of significant size, a possible solution might be to use multi.PutPart, which gives you more control than multi.PutAll. Using readers from the standard library, your approach would be:

  1. Get the Content-Length from the response header.
  2. Work out how many parts are needed from the Content-Length and your part size.
  3. Loop over the number of parts, reading []byte chunks from response.Body into a bytes.Reader and calling multi.PutPart for each one.
  4. Get the uploaded parts from multi.ListParts.
  5. Call multi.Complete with those parts, as in the sketch below.
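
Something like the following might work. It's only an untested sketch: it assumes the resp and multi values from your snippet, needs "bytes" and "io" added to your imports, and uses a 5 MB buffer because S3 rejects parts smaller than that (except the last one). Instead of pre-computing the part count from the Content-Length, it simply reads fixed-size chunks until EOF, which also covers a chunked response that carries no Content-Length header at all:

buf := make([]byte, 5*1024*1024) // S3's minimum part size, except for the final part
var parts []s3.Part

for partNum := 1; ; partNum++ {
	// Fill the buffer from the streaming body; the final read may come up short.
	n, err := io.ReadFull(resp.Body, buf)
	if n > 0 {
		// bytes.Reader provides the io.ReadSeeker that PutPart expects.
		part, perr := multi.PutPart(partNum, bytes.NewReader(buf[:n]))
		if perr != nil {
			fmt.Printf("PutPart error %v\n", perr)
			return
		}
		parts = append(parts, part)
	}
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		break // the body has been fully consumed
	}
	if err != nil {
		fmt.Printf("Read error %v\n", err)
		return
	}
}

// Alternatively, fetch the uploaded parts back from S3:
// parts, err := multi.ListParts()

if err := multi.Complete(parts); err != nil {
	fmt.Printf("Complete error %v\n", err)
	return
}

Only one part's worth of data sits in memory at a time, so the 2+ GB body never has to be buffered in full or spooled to disk.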

I don't have access to S3 so I can't test my hypothesis, but the above could be worth exploring if you haven't already.

Answer 2

Score: 0

A simpler approach is to use http://github.com/minio/minio-go.

It implements PutObject(), a fully managed, self-contained operation for uploading large files, and it automatically uploads data larger than 5 MB as multipart, in parallel. If no pre-defined ContentLength is specified, it keeps uploading until it reaches EOF.

The following example shows how to do this when you don't have a pre-defined input length, only a streaming io.Reader. Here I've used os.Stdin as a stand-in for your chunked input.

package main

import (
	"log"
	"os"

	"github.com/minio/minio-go"
)

func main() {
	config := minio.Config{
		AccessKeyID:     "YOUR-ACCESS-KEY-HERE",
		SecretAccessKey: "YOUR-PASSWORD-HERE",
		Endpoint:        "https://s3.amazonaws.com",
	}
	s3Client, err := minio.New(config)
	if err != nil {
		log.Fatalln(err)
	}

	// A size of 0 means the length is unknown up front;
	// PutObject keeps reading from os.Stdin until EOF.
	err = s3Client.PutObject("mybucket", "myobject", "application/octet-stream", 0, os.Stdin)
	if err != nil {
		log.Fatalln(err)
	}
}

$ echo "Hello my new-object" | go run stream-object.go
