Why is downloading into memory slower than downloading to the file system from AWS S3?


Question

I am using the AWS Go SDK to download files from a bucket. Below are the two download implementations:

1. Download to file
func (a *AwsClient) DownloadToFile(ctx context.Context, objectKey string) (string, error) {
    params := &awsS3.GetObjectInput{
        Bucket: aws.String(a.bucket),
        Key:    aws.String(objectKey),
    }

    downloadPath := "some/valid/path"
    f, err := os.Create(downloadPath)
    if err != nil {
        return "", err
    }
    defer f.Close()

    _, err = a.downloader.Download(ctx, f, params)
    return downloadPath, err
}
2. Download to memory
func (a *AwsClient) DownloadToMemory(ctx context.Context, objectKey string) ([]byte, error) {
    params := &awsS3.GetObjectInput{
        Bucket: aws.String(a.bucket),
        Key:    aws.String(objectKey),
    }

    buffer := manager.NewWriteAtBuffer([]byte{}) // start from an empty backing slice
    _, err := a.downloader.Download(ctx, buffer, params)
    return buffer.Bytes(), err
}

For a 100 MB file, the in-memory download takes 30 s while the download to the file system takes only 8 s. I expected the in-memory download to be much faster. My system (Apple M1, Ventura, 8 GB RAM) has plenty of free RAM, so that is not the problem. Can someone help me understand this behaviour?


Answer 1

Score: 3

Downloading a large S3 object into a dynamically grown buffer is quite inefficient. The buffer is reallocated many times while the downloader's concurrent parts write 100 MB into it, and each reallocation copies everything written so far, which costs a lot of CPU time.
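To see where the time goes, here is a simplified, illustrative sketch of that growth behaviour. It is not the SDK's actual manager.WriteAtBuffer code (which grows by a configurable growth coefficient), but the grow-and-copy pattern is the same in kind:

package main

import "fmt"

// writeAtGrow mimics, in simplified form, what a dynamically grown
// WriteAt-style buffer must do when a part lands past the current end:
// allocate a larger slice and copy every byte written so far.
func writeAtGrow(buf, p []byte, pos int64) []byte {
    if need := int(pos) + len(p); need > len(buf) {
        bigger := make([]byte, need)
        copy(bigger, buf) // the hidden CPU cost, paid on every growth
        buf = bigger
    }
    copy(buf[pos:], p)
    return buf
}

func main() {
    var buf []byte
    part := make([]byte, 5*1024*1024) // the downloader's default part size is 5 MB
    for i := int64(0); i < 20; i++ {  // 20 sequential parts ≈ 100 MB
        buf = writeAtGrow(buf, part, i*int64(len(part)))
    }
    fmt.Println(len(buf)) // 104857600
}

Every growth step copies everything received so far, so the total bytes copied grow roughly quadratically with the number of parts.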

Try allocating the full 100 MB up front instead of starting from an empty byte slice, as in the sketch below.

If the object's size is not known in advance, you can first call S3 HeadObject to get its length.
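Putting both suggestions together, here is a minimal sketch. It assumes AwsClient also keeps the underlying *awsS3.Client in a field (named s3Client here; that field is hypothetical), and that your service/s3 version exposes HeadObjectOutput.ContentLength as *int64 (on older versions it is a plain int64, in which case drop the aws.ToInt64 call):

// DownloadToMemoryPreallocated is a sketch, not a drop-in implementation.
func (a *AwsClient) DownloadToMemoryPreallocated(ctx context.Context, objectKey string) ([]byte, error) {
    // Ask S3 for the object's size first.
    head, err := a.s3Client.HeadObject(ctx, &awsS3.HeadObjectInput{
        Bucket: aws.String(a.bucket),
        Key:    aws.String(objectKey),
    })
    if err != nil {
        return nil, err
    }
    size := aws.ToInt64(head.ContentLength)

    // Hand the downloader a buffer that already has the full capacity, so the
    // concurrent part writes only reslice instead of triggering grow-and-copy.
    buffer := manager.NewWriteAtBuffer(make([]byte, 0, size))

    _, err = a.downloader.Download(ctx, buffer, &awsS3.GetObjectInput{
        Bucket: aws.String(a.bucket),
        Key:    aws.String(objectKey),
    })
    return buffer.Bytes(), err
}

If you already know the size from your own metadata, you can skip the HeadObject round trip and pass make([]byte, 0, knownSize) directly.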

