Why is downloading into memory slower than downloading to the file system from AWS S3?

Question


I am using the AWS Go SDK to download objects from a bucket. Below are the two download implementations:

  1. Download to file
func (a *AwsClient) DownloadToFile(ctx context.Context, objectKey string) (string, error) {
	params := &awsS3.GetObjectInput{
		Bucket: aws.String(a.bucket),
		Key:    aws.String(objectKey),
	}

	downloadPath := "some/valid/path"
	f, err := os.Create(downloadPath)
	if err != nil {
		return "", err
	}
	defer f.Close()

	_, err = a.downloader.Download(ctx, f, params)
	return downloadPath, err
}
  2. Download to memory
func (a *AwsClient) DownloadToMemory(ctx context.Context, objectKey string) ([]byte, error) {
	params := &awsS3.GetObjectInput{
		Bucket: aws.String(a.bucket),
		Key:    aws.String(objectKey),
	}

	// Download into a growable in-memory buffer instead of a file.
	buffer := manager.NewWriteAtBuffer([]byte{})
	_, err := a.downloader.Download(ctx, buffer, params)
	return buffer.Bytes(), err
}

For a 100 MB file, the in-memory download takes 30 seconds, while the file-system download takes only 8 seconds. My expectation was that the in-memory download would be much faster. My system (Apple M1, Ventura, 8 GB RAM) has enough free RAM, so that is not the problem. Can someone help me understand this behaviour?

Answer 1

Score: 3


Downloading a big S3 object into a dynamically grown buffer is quite inefficient. That buffer is reallocated many times while 100 MB of data arrives from multiple concurrent download goroutines, and every reallocation copies everything written so far. That repeated allocation and copying costs a lot of CPU time.
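To see why, here is a simplified paraphrase of the growth logic inside the SDK's manager.WriteAtBuffer (a sketch from memory, not the verbatim source):

// Paraphrased sketch of manager.WriteAtBuffer from the
// aws-sdk-go-v2 feature/s3/manager package (simplified).
type WriteAtBuffer struct {
	buf         []byte
	m           sync.Mutex
	GrowthCoeff float64
}

func (b *WriteAtBuffer) WriteAt(p []byte, pos int64) (int, error) {
	// Every download goroutine serializes on this one mutex.
	b.m.Lock()
	defer b.m.Unlock()

	expLen := pos + int64(len(p))
	if int64(len(b.buf)) < expLen {
		if int64(cap(b.buf)) < expLen {
			if b.GrowthCoeff < 1 {
				b.GrowthCoeff = 1 // default: grow to the exact size, no spare capacity
			}
			newBuf := make([]byte, expLen, int64(b.GrowthCoeff*float64(expLen)))
			copy(newBuf, b.buf) // full copy of everything downloaded so far
			b.buf = newBuf
		}
		b.buf = b.buf[:expLen]
	}
	copy(b.buf[pos:], p)
	return len(p), nil
}

With the default GrowthCoeff, the slice grows to exactly the size each write needs, so nearly every part that extends the buffer triggers another allocate-and-copy of the whole payload so far.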

Try allocating the full 100 MB up front instead of starting from an empty byte slice.
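For example, here is a minimal sketch of DownloadToMemory with a preallocated buffer (the size parameter is added for illustration and assumes the caller knows the object size):

func (a *AwsClient) DownloadToMemory(ctx context.Context, objectKey string, size int64) ([]byte, error) {
	params := &awsS3.GetObjectInput{
		Bucket: aws.String(a.bucket),
		Key:    aws.String(objectKey),
	}

	// Pre-size the backing slice so the downloader's concurrent WriteAt
	// calls land in already-allocated memory and never grow the buffer.
	buffer := manager.NewWriteAtBuffer(make([]byte, size))
	n, err := a.downloader.Download(ctx, buffer, params)
	if err != nil {
		return nil, err
	}
	// Trim in case the object is smaller than the preallocated size.
	return buffer.Bytes()[:n], nil
}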

If the object's size is not known in advance, you can fetch its length first with S3.HeadObject.
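A sketch of that lookup, assuming AwsClient also holds the underlying *awsS3.Client in a (hypothetical) client field; note that HeadObjectOutput.ContentLength is a *int64 in recent aws-sdk-go-v2 releases, hence aws.ToInt64:

// Ask S3 for the object's metadata, including its Content-Length.
head, err := a.client.HeadObject(ctx, &awsS3.HeadObjectInput{
	Bucket: aws.String(a.bucket),
	Key:    aws.String(objectKey),
})
if err != nil {
	return nil, err
}

// Preallocate the exact object size before starting the download.
size := aws.ToInt64(head.ContentLength)
buffer := manager.NewWriteAtBuffer(make([]byte, size))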
