How to quickly request restores for objects in S3 Glacier Deep Archive?
Question
I have more than 300,000 objects in a bucket. They are in the S3 Glacier Deep Archive storage class, and I would like to restore them for analysis. When I benchmarked the speed of the restore API, it seemed too slow to process all of these objects.
Below is my Python code.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Page through every object under the prefix and request a restore for each.
paginator = s3.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

for page in page_iterator:
    for content in page['Contents']:
        try:
            print(content['Key'])
            s3.restore_object(
                Bucket=bucket,
                Key=content['Key'],
                RestoreRequest={
                    'Days': 1,
                    'GlacierJobParameters': {
                        'Tier': 'Standard',
                    },
                },
            )
        except ClientError:
            # restore_object raises RestoreAlreadyInProgress when a
            # restore has already been requested for this key.
            print("already restored..")
I found that S3 Batch Operations is aimed at speeding up exactly this scenario, but I don't want to use the AWS console. Rather than the console GUI, I want to do it in Python, like the code above.
Is there a more efficient way to restore a large number of objects with boto3? (Not using big data tools such as Spark.)
Thanks.
Answer 1
Score: 2
Your choice is either to loop through the objects and call restore_object() for each of them, or use S3 Batch Operations to 'bulk restore' them. Anything you can do in the console can also be done through code. To create an S3 Batch Operations job, use create_job() - Boto3 documentation.
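A minimal sketch of such a job via the s3control client; the account ID, role ARN, manifest location, ETag, and report bucket below are placeholders, and the CSV manifest listing the objects to restore must already exist in S3:

import boto3

s3control = boto3.client('s3control')

# All identifiers below are hypothetical placeholders.
response = s3control.create_job(
    AccountId='111111111111',
    ConfirmationRequired=False,
    RoleArn='arn:aws:iam::111111111111:role/my-batch-ops-role',
    Priority=10,
    Operation={
        'S3InitiateRestoreObject': {
            'ExpirationInDays': 1,
            'GlacierJobTier': 'STANDARD',  # Deep Archive supports STANDARD or BULK
        },
    },
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            'Fields': ['Bucket', 'Key'],
        },
        'Location': {
            'ObjectArn': 'arn:aws:s3:::my-manifest-bucket/manifest.csv',
            'ETag': 'etag-of-the-manifest-object',
        },
    },
    Report={
        'Bucket': 'arn:aws:s3:::my-report-bucket',
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'batch-restore-reports',
        'ReportScope': 'FailedTasksOnly',
    },
)
print(response['JobId'])

A single job like this replaces the 300,000 individual restore_object() calls with one API request plus a manifest upload.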
If you wish to use your existing code, you could make it run faster by calling restore_object() in parallel (e.g. using asyncio) without waiting for a response, but this requires some advanced Python skills.
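A sketch of that idea; since boto3 calls are blocking, a thread pool gives the same parallelism with less machinery than asyncio (bucket and prefix are assumed to be defined as in the question):

import boto3
from botocore.exceptions import ClientError
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client('s3')  # boto3 clients are thread-safe

def restore(key):
    try:
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={
                'Days': 1,
                'GlacierJobParameters': {'Tier': 'Standard'},
            },
        )
    except ClientError as e:
        # e.g. RestoreAlreadyInProgress if a request is already pending
        print(key, e.response['Error']['Code'])

paginator = s3.get_paginator('list_objects')
# A few dozen workers is usually safe; back off if S3 starts throttling.
with ThreadPoolExecutor(max_workers=50) as pool:
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        pool.map(restore, (c['Key'] for c in page['Contents']))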