How to quickly request restores for objects in S3 Glacier Deep Archive?
Question
I have more than 300,000 objects in a bucket. They are in the S3 Glacier Deep Archive storage class, and I would like to restore them for analysis. When I benchmarked the speed of the restore API, it seemed too slow to process all of these objects.
Below is my Python code.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Page through every object under the prefix and request a restore for each.
paginator = s3.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

for page in page_iterator:
    for content in page['Contents']:
        try:
            print(content['Key'])
            s3.restore_object(
                Bucket=bucket,
                Key=content['Key'],
                RestoreRequest={
                    'Days': 1,
                    'GlacierJobParameters': {
                        'Tier': 'Standard',
                    },
                },
            )
        except ClientError:
            # restore_object raises RestoreAlreadyInProgress when a
            # restore has already been requested for this key.
            print("already restored..")
I found that S3 Batch Operations is aimed at speeding up exactly this scenario, but I don't want to use the AWS console. Rather than the console GUI, I want to do it in Python, like the code above.
Is there a more efficient way to restore a large number of objects with boto3? (Not using big data tools such as Spark.)
Thanks.
Answer 1
Score: 2
Your choice is either to loop through the objects and call restore_object() for each of them, or use S3 Batch Operations to 'bulk restore' them. Anything you can do in the console can also be done through code. To create an S3 Batch Operations job, use create_job() - Boto3 documentation.
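A minimal sketch of such a job via the s3control client; the account ID, role ARN, manifest location, ETag, and report bucket below are placeholders, and the CSV manifest listing the objects to restore must already exist in S3:

import boto3

s3control = boto3.client('s3control')

# All identifiers below are hypothetical placeholders.
response = s3control.create_job(
    AccountId='111111111111',
    ConfirmationRequired=False,
    RoleArn='arn:aws:iam::111111111111:role/my-batch-ops-role',
    Priority=10,
    Operation={
        'S3InitiateRestoreObject': {
            'ExpirationInDays': 1,
            'GlacierJobTier': 'STANDARD',  # Deep Archive supports STANDARD or BULK
        },
    },
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            'Fields': ['Bucket', 'Key'],
        },
        'Location': {
            'ObjectArn': 'arn:aws:s3:::my-manifest-bucket/manifest.csv',
            'ETag': 'etag-of-the-manifest-object',
        },
    },
    Report={
        'Bucket': 'arn:aws:s3:::my-report-bucket',
        'Format': 'Report_CSV_20180820',
        'Enabled': True,
        'Prefix': 'batch-restore-reports',
        'ReportScope': 'FailedTasksOnly',
    },
)
print(response['JobId'])

A single job like this replaces the 300,000 individual restore_object() calls with one API request plus a manifest upload.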
If you wish to use your existing code, you could make it run faster by calling restore_object() in parallel (e.g. using asyncio) without waiting for a response, but this requires some advanced Python skills.
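A sketch of that idea; since boto3 calls are blocking, a thread pool gives the same parallelism with less machinery than asyncio (bucket and prefix are assumed to be defined as in the question):

import boto3
from botocore.exceptions import ClientError
from concurrent.futures import ThreadPoolExecutor

s3 = boto3.client('s3')  # boto3 clients are thread-safe

def restore(key):
    try:
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={
                'Days': 1,
                'GlacierJobParameters': {'Tier': 'Standard'},
            },
        )
    except ClientError as e:
        # e.g. RestoreAlreadyInProgress if a request is already pending
        print(key, e.response['Error']['Code'])

paginator = s3.get_paginator('list_objects')
# A few dozen workers is usually safe; back off if S3 starts throttling.
with ThreadPoolExecutor(max_workers=50) as pool:
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        pool.map(restore, (c['Key'] for c in page['Contents']))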