400 GB tar.gz file upload from ec2 to s3
Question
I have an EC2 instance where an approx. 400 GB file (tar.gz) is stored.
I now want to unzip that file and upload its contents to an S3 bucket in the same AWS account.
With the normal aws s3 cp command I always ran into timeouts.
What is the easiest way to accomplish that task?
Answer 1
Score: 1
I would recommend using s3 sync instead of s3 cp:

aws s3 --region us-east-1 sync [folder] s3://[bucketname]
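Since the original goal was to unzip first and then upload, here is a minimal sketch of the whole flow, assuming the archive lives at /data/archive.tar.gz and the instance has enough free disk for the extracted contents (the paths and bucket name are placeholders):

# extract the archive into a staging directory (needs roughly another 400+ GB of disk)
mkdir -p /data/extracted
tar -xzf /data/archive.tar.gz -C /data/extracted

# sync only copies files that are missing or changed, so a re-run after a
# timeout continues where it left off instead of starting over
aws s3 --region us-east-1 sync /data/extracted s3://[bucketname]/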
Answer 2
Score: 1
The sync should possibly work. In general, the underlying mechanics are probably built on multipart upload, which is useful to know if you want to implement it yourself. It can be done even from the command line:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-upload-object.html

The process may look intimidating, but it is not that bad, and the benefit is that you can upload a large file even over a bad line, because you can retry individual parts. You can also finish the upload later, for example after your temporary permissions expire, or the next day. You can even upload from two locations, as long as the file is split the exact same way (see the split sketch below).
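The upload-part step below expects the file to already be split into part files. One way to produce them, as a sketch assuming GNU split and a 1 GB part size (S3 parts must be between 5 MB and 5 GB, except the last one; the names come out as part001, part002, ..., so adjust --body accordingly):

# split the archive into 1 GB chunks named part001, part002, ...
split -b 1G --numeric-suffixes=1 -a 3 archive.tar.gz part

Using the same -b value everywhere is what makes uploads from two locations produce identical parts.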
Initiate the upload:
aws s3api create-multipart-upload --bucket my-bucket --key 'multipart/01'
{
    "Bucket": "my-bucket",
    "UploadId": "dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R",
    "Key": "multipart/01"
}
Upload part:
aws s3api upload-part --bucket my-bucket --key 'multipart/01' --part-number 1 --body part01 --upload-id "dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R"
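Each upload-part call returns the ETag of that part, which is needed for the completion step. A hypothetical driver loop over all the part files (the upload id is shortened here; bucket, key and part names follow the examples in this answer):

# sketch: upload every part file in order and record its ETag
UPLOAD_ID="dfRtDYU0...HOQ5l0R"   # use the full UploadId returned by create-multipart-upload
n=1
for f in part*; do
  # note: the returned ETag includes literal quote characters
  etag=$(aws s3api upload-part --bucket my-bucket --key 'multipart/01' \
    --part-number "$n" --body "$f" --upload-id "$UPLOAD_ID" \
    --query ETag --output text)
  echo "part $n ($f): $etag"
  n=$((n+1))
done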
Complete the multipart upload:
aws s3api complete-multipart-upload --multipart-upload file://mpustruct --bucket my-bucket --key 'multipart/01' --upload-id dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R
The mpustruct file is:
{
    "Parts": [
        {
            "ETag": "e868e0f4719e394144ef36531ee6824c",
            "PartNumber": 1
        },
        {
            "ETag": "6bb2b12753d66fe86da4998aa33fffb0",
            "PartNumber": 2
        },
        {
            "ETag": "d0a0112e841abec9c9ec83406f0159c8",
            "PartNumber": 3
        }
    ]
}
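As a sanity check: for parts uploaded without SSE-KMS encryption, a part's ETag is normally just the MD5 of that part file, so the values above can be verified locally:

# the hex digest should match the ETag reported for part 1
md5sum part001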
The structure can also be obtained from the list-parts command.
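For example, a sketch that regenerates the file from what S3 already knows about the upload (--query is the AWS CLI's built-in JMESPath filter; bucket, key and upload id as above):

# rebuild mpustruct from the parts S3 has received so far
aws s3api list-parts --bucket my-bucket --key 'multipart/01' \
  --upload-id "$UPLOAD_ID" \
  --query '{Parts: Parts[].{ETag: ETag, PartNumber: PartNumber}}' > mpustruct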