400 GB tar.gz file upload from EC2 to S3

Question

I have an EC2 instance on which an approx. 400 GB file (tar.gz) is stored.

I now want to unzip that file and store its contents in an S3 bucket in the same AWS account.

With the normal aws s3 cp command, I always ran into timeouts.

What is the easiest way to accomplish that task?

Answer 1

Score: 1

I would recommend using s3 sync instead of s3 cp:

aws s3 --region us-east-1 sync [folder] s3://[bucketname]
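
Since the question is about unpacking the archive and then uploading its contents, a minimal sketch of the full workflow could look like this (the data.tar.gz name, folder name, and bucket are hypothetical, and it assumes the instance has enough free disk space for the extracted contents):

# extract the archive into a local folder (names are illustrative)
mkdir extracted
tar -xzf data.tar.gz -C extracted

# sync the folder to S3; sync can simply be re-run after a failure
# and will only upload files that are not yet in the bucket
aws s3 --region us-east-1 sync extracted s3://my-bucket/extracted/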

Answer 2

Score: 1

The sync approach should possibly work. In general, the underlying mechanics are probably built on multipart upload, which may be useful to know if you want to implement it yourself. It can even be done from the command line:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-upload-object.html

The process may look intimidating, but it is not that bad, and the benefit is that you can upload a large file even over a bad line, because you can retry individual parts. You can also finish the upload later, for example after your temporary permissions expire, or the next day. You could even upload from two locations, as long as the file is split the exact same way.

Initiate the upload:

aws s3api create-multipart-upload --bucket my-bucket --key 'multipart/01'
{
    "Bucket": "my-bucket",
    "UploadId": "dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R",
    "Key": "multipart/01"
}
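
Before the individual parts can be uploaded, the file has to be split into chunks. A minimal sketch using GNU split (the 1 GiB part size and the data.tar.gz name are illustrative; S3 allows at most 10,000 parts, each between 5 MiB and 5 GiB except the last, which may be smaller):

# split the archive into 1 GiB chunks named part00, part01, part02, ...
# (-d gives numeric suffixes; part size and names are illustrative)
split -b 1G -d data.tar.gz part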

Upload part:

aws s3api upload-part --bucket my-bucket --key 'multipart/01' --part-number 1 --body part01 --upload-id "dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R"
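
Each upload-part call returns the part's ETag, which is needed for the completion step below. A sketch of a loop over all chunks (the UPLOAD_ID variable and the part* file names are illustrative, matching the split sketch above):

# upload every chunk in order and record its ETag for the completion step
n=1
for f in part*; do
  etag=$(aws s3api upload-part --bucket my-bucket --key 'multipart/01' \
      --part-number "$n" --body "$f" --upload-id "$UPLOAD_ID" \
      --query ETag --output text)
  echo "part $n: $etag"
  n=$((n + 1))
done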

Complete the multipart upload:

aws s3api complete-multipart-upload --multipart-upload file://mpustruct --bucket my-bucket --key 'multipart/01' --upload-id dfRtDYU0WWCCcH43C3WFbkRONycyCpTJJvxu2i5GYkZljF.Yxwh6XG7WfS2vC4to6HiV6Yjlx.cph0gtNBtJ8P3URCSbB7rjxI5iEwVDmgaXZOGgkk5nVTW16HOQ5l0R

The mpustruct file is:

{
  "Parts": [
    {
      "ETag": "e868e0f4719e394144ef36531ee6824c",
      "PartNumber": 1
    },
    {
      "ETag": "6bb2b12753d66fe86da4998aa33fffb0",
      "PartNumber": 2
    },
    {
      "ETag": "d0a0112e841abec9c9ec83406f0159c8",
      "PartNumber": 3
    }
  ]
}

The structure can also be obtained from the list-parts command, which returns the ETag and PartNumber of every part uploaded so far.
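
For example, a sketch that writes the Parts structure straight to the mpustruct file (the --query expression is an assumption about the desired output shape; UPLOAD_ID as above):

# build the completion structure from the parts S3 has already received
aws s3api list-parts --bucket my-bucket --key 'multipart/01' \
    --upload-id "$UPLOAD_ID" \
    --query '{Parts: Parts[].{ETag: ETag, PartNumber: PartNumber}}' > mpustruct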
