如何将 RDS 快照(以 Parquet 格式)加载到本地 PostgreSQL

huangapple go评论59阅读模式
英文:

How to load an RDS snapshot (in parquet format) into a local PostgreSQL

问题

有一百万个关于如何将PostgreSQL数据导入RDS、如何将RDS数据库快照导出到S3以及如何从PostgreSQL转换为Parquet的教程,但我找不到任何关于如何正确进行相反操作的文章或SO问题:我需要加载RDS导出到S3的数据库快照(作为一系列目录和Parquet文件)到本地的PostgreSQL实例。

我成功导出了数据,并使用AWS CLI将其同步到我的本地系统:

aws s3 cp s3://snapshotbucketname .

但接下来该怎么办,既然我已经在我的驱动器上有了实际的快照?将这个从“根本不是psql数据库”转变为我可以在本地使用的psql数据库的正确方式是什么?

英文:

There are a million tutorials on how to import PostgreSQL data into RDS, and how to export RDS database snapshots to S3, and how to convert from PostgreSQL to Parquet, but I can't find a single article or SO question about how to properly go the other way: I need to load a database snapshot that RDS exported to S3 (as a series of directories and parquet files) into a local PostgreSQL instance.

I exported the data just fine, and synced it to my local system using the AWS CLI:

aws s3 cp s3://snapshotbucketname .

But what comes next, now that I have the actual snapshot living on my drive? What's the correct way to turn this from "not a psql database at all" into a psql database that I can work with locally?

答案1

得分: 0

据我所知,使用AWS控制台和CLI工具直接进行此操作几乎是不可能的。相反,您需要在RDS中设置数据库,使其位于您可以访问的VPC中,然后可以使用 pg_dump 下载数据库,之后您可以在本地使用 pg_restore

对于生产数据库,它们位于一个经过严格限制的VPC中(如果配置正确),您几乎不太可能直接连接到它。在这些情况下,对数据库进行快照(在RDS控制台的“维护”选项卡中),然后选择该快照,并将其恢复到一个新的数据库中,该数据库位于您可以访问的VPC中,并且属于您可以访问的安全组。完成后,进行数据库转储,删除新数据库,然后删除快照,以免留下虚拟数据库会增加费用。

英文:

As far as I can tell this is straight up not possible using the AWS console and CLI tool. Instead you need to set up your database in RDS to be in a VPC that you can access, so that you can then use pg_dump to download your database, after which you can use pg_restore locally.

For prod databases, they're in a VPC that's heavily restricted (if configured right) and the odds that you directly connect to it are near zero. In those cases, take a snapshot of the database ("maintenance" tab in the RDS console), then select that snapshop, and restore it to a new database, in a VPC and part of a security group that you do have access to. Once that's done, dump it, deleted the new database, and then delete the snapshot so you don't have a phantom db hanging around costing you money.

huangapple
  • 本文由 发表于 2023年4月11日 06:23:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/75981165.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定