英文:
How to load an RDS snapshot (in parquet format) into a local PostgreSQL
问题
有一百万个关于如何将PostgreSQL数据导入RDS、如何将RDS数据库快照导出到S3以及如何从PostgreSQL转换为Parquet的教程,但我找不到任何关于如何正确进行相反操作的文章或SO问题:我需要加载RDS导出到S3的数据库快照(作为一系列目录和Parquet文件)到本地的PostgreSQL实例。
我成功导出了数据,并使用AWS CLI将其同步到我的本地系统:
aws s3 cp s3://snapshotbucketname .
但接下来该怎么办,既然我已经在我的驱动器上有了实际的快照?将这个从“根本不是psql数据库”转变为我可以在本地使用的psql数据库的正确方式是什么?
英文:
There are a million tutorials on how to import PostgreSQL data into RDS, and how to export RDS database snapshots to S3, and how to convert from PostgreSQL to Parquet, but I can't find a single article or SO question about how to properly go the other way: I need to load a database snapshot that RDS exported to S3 (as a series of directories and parquet files) into a local PostgreSQL instance.
I exported the data just fine, and synced it to my local system using the AWS CLI:
aws s3 cp s3://snapshotbucketname .
But what comes next, now that I have the actual snapshot living on my drive? What's the correct way to turn this from "not a psql database at all" into a psql database that I can work with locally?
答案1
得分: 0
据我所知,使用AWS控制台和CLI工具直接进行此操作几乎是不可能的。相反,您需要在RDS中设置数据库,使其位于您可以访问的VPC中,然后可以使用 pg_dump
下载数据库,之后您可以在本地使用 pg_restore
。
对于生产数据库,它们位于一个经过严格限制的VPC中(如果配置正确),您几乎不太可能直接连接到它。在这些情况下,对数据库进行快照(在RDS控制台的“维护”选项卡中),然后选择该快照,并将其恢复到一个新的数据库中,该数据库位于您可以访问的VPC中,并且属于您可以访问的安全组。完成后,进行数据库转储,删除新数据库,然后删除快照,以免留下虚拟数据库会增加费用。
英文:
As far as I can tell this is straight up not possible using the AWS console and CLI tool. Instead you need to set up your database in RDS to be in a VPC that you can access, so that you can then use pg_dump
to download your database, after which you can use pg_restore
locally.
For prod databases, they're in a VPC that's heavily restricted (if configured right) and the odds that you directly connect to it are near zero. In those cases, take a snapshot of the database ("maintenance" tab in the RDS console), then select that snapshop, and restore it to a new database, in a VPC and part of a security group that you do have access to. Once that's done, dump it, deleted the new database, and then delete the snapshot so you don't have a phantom db hanging around costing you money.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论