如何克隆一个带有部分数据的 postgreSQL 数据库

huangapple go评论67阅读模式
英文:

How to clone a postgreSQL database with partial data

问题

我面临的任务是克隆一个PostgreSQL数据库,保留所有约束、索引等,并包括与特定列值相关的所有记录。

换句话说,它是将一个大型数据库分离成多个较小数据库。

例如,我的原始数据库具有许多模式,在每个模式中有许多表,在每个表中都有关于多个人的记录。我想将其克隆到新数据库,但只克隆与特定人员的记录相关的记录,其中person_id = xxx。

是否有适用于此任务的工具或任何建议?(我熟悉Java和Python)

英文:

I'm facing a task that clone a PostgreSQL database, keep all constraints, indexes, etc., and including all records that related to specific column value.

In other word, it's a separate big database to multiple smaller databases.

For example, my original database has numerous schemas, in each schema has numerous tables and in each table has records about multiple person. I want to clone it to new database but clone only records that related to specific person with person id (clone all records in all tables that have person_id = xxx).

Is there a tool for this task or any suggestions? (I'm familiar with Java and Python)

答案1

得分: 2

以下是您要翻译的内容:

最好的方法我发现是首先使用 pg_dump 工具导出完整模式,使用 -s 标志(仅导出模式,不包括数据),然后单独导出数据。

要在从头开始的新空数据库中加载您的模式,请使用 pg_restore。它将读取 pg_dump 的输出并用它来构建数据库。

在导出数据时,您需要对每个表进行分类(您可以编写一个辅助脚本以简化此过程,或使用 Excel 等工具):

  1. 基于某些条件(例如 person_id 中的某个特定值)从中获取数据子集的表
  2. 想要完整复制的表(例如维度表,如 calendarcompany_locations
  3. 不需要任何数据的表

对于(1),您需要编写一个适当的 SELECT 查询,以返回您想要复制的数据子集。将这些查询放入脚本中,并让每个查询将结果写入单独的文件,命名为 <schema>.<table>。最后,使用 psql 实用程序将数据加载到测试数据库中。psql 有一个特殊的 \copy 命令,使此操作变得简单。可以像这样从终端使用它:

psql --c "\copy schema.table FROM '~/dump_data/schema.table' WITH DELIMITER ',' CSV;"

对于(2),再次使用 pg_dump 处理它们,使用 -t <table name> 标志仅导出指定的表,并使用 -a 仅导出数据(不包括模式)。这也可以添加到(1)的脚本中,只需为(2)中的每个表添加一个未限定的 SELECT * 并以相同方式加载数据。

(3)类别的表已经通过初始导出处理过了。

英文:

The best way I have found to do this is to first export the complete schema using the pg_dump tool with the -s flag (to dump schema only, and not data), and then export data separately.

To load your schemas starting from a fresh, empty database, use pg_restore. It will read the output from pg_dump and use it to build a database.

When exporting the data, you'll need to classify each table (you can write a helper script to make this easier, or use excel, etc...):

  1. Tables that you want a subset of data from, based on some condition (ie. a certain value in person_id)
  2. Tables you want to copy in their entirety (like dimension tables, such as calendar and company_locations)
  3. Tables that you don't need any data from

For (1), you will need to write an appropriate SELECT query that returns the subset of data that you want to copy. Put those queries in a script and have each one write the result to a separate file, named &lt;schema&gt;.&lt;table&gt;. Lastly, use the psql utility to load the data to the test database. psql has a special \copy command that makes this easy. It can be used from a terminal like this:

psql --c &quot;\copy schema.table FROM ‘~/dump_data/schema.table’ WITH DELIMITER ‘,’ CSV;&quot;

Use pg_dump again to take care of all of those falling under (2), using the -t &lt;table name&gt; flag to dump only the named tables, and -a to dump only data (no schema). This could also be added to the script for (1) by just adding an unqualified SELECT * for each table in (2) and loading the data the same way.

Tables falling under (3) were already handled by the initial export.

huangapple
  • 本文由 发表于 2020年8月12日 09:25:54
  • 转载请务必保留本文链接:https://go.coder-hub.com/63368450.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定