psycopg COPY command for large dataset taking forever
Question
I have two different databases and need to copy data from one to the other.

I'm using the following code:
```python
import psycopg

# Stream the table from the source database to the target using binary COPY.
with psycopg.connect(dsn_src) as conn1, \
        psycopg.connect(dsn_tgt) as conn2:
    with conn1.cursor().copy("COPY " + table + " TO STDOUT (FORMAT BINARY)") as copy1:
        with conn2.cursor().copy("COPY " + table + " FROM STDIN (FORMAT BINARY)") as copy2:
            for data in copy1:
                copy2.write(data)
```
The problem is that I'm copying around 10M+ rows, and it seems that the loop takes forever. Is there any way to improve the performance?
Answer 1

Score: 2
- Pipe `pg_dump` to `psql` to copy the table using PostgreSQL's native format.

  ```sh
  pg_dump -h source_db_host -p source_db_port -U source_db_user \
      --schema='your_schema' \
      --table='your_schema.your_table' source_db_name \
  | psql -h target_db_host -p target_db_port -U target_db_user \
      -v ON_ERROR_STOP=1 target_db_name \
  &>> your_table.clone.log &
  ```
- Multithreaded `pg_dump` first with high compression, then multithreaded `pg_restore`:

  ```sh
  pg_dump -Z9 -Fd -h 127.0.0.1 -p 5432 --verbose --jobs=16 \
      --schema 'source_schema' --table='source_schema.source_table' \
      -U source_user source_dbname -f transfer_file_name \
      > source_table.backup.log 2>&1 &

  pg_restore -h target_db_host -p 5432 -U target_user \
      --dbname=target_db_name --format=d --exit-on-error \
      --verbose --jobs=16 transfer_file_name \
      &>> source_table.restore.14.log &
  ```
- Use `postgres_fdw` to link it as a foreign table, making it accessible without having to physically write it a second time on the target.

  ```sql
  CREATE EXTENSION IF NOT EXISTS postgres_fdw;

  CREATE SERVER source_db FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host '127.0.0.1', dbname 'source_db_name', port '5432');

  CREATE USER MAPPING FOR your_user_on_target SERVER source_db
      OPTIONS (user 'your_user_on_source', password 'your_password');

  CREATE SCHEMA schema_linked_from_source;

  IMPORT FOREIGN SCHEMA schema_on_source LIMIT TO (table_on_source)
      FROM SERVER source_db INTO schema_linked_from_source;
  ```
  A similar idea works with `dblink` (a rough sketch follows after this list).

- Set up replication to stream the data continuously (see the sketch after this list).
- Play around with alternative `psycopg` insertion methods (see the sketch after this list).
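For the `dblink` route, a minimal sketch of what the pull could look like from Python is below. It reuses `dsn_tgt` and `table` from the question; the `dblink` connection string and the column definition list (`id int, payload text`) are made-up placeholders that must match the actual source server and table definition.

```python
import psycopg

# Hypothetical dblink sketch: run everything on the target and let dblink pull
# the rows from the source. The connection string and the (id int, payload text)
# column list are placeholders, not taken from the question.
with psycopg.connect(dsn_tgt) as tgt:
    tgt.execute("CREATE EXTENSION IF NOT EXISTS dblink")
    tgt.execute(f"""
        INSERT INTO {table}
        SELECT *
        FROM dblink(
            'host=source_db_host dbname=source_db_name user=source_db_user',
            'SELECT id, payload FROM {table}'
        ) AS t(id int, payload text)
    """)
```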
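For the replication option, one possible shape is logical replication: a publication on the source and a subscription on the target. The sketch below assumes `wal_level = logical` on the source, an already-created matching table on the target, and made-up names (`my_table_pub`, `my_table_sub`) and connection string.

```python
import psycopg

# Hypothetical logical-replication sketch. Assumes wal_level = logical on the
# source and a matching empty table on the target; publication/subscription
# names and the CONNECTION string are placeholders.
with psycopg.connect(dsn_src, autocommit=True) as src:
    src.execute(f"CREATE PUBLICATION my_table_pub FOR TABLE {table}")

with psycopg.connect(dsn_tgt, autocommit=True) as tgt:
    # CREATE SUBSCRIPTION cannot run inside a transaction block, hence autocommit.
    tgt.execute(
        "CREATE SUBSCRIPTION my_table_sub "
        "CONNECTION 'host=source_db_host dbname=source_db_name user=source_db_user' "
        "PUBLICATION my_table_pub"
    )
```

Once the initial sync catches up, the subscription keeps applying changes, which is what "stream the data continuously" refers to.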
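If the transfer has to stay inside `psycopg`, one thing to benchmark against the binary COPY loop is a server-side cursor on the source combined with batched `executemany()` on the target. Whether this is actually faster is workload-dependent; the column list `(id, payload)` and the batch size below are placeholders, and `dsn_src`, `dsn_tgt`, and `table` are the question's variables.

```python
import psycopg

# Hypothetical sketch of an alternative insertion path: server-side cursor on
# the source, batched executemany() on the target. The (id, payload) columns
# and BATCH_SIZE are placeholders to adapt to the real table.
BATCH_SIZE = 10_000
INSERT_SQL = "INSERT INTO " + table + " (id, payload) VALUES (%s, %s)"

with psycopg.connect(dsn_src) as src, psycopg.connect(dsn_tgt) as tgt:
    with src.cursor(name="reader") as read_cur, tgt.cursor() as write_cur:
        read_cur.itersize = BATCH_SIZE        # fetch rows from the server in chunks
        read_cur.execute("SELECT id, payload FROM " + table)
        batch = []
        for row in read_cur:
            batch.append(row)
            if len(batch) >= BATCH_SIZE:
                # executemany() sends the batch together (pipeline mode in psycopg >= 3.1)
                write_cur.executemany(INSERT_SQL, batch)
                batch.clear()
        if batch:                             # flush the trailing partial batch
            write_cur.executemany(INSERT_SQL, batch)
```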
Comments