psycopg用于大型数据集的COPY命令花费了很长时间。

huangapple go评论48阅读模式
英文:

psycopg COPY command for large dataset taking forever

问题

我有2个不同的数据库,需要在其中一个数据库和另一个数据库之间复制数据。
我正在使用以下代码:

with psycopg.connect(dsn_src) as conn1, \
    psycopg.connect(dsn_tgt) as conn2:
    with conn1.cursor().copy("COPY " + table + " TO STDOUT (FORMAT BINARY)") as copy1:
        with conn2.cursor().copy("COPY " + table + " FROM STDIN (FORMAT BINARY)") as copy2:
            for data in copy1:
                copy2.write(data)

问题是,我正在复制大约10M+行的数据,看起来循环需要很长时间。有没有办法提高性能?

英文:

I have 2 different databases where I need to copy the data between one database to another.
I'm using the following code:

with psycopg.connect(dsn_src) as conn1, \
    psycopg.connect(dsn_tgt) as conn2:
    with conn1.cursor().copy("COPY " + table + " TO STDOUT (FORMAT BINARY)") as copy1:
        with conn2.cursor().copy("COPY " + table + " FROM STDIN (FORMAT BINARY)") as copy2:
            for data in copy1:
                copy2.write(data)

The problem is that I'm copying around 10M+ rows, and it seems that the loop takes forever. Is there any way to improve the performance?

答案1

得分: 2

  1. 使用pg_dump命令将表以PostgreSQL本机格式复制到psql中。
    pg_dump -h 源数据库主机 -p 源数据库端口 -U 源数据库用户 \
        --schema='你的模式'                                    \
        --table='你的模式.你的表' 源数据库名称           \
    | psql -h 目标数据库主机 -p 目标数据库端口 -U 目标数据库用户  \
        -v ON_ERROR_STOP=1 目标数据库名称                         \
    &>> 你的表.clone.log  &
    
  2. 首先使用高压缩的多线程pg_dump,然后使用多线程pg_restore
    pg_dump -Z9 -Fd -h 127.0.0.1 -p 5432 --verbose --jobs=16 \
    --schema '源模式' --table='源模式.源表' \
    -U 源用户 源数据库名 -f 传输文件名 \
    &> 源表.backup.log 2>&1 &
    
    pg_restore -h 目标数据库主机 -p 5432 -U 目标用户 \
    --dbname=目标数据库名 --format=d --exit-on-error \
    --verbose --jobs=16 传输文件名 \
    &>> 源表.restore.14.log &
    
  3. 使用postgres_fdw将其链接为外部表,使其可访问,无需在目标上物理写入第二次。
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    
    CREATE SERVER 源数据库 FOREIGN DATA WRAPPER postgres_fdw 
    OPTIONS (host '127.0.0.1', dbname '源数据库名', port '5432');
    
    CREATE USER MAPPING FOR 目标上的用户 SERVER 源数据库
    OPTIONS (user '源上的用户', password '你的密码');
    
    CREATE SCHEMA 从源链接的模式;
    
    IMPORT FOREIGN SCHEMA 源上的模式 LIMIT TO (源上的表)
    FROM SERVER 源数据库 INTO 从源链接的模式;
    

    dblink类似的思路。

  4. 设置复制以持续流式传输数据。
  5. 尝试替代的psycopg插入方法
英文:
  1. Pipe pg_dump to psql to copy the table using PostgreSQL native format.
    pg_dump -h source_db_host -p source_db_port -U source_db_user \
        --schema='your_schema'                                    \
        --table='your_schema.your_table' source_db_name           \
    | psql -h target_db_host -p target_db_port -U target_db_user  \
        -v ON_ERROR_STOP=1 target_db_name                         \
    &>> your_table.clone.log  &
    
  2. Multithreaded pg_dump first with high compression, then multithreaded pg_restore:
    pg_dump -Z9 -Fd -h 127.0.0.1 -p 5432 --verbose --jobs=16 \
    --schema 'source_schema' --table='source_schema.source_table' \
    -U source_user source_dbname -f transfer_file_name \
    > source_table.backup.log 2>&1 &
    
    pg_restore -h target_db_host -p 5432 -U target_user \
    --dbname=target_db_name --format=d --exit-on-error \
    --verbose --jobs=16 transfer_file_name \
    &>> source_table.restore.14.log &
    
  3. Use postgres_fdw to link it as a foreign table to make it accessible, without having to physically write it the second time on
    the target.

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    
    CREATE SERVER source_db FOREIGN DATA WRAPPER postgres_fdw 
    OPTIONS (host '127.0.0.1', dbname 'source_db_name', port '5432');
    
    CREATE USER MAPPING FOR your_user_on_target SERVER source_db
    OPTIONS (user 'your_user_on_source', password 'your_password');
    
    CREATE SCHEMA schema_linked_from_source;
    
    IMPORT FOREIGN SCHEMA schema_on_source LIMIT TO (table_on_source)
    FROM SERVER source_db INTO schema_linked_from_source;
    

    Similar idea with dblink.

  4. Set up replication to stream the data continuously.
  5. Play around with alternative psycopg insertion methods.

huangapple
  • 本文由 发表于 2023年6月5日 18:22:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76405476.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定