psycopg COPY command for large dataset taking forever
Question
I have two different databases and need to copy data from one to the other.

I'm using the following code:
```python
import psycopg

# Stream the table from the source database to the target using binary COPY.
with psycopg.connect(dsn_src) as conn1, \
        psycopg.connect(dsn_tgt) as conn2:
    with conn1.cursor().copy("COPY " + table + " TO STDOUT (FORMAT BINARY)") as copy1:
        with conn2.cursor().copy("COPY " + table + " FROM STDIN (FORMAT BINARY)") as copy2:
            for data in copy1:
                copy2.write(data)
```
The problem is that I'm copying around 10M+ rows, and it seems that the loop takes forever. Is there any way to improve the performance?
Answer 1

Score: 2
- Pipe `pg_dump` to `psql` to copy the table using PostgreSQL's native format.

  ```sh
  pg_dump -h source_db_host -p source_db_port -U source_db_user \
      --schema='your_schema' \
      --table='your_schema.your_table' source_db_name \
  | psql -h target_db_host -p target_db_port -U target_db_user \
      -v ON_ERROR_STOP=1 target_db_name \
  &>> your_table.clone.log &
  ```
- Multithreaded `pg_dump` first with high compression, then multithreaded `pg_restore`:

  ```sh
  pg_dump -Z9 -Fd -h 127.0.0.1 -p 5432 --verbose --jobs=16 \
      --schema 'source_schema' --table='source_schema.source_table' \
      -U source_user source_dbname -f transfer_file_name \
      > source_table.backup.log 2>&1 &

  pg_restore -h target_db_host -p 5432 -U target_user \
      --dbname=target_db_name --format=d --exit-on-error \
      --verbose --jobs=16 transfer_file_name \
      &>> source_table.restore.14.log &
  ```
- Use `postgres_fdw` to link it as a foreign table, making it accessible without having to physically write it a second time on the target.

  ```sql
  CREATE EXTENSION IF NOT EXISTS postgres_fdw;

  CREATE SERVER source_db FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host '127.0.0.1', dbname 'source_db_name', port '5432');

  CREATE USER MAPPING FOR your_user_on_target SERVER source_db
      OPTIONS (user 'your_user_on_source', password 'your_password');

  CREATE SCHEMA schema_linked_from_source;

  IMPORT FOREIGN SCHEMA schema_on_source LIMIT TO (table_on_source)
      FROM SERVER source_db INTO schema_linked_from_source;
  ```
  A similar idea works with `dblink` (a rough sketch follows after this list).

- Set up replication to stream the data continuously (see the sketch after this list).
- Play around with alternative `psycopg` insertion methods (see the sketch after this list).
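For the `dblink` route, a minimal sketch of what the pull could look like from Python is below. It reuses `dsn_tgt` and `table` from the question; the `dblink` connection string and the column definition list (`id int, payload text`) are made-up placeholders that must match the actual source server and table definition.

```python
import psycopg

# Hypothetical dblink sketch: run everything on the target and let dblink pull
# the rows from the source. The connection string and the (id int, payload text)
# column list are placeholders, not taken from the question.
with psycopg.connect(dsn_tgt) as tgt:
    tgt.execute("CREATE EXTENSION IF NOT EXISTS dblink")
    tgt.execute(f"""
        INSERT INTO {table}
        SELECT *
        FROM dblink(
            'host=source_db_host dbname=source_db_name user=source_db_user',
            'SELECT id, payload FROM {table}'
        ) AS t(id int, payload text)
    """)
```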
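For the replication option, one possible shape is logical replication: a publication on the source and a subscription on the target. The sketch below assumes `wal_level = logical` on the source, an already-created matching table on the target, and made-up names (`my_table_pub`, `my_table_sub`) and connection string.

```python
import psycopg

# Hypothetical logical-replication sketch. Assumes wal_level = logical on the
# source and a matching empty table on the target; publication/subscription
# names and the CONNECTION string are placeholders.
with psycopg.connect(dsn_src, autocommit=True) as src:
    src.execute(f"CREATE PUBLICATION my_table_pub FOR TABLE {table}")

with psycopg.connect(dsn_tgt, autocommit=True) as tgt:
    # CREATE SUBSCRIPTION cannot run inside a transaction block, hence autocommit.
    tgt.execute(
        "CREATE SUBSCRIPTION my_table_sub "
        "CONNECTION 'host=source_db_host dbname=source_db_name user=source_db_user' "
        "PUBLICATION my_table_pub"
    )
```

Once the initial sync catches up, the subscription keeps applying changes, which is what "stream the data continuously" refers to.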
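If the transfer has to stay inside `psycopg`, one thing to benchmark against the binary COPY loop is a server-side cursor on the source combined with batched `executemany()` on the target. Whether this is actually faster is workload-dependent; the column list `(id, payload)` and the batch size below are placeholders, and `dsn_src`, `dsn_tgt`, and `table` are the question's variables.

```python
import psycopg

# Hypothetical sketch of an alternative insertion path: server-side cursor on
# the source, batched executemany() on the target. The (id, payload) columns
# and BATCH_SIZE are placeholders to adapt to the real table.
BATCH_SIZE = 10_000
INSERT_SQL = "INSERT INTO " + table + " (id, payload) VALUES (%s, %s)"

with psycopg.connect(dsn_src) as src, psycopg.connect(dsn_tgt) as tgt:
    with src.cursor(name="reader") as read_cur, tgt.cursor() as write_cur:
        read_cur.itersize = BATCH_SIZE        # fetch rows from the server in chunks
        read_cur.execute("SELECT id, payload FROM " + table)
        batch = []
        for row in read_cur:
            batch.append(row)
            if len(batch) >= BATCH_SIZE:
                # executemany() sends the batch together (pipeline mode in psycopg >= 3.1)
                write_cur.executemany(INSERT_SQL, batch)
                batch.clear()
        if batch:                             # flush the trailing partial batch
            write_cur.executemany(INSERT_SQL, batch)
```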
Comments