Copy remote postgres database to second remote server
Question
I currently have a prod and a test database that live on two Azure Postgres servers. I want to do a nightly backup of the prod database onto test, such that every morning the two are identical. My tables have constraints and keys, so I can't just copy over the data itself; I also need the schemas, so a simple pandas df.to_sql won't cover it.
My current plan is to run a nightly Azure Functions Python script that does the copying. I tried SQLAlchemy but had significant issues copying the metadata over correctly.
Now I am trying to use Postgres' pg_dump and pg_restore/psql commands via a subprocess, with the following code:
import subprocess

from sqlalchemy import MetaData

# Connection settings (input_* for prod, output_* for test) and output_engine
# are defined elsewhere.

def backup_database(location, database, password, username, backup_file):
    # Use the pg_dump command to create a backup of the specified database
    cmd = [
        'pg_dump',
        '-Fc',
        '-f', backup_file,
        '-h', location,
        '-d', database,
        '-U', username,
        '-p', '5432',
        '-W',
    ]
    subprocess.run(cmd, check=True, input=password.encode())

def clear_database(engine, metadata):
    # Drop all tables in the database
    metadata.drop_all(bind=engine, checkfirst=False)

def restore_database(location, database, password, username, backup_file):
    # Use the pg_restore command to restore the backup onto the database
    # cmd = ['pg_restore', '-Fc', '-d', engine.url.database, backup_file]
    cmd = [
        'pg_restore',
        '-Fc',
        '-C',
        '-f', backup_file,
        '-h', location,
        # '-d', database,
        '-U', username,
        '-p', '5432',
        '-W',
    ]
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
        print("Backup restored onto the test server.")
    except subprocess.CalledProcessError as e:
        print("Error occurred while restoring the backup:")
        print(e.stdout)  # Print the output from the command
        print(e.stderr)  # Print the error message, if available

# Define backup file paths
backup_file = '/pathtofile/backup_file.dump'  # Update with the desired backup file path
backup_file2 = 'backup_file.dump'  # Update with the desired backup file path

# Back up the production database
backup_database(input_host, input_database, input_password, input_user, backup_file)
print("Backup of the production database created.")

# Create a metadata object for the test server and clear it
output_metadata = MetaData(bind=output_engine)
clear_database(output_engine, output_metadata)
print("Test server cleared.")

restore_database(output_host, output_database, output_password, output_user, backup_file2)
print("Backup restored onto the test server.")
This code appears to be creating a dump file, but it is not successfully restoring to the test database.
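One thing worth noting about restore_database: pg_restore treats -f/--file as an output file to write to, not as the dump to read (the dump file is a positional argument), and with -d commented out there is no target database at all, so nothing is ever sent to the server. The -W flag also forces an interactive password prompt, which a non-interactive script cannot answer; passing the password via the standard PGPASSWORD environment variable avoids that. A minimal corrected sketch (same parameters as above; -C is dropped in favor of --clean, since the goal is to restore into the existing test database rather than create a new one):

import os
import subprocess

def restore_database(location, database, password, username, backup_file):
    # Restore the custom-format dump into the existing target database.
    # The dump file is positional; -f would redirect output instead.
    cmd = [
        'pg_restore',
        '-Fc',
        '--clean', '--if-exists',  # drop existing objects before recreating them
        '--no-owner',              # helpful when prod and test roles differ
        '-h', location,
        '-d', database,            # the database to restore into
        '-U', username,
        '-p', '5432',
        backup_file,
    ]
    env = dict(os.environ, PGPASSWORD=password)  # non-interactive authentication
    subprocess.run(cmd, check=True, capture_output=True, text=True, env=env)

With --clean/--if-exists, pg_restore drops and recreates the objects itself, which would make the separate clear_database step redundant.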
If I get this code to work, how do I specify file paths within Azure Functions? Is this a suitable solution to run from Azure Functions?
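On the file-path question: Azure Functions guarantees only a writable scratch directory at runtime, so rather than hard-coding a path, the dump location could be built with Python's tempfile module. A sketch (it also assumes the pg_dump/pg_restore client binaries are available on the Function's PATH, e.g. via a custom container image):

import os
import tempfile

# Writable scratch space inside the Functions sandbox (e.g. /tmp on Linux)
backup_file = os.path.join(tempfile.gettempdir(), 'backup_file.dump')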
If not, how do I get SQLAlchemy to successfully clear the test data/metadata and then copy the data over from prod every night?
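If the SQLAlchemy route is still needed: a freshly constructed MetaData knows about no tables, so calling drop_all on it is a no-op; the existing schema has to be reflected first. A minimal sketch, reusing output_engine from above:

from sqlalchemy import MetaData

output_metadata = MetaData()
output_metadata.reflect(bind=output_engine)   # load the table definitions that exist on test
output_metadata.drop_all(bind=output_engine)  # drop them in foreign-key dependency order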
Answer 1
Score: 0
> I have referred to the MS docs for Psycopg and PostgreSQL.
import psycopg2

src_conn_string = "SourceConnectionString"
dst_conn_string = "DstConnectionString"

try:
    src_conn = psycopg2.connect(src_conn_string)
    src_cursor = src_conn.cursor()
    print("Connected to source database.")
    try:
        dst_conn = psycopg2.connect(dst_conn_string)
        dst_cursor = dst_conn.cursor()
        print("Connected to destination database.")
        try:
            # List all ordinary tables in the public schema of the source
            src_cursor.execute(
                "SELECT table_name FROM information_schema.tables "
                "WHERE table_schema='public' AND table_type='BASE TABLE'"
            )
            tables = src_cursor.fetchall()
            for table in tables:
                src_cursor.execute("SELECT * FROM {0}".format(table[0]))
                rows = src_cursor.fetchall()
                for row in rows:
                    # Parameterized insert, so NULLs and quoted strings survive the copy
                    placeholders = ", ".join(["%s"] * len(row))
                    dst_cursor.execute(
                        "INSERT INTO {0} VALUES ({1})".format(table[0], placeholders),
                        row,
                    )
            print("Data transferred successfully.")
        except psycopg2.Error as e:
            print("Error transferring data: ", e)
        finally:
            dst_conn.commit()
            dst_cursor.close()
            dst_conn.close()
            print("Destination database connection closed.")
    except psycopg2.Error as e:
        print("Error connecting to destination database: ", e)
    finally:
        src_cursor.close()
        src_conn.close()
        print("Source database connection closed.")
except psycopg2.Error as e:
    print("Error connecting to source database: ", e)
Output: (screenshots of the run, and of the source and destination databases in Azure, omitted)