Improve execution time of SELECT * from table using psycopg2

Question


I have a table with around 70000 rows, and I am trying to load the entire table in Python using the psycopg2 module.

With the code below, it takes around 110 seconds just to load the table.

with conn.cursor() as cursor:
    cursor.execute("SELECT * FROM Table;")
    rows = cursor.fetchall()
    print(rows)

What is the bottleneck in the Python code above? Is it reading the table, or converting the data into Python dictionaries and tuples? And is there any way to improve the runtime of the Python code?
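One way to narrow this down is to time each phase separately. The sketch below is not from the original post: the helper name time_phases and its out parameter are hypothetical, and it assumes a DB-API cursor such as the one psycopg2 returns. It reports how long execute(), fetchall(), and print() each take, so that printing 70000 rows to a terminal can be told apart from fetching and converting them.

```python
import sys
import time


def time_phases(cursor, query, out=None):
    """Return seconds spent in execute(), fetchall(), and print() separately.

    `cursor` is assumed to be a DB-API cursor (for example from psycopg2).
    `out` is where the rows are printed; it defaults to the terminal.
    """
    out = out if out is not None else sys.stdout
    timings = {}

    # Phase 1: send the query and wait for the server's result.
    start = time.perf_counter()
    cursor.execute(query)
    timings["execute"] = time.perf_counter() - start

    # Phase 2: build the Python row objects (tuples, dicts for JSONB).
    start = time.perf_counter()
    rows = cursor.fetchall()
    timings["fetchall"] = time.perf_counter() - start

    # Phase 3: format and write every row to the output stream.
    start = time.perf_counter()
    print(rows, file=out)
    timings["print"] = time.perf_counter() - start

    timings["row_count"] = len(rows)
    return timings
```

Calling time_phases(cursor, "SELECT * FROM Table;") inside the with block and inspecting the returned dictionary shows which phase dominates. If the print phase accounts for most of the 110 seconds, the slowdown is terminal output rather than psycopg2.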

In contrast, the command below loads the entire table and writes it to a file almost instantly, which suggests that reading the table is not the problem.

COPY (SELECT * FROM Table) TO 'file.txt';

Why is there so much runtime difference between these two cases?
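For a closer comparison from within Python, the same COPY mechanism can be driven through psycopg2's copy_expert method. This is a minimal sketch, not part of the original post: the helper name copy_query_to_text is hypothetical, and it uses COPY ... TO STDOUT (which streams to the client) instead of the server-side file path in the command above.

```python
import io


def copy_query_to_text(cursor, select_sql):
    """Run COPY (<select_sql>) TO STDOUT and return the output as one string.

    `cursor` is assumed to be a psycopg2 cursor. `select_sql` must be a
    trusted query without a trailing semicolon, because it is pasted
    directly into the COPY statement.
    """
    buffer = io.StringIO()
    # copy_expert sends the COPY statement and writes the raw text output
    # into the file-like object, without building Python row objects.
    cursor.copy_expert(f"COPY ({select_sql}) TO STDOUT", buffer)
    return buffer.getvalue()
```

A call such as copy_query_to_text(cursor, "SELECT * FROM Table") returns tab-separated text, with the JSONB column left as a string. Timing it against fetchall() gives a rough idea of how much of the cost comes from row conversion and printing.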

The table consists of an id column and a JSONB column that holds a dictionary.

Postgres 14.7

OS: Ubuntu 22.04 WSL (Everything running within the WSL)

Edit:
This command from the terminal also takes only 0.83 seconds.

bin/psql -d db -c "SELECT * FROM Table;" -o output_file.txt

Answer 1

Score: 1


I believe you can improve performance by using a server-side cursor and iterating over it:

# Passing a name makes psycopg2 create a server-side cursor.
cursor = conn.cursor('cursor_unique_name')
cursor.execute("SELECT * FROM Table;")
for row in cursor:
    print(row)

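When you specify a name for the cursor, psycopg2 creates a server-side cursor, which prevents all of the records from being downloaded from the server at once. Iterating over the cursor then fetches rows in batches, with the batch size set by the cursor's itersize attribute (2000 rows by default).
Read more about it in the psycopg2 documentation on server-side cursors.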
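The approach above can be wrapped into a small helper. This is only a sketch: the function name stream_rows, the cursor name, and the batch_size parameter are hypothetical. It assumes a psycopg2 connection that is not in autocommit mode, since a named cursor normally lives inside a transaction.

```python
def stream_rows(conn, query, out, batch_size=2000):
    """Write each row of `query` to `out`, fetching from the server in batches.

    `conn` is assumed to be a psycopg2 connection and `out` any object
    with a write() method. Returns the number of rows written.
    """
    count = 0
    # A named cursor is server-side; the name itself is arbitrary.
    with conn.cursor(name="stream_rows_cursor") as cursor:
        # Number of rows fetched per network round trip while iterating.
        cursor.itersize = batch_size
        cursor.execute(query)
        for row in cursor:
            out.write(f"{row}\n")
            count += 1
    return count
```

Passing an open file as out, rather than printing to the terminal, keeps memory use bounded and avoids the cost of terminal output.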

Posted by huangapple on 2023-07-18 02:32:24
Source: https://go.coder-hub.com/76707212.html