Why is exporting a query result via SQLite command line shell so slow?

Question

To export the result of a query (~45 million records) to a CSV file I used the command line shell:

$ sqlite3 db.db3
> .headers on
> .mode csv
> .once result.csv
> select .....

This took about 9 hours to run. I then used Python:

import sqlite3
import pandas as pd

conn = sqlite3.connect('db.db3')
# query holds the same SELECT statement used in the shell session above
df = pd.read_sql(query, conn)
df.to_csv('output.csv')

This took about 20 minutes. I understand why Python might be a little bit faster but did not expect such a huge difference. Why is the SQLite command line shell so slow?

Answer 1

Score: 1

I got help with this on the SQLite forum.

It seems that much of the running time of the CLI method goes to serial disk I/O. If we load the entire query result into memory first and then write it out, the export time should be comparable to the Python approach.

Here is one example of how to do it:

$ sqlite3 db.db3
> attach database ':memory:' as in_memory;
> create table in_memory.large_result as [insert original query here];
> .mode csv
> .headers on
> .once result.csv
> select * from in_memory.large_result;
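
For anyone who prefers to script it, the same staging idea can be sketched with Python's built-in sqlite3 and csv modules. This is only a rough sketch, not something from the forum thread: the function name export_via_memory and its parameters are made up for illustration, it assumes the whole result fits in RAM, and I have not measured how it compares to either approach above.

```python
import csv
import sqlite3


def export_via_memory(db_path, query, csv_path):
    """Stage the result of `query` in an in-memory table, then write it as CSV.

    Hypothetical helper for illustration. Returns the number of data rows
    written (the header row is not counted).
    """
    conn = sqlite3.connect(db_path)
    try:
        # Same two steps as the shell session: attach an in-memory database
        # and materialize the query result there.
        conn.execute("ATTACH DATABASE ':memory:' AS in_memory")
        # `query` is interpolated as-is, so it must be a trusted SELECT statement.
        conn.execute(f"CREATE TABLE in_memory.large_result AS {query}")

        cursor = conn.execute("SELECT * FROM in_memory.large_result")
        header = [column[0] for column in cursor.description]

        rows_written = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # equivalent of `.headers on`
            for row in cursor:
                writer.writerow(row)
                rows_written += 1
        return rows_written
    finally:
        conn.close()
```

It would be called as export_via_memory('db.db3', query, 'result.csv'), where query is the original SELECT statement. Note that the csv module writes NULL values as empty fields.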

huangapple
  • Posted on July 17, 2023 at 16:32:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76702709.html