Why is exporting a query result via SQLite command line shell so slow?
Question
To export the result of a query (~45 million records) to a CSV file I used the command line shell:
$ sqlite3 db.db3
> .headers on
> .mode csv
> .once result.csv
> select .....
This took about 9 hours to run. I then used Python:
import sqlite3
import pandas as pd
conn = sqlite3.connect('db.db3')
df = pd.read_sql(query, conn)
df.to_csv('output.csv')
This took about 20 minutes. I understand why Python might be a little faster, but I did not expect such a huge difference. Why is the SQLite command line shell so slow?
Answer 1
Score: 1
I got help with this on the SQLite forum.
It seems that much of the running time with the CLI method is spent on serial disk I/O. If we can load the entire query result into memory first and then write it out, the write time will be comparable.
Here is one example of how to do it:
$ sqlite3 db.db3
> attach database ':memory:' as in_memory;
> create table in_memory.large_result as [insert original query here];
> .mode csv
> .headers on
> .once result.csv
> select * from in_memory.large_result;
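
For comparison, the same materialize-then-export idea can be sketched from Python with the standard sqlite3 and csv modules. This is only a sketch of the answer's approach, not code from the original post; original_query is a placeholder for the original SELECT statement.

import csv
import sqlite3

# Placeholder: substitute the original SELECT statement here.
original_query = "select ..."

conn = sqlite3.connect('db.db3')

# Attach an in-memory database and materialize the query result there first,
# so the CSV export only reads from memory rather than interleaving disk reads.
conn.execute("attach database ':memory:' as in_memory")
conn.execute("create table in_memory.large_result as " + original_query)

# Stream the in-memory table out to CSV, writing the header row first.
cursor = conn.execute("select * from in_memory.large_result")
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)

conn.close()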