如何优化与SQLite数据库的工作?

huangapple go评论89阅读模式
英文:

How to optimize working with SQLite database?

问题

我有以下代码来处理我的文件。如果文件在数据库中已经添加过了,那么就跳过这个文件(使用pass语句)。

import sqlite3 as lite

for i, file in enumerate(allMedia):
  con = lite.connect(DB_PATH)
  con.text_factory = str
  with con:
    cur = con.cursor()
    cur.execute("SELECT rowid,files_id,path,set_id,md5,tagged FROM files WHERE path = ?", (file,))
    row = cur.fetchone()
    if (row is not None):
       pass

这段代码的问题是处理速度慢(每个在数据库中找到的文件需要2-3秒)。数据库的大小约为30MB,需要处理成千上万个文件。

有没有办法加快处理速度?

英文:

I have the following code to work with my files. If file is added to the database before, then the file is just skipped (pass).

import sqlite3 as lite

for i, file in enumerate( allMedia ):
  con = lite.connect(DB_PATH)
  con.text_factory = str
  with con:
    cur = con.cursor()
    cur.execute("SELECT rowid,files_id,path,set_id,md5,tagged FROM files WHERE path = ?", (file,))
    row = cur.fetchone()
    if (row is not None):
       pass

The problem with this code is slow processing (2-3 seconds for each file found in the database). The database size is ~30 Mb. And thousands of files should be processed.

Is there any way to speed up the process?

答案1

得分: 2

如果您只对文件路径匹配的表记录感兴趣,您可以使用一个SQL查询来实现这一点,该查询使用in运算符。您需要准备一个?的列表:

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
anchors = (",?" * len(allMedia))[1:]
sql = f"SELECT * FROM files WHERE path in ({anchors})"
result = cur.execute(sql, allMedia)

for row in result.fetchall():
    # 处理每一行
    pass

如果您的列表(allMedia)比Sqlite支持的in运算符的长度更长,则首先将列表拆分成,并为每个块执行上述查询。

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
for media in chunks(allMedia, 500):
    anchors = (",?" * len(media))[1:]
    sql = f"SELECT * FROM files WHERE path in ({anchors})"
    result = cur.execute(sql, media)
    for row in result.fetchall():
        # 处理每一行
        pass

如果您的allMedia列表可能包含重复项,并且您不需要处理相同的文件两次,则首先从要迭代的集合中消除这些重复项:

for media in chunks(list(set(allMedia)), 500):
英文:

If you are only interested in the records of the table where the file path matches, you could do this with one SQL query -- one that uses the in operator. You need to then prepare a list of ?:

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
anchors = (",?" * len(allMedia))[1:]
sql = f"SELECT * FROM files WHERE path in ({anchors})"
result = cur.execute(sql, allMedia)

for row in result.fetchall():
    # do something with row
    pass

If you have lists (allMedia) that are longer than Sqlite supports for its in operator, then first split your list into chunks, and perform the above query for each chunk.

> python
> # From the linked Q&A
> def chunks(lst, n):
> """Yield successive n-sized chunks from lst."""
> for i in range(0, len(lst), n):
> yield lst[i:i + n]
>

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
for media in chunks(allMedia, 500):
    anchors = (",?" * len(media))[1:]
    sql = f"SELECT * FROM files WHERE path in ({anchors})"
    result = cur.execute(sql, media)
    for row in result.fetchall():
        # do something with row
        pass

If your allMedia lists is likely to have duplicates, and you don't need to process the same file twice, then first eliminate these duplicates from the collection being iterated:

for media in chunks(list(set(allMedia)), 500):

huangapple
  • 本文由 发表于 2023年8月8日 21:34:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76860082.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定