
How to optimize working with SQLite database?

Question


I have the following code to process my files. If a file has already been added to the database, it is simply skipped (pass).

import sqlite3 as lite

for i, file in enumerate(allMedia):
    con = lite.connect(DB_PATH)
    con.text_factory = str
    with con:
        cur = con.cursor()
        cur.execute("SELECT rowid,files_id,path,set_id,md5,tagged FROM files WHERE path = ?", (file,))
        row = cur.fetchone()
        if row is not None:
            pass

The problem with this code is slow processing (2-3 seconds for each file found in the database). The database size is ~30 MB, and thousands of files need to be processed.

Is there any way to speed up the process?
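For reference, part of the cost in the loop above is that it reopens the connection and issues one query per file. A minimal sketch of connecting once and loading all known paths into a set up front (the in-memory table and sample data here are made-up stand-ins for DB_PATH and allMedia):

```python
import sqlite3 as lite

# Demo stand-ins for the question's DB_PATH and allMedia.
DB_PATH = ":memory:"
allMedia = ["a.jpg", "b.jpg", "c.jpg"]

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
cur.execute("CREATE TABLE files (path TEXT)")  # minimal demo schema
cur.executemany("INSERT INTO files VALUES (?)", [("a.jpg",), ("c.jpg",)])

# One query up front instead of one connection + one query per file.
known = {row[0] for row in cur.execute("SELECT path FROM files")}

new_files = [f for f in allMedia if f not in known]
print(new_files)  # only b.jpg is not in the table yet
```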

Answer 1

Score: 2


If you are only interested in the records of the table whose file path matches one of your files, you can do this with a single SQL query that uses the in operator. You then need to prepare a matching list of ? placeholders:

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
anchors = (",?" * len(allMedia))[1:]
sql = f"SELECT * FROM files WHERE path in ({anchors})"
result = cur.execute(sql, allMedia)

for row in result.fetchall():
    # do something with row
    pass
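Run end to end against a small demo table (the table name files and column path come from the question; the sample data is made up), the placeholder string and query look like this, and the result set can be turned into a set of already-known paths:

```python
import sqlite3 as lite

con = lite.connect(":memory:")  # demo stand-in for DB_PATH
cur = con.cursor()
cur.execute("CREATE TABLE files (path TEXT)")
cur.executemany("INSERT INTO files VALUES (?)", [("a.jpg",), ("c.jpg",)])

allMedia = ["a.jpg", "b.jpg", "c.jpg"]
anchors = (",?" * len(allMedia))[1:]          # "?,?,?"
sql = f"SELECT path FROM files WHERE path in ({anchors})"
found = {row[0] for row in cur.execute(sql, allMedia)}

new_files = [f for f in allMedia if f not in found]
print(new_files)  # only b.jpg is missing from the table
```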

If your list (allMedia) is longer than the number of host parameters SQLite allows in a single statement (SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 in versions before 3.32.0), then first split your list into chunks, and perform the above query for each chunk.

# From the linked Q&A
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

con = lite.connect(DB_PATH)
con.text_factory = str
cur = con.cursor()
for media in chunks(allMedia, 500):
    anchors = (",?" * len(media))[1:]
    sql = f"SELECT * FROM files WHERE path in ({anchors})"
    result = cur.execute(sql, media)
    for row in result.fetchall():
        # do something with row
        pass
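Whichever batching scheme is used, per-lookup cost also depends on whether path is indexed; without an index, every WHERE path = ? is a full table scan. A sketch that adds an index (the name idx_files_path is made up) and uses EXPLAIN QUERY PLAN to confirm SQLite will use it:

```python
import sqlite3 as lite

con = lite.connect(":memory:")  # demo stand-in for DB_PATH
cur = con.cursor()
cur.execute("CREATE TABLE files (path TEXT)")
# Hypothetical index name; IF NOT EXISTS makes this safe to run repeatedly.
cur.execute("CREATE INDEX IF NOT EXISTS idx_files_path ON files(path)")

# EXPLAIN QUERY PLAN shows whether SQLite will use the index for the lookup.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM files WHERE path = ?", ("x",)
).fetchall()
print(plan[0][-1])  # the plan detail mentions idx_files_path, not a SCAN
```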

If your allMedia list is likely to contain duplicates, and you don't need to process the same file twice, first eliminate the duplicates from the collection being iterated:

for media in chunks(list(set(allMedia)), 500):
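The dedup-then-chunk combination can be exercised on its own (a sketch with a deliberately tiny chunk size of 2; real code would use something like 500 as above):

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

allMedia = ["a", "b", "a", "c", "b", "d"]        # duplicates on purpose
batches = list(chunks(list(set(allMedia)), 2))   # dedup first, then chunk

print(sum(len(b) for b in batches))  # 4 unique paths survive the dedup
print(max(len(b) for b in batches))  # no batch exceeds the chunk size of 2
```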

huangapple
  • Posted on August 8, 2023 21:34:28
  • Please retain this link when republishing: https://go.coder-hub.com/76860082.html