Race Condition with Thread in Python

Question

I back up files with the backupFile function. Each time a file is backed up, I hash it and add the hash to hashList; when backing up other files, I check hashList to see whether they have already been backed up. Using threads and a queue I can back up several files at the same time, but because more than one thread works on the same hashList, I run into a race condition. I used a lock to solve this, but using lock = threading.Lock() prevents parallelism: while one thread is running, the other threads wait. That makes using threads pointless, because my whole reason for using threads was to save time.

I want to use threads and still check whether a file has already been backed up.

I may be asking a lot, but I'd appreciate your ideas. Thanks!

My code:

import threading, hashlib, queue, os


def hashFile(fileName):
    with open(fileName, "rb") as f:
        sha256 = hashlib.sha256()
        while chunk := f.read(4096):
            sha256.update(chunk)
        return sha256.hexdigest()


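def backupFile(q):
    while True:
        # get_nowait() instead of empty() + get(): another thread could take
        # the last item in between, leaving this thread blocked forever in get()
        try:
            fileName = q.get_nowait()
        except queue.Empty:
            break

        with lock:
            if hashFile(filesToBackupPath+fileName) in hashList:
                print(f"\033[33m{fileName} was already backed up\033[0m")
            else:
                print(f"\033[32m{fileName} backed up\033[0m")
                hashList.append(hashFile(filesToBackupPath+fileName))

        q.task_done()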


filesToBackupPath = "yedeklenecekDosyalar/"  # Turkish for "files to be backed up"
fileList = os.listdir(filesToBackupPath)
hashList = []

q = queue.Queue()

for file in fileList:
    q.put(file)

lock = threading.Lock()

for i in range(20):
    t = threading.Thread(target=backupFile, args=(q,))
    t.start()

q.join()

print('\n',len(hashList))

Answer 1

Score: 1

There is no reason to hold the lock while calling hashFile.

fileHash = hashFile(filesToBackupPath+fileName)  # slow part, done outside the lock

with lock:  # only the shared hashList needs protecting
    if fileHash in hashList:
        alreadyBackedUp = True
    else:
        alreadyBackedUp = False
        hashList.append(fileHash)

Everything else can stay outside the lock.
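A minimal sketch of the full worker with this change applied, assuming the files to back up sit directly in one folder as in the question. Passing the shared state as arguments, the `results` dict and the `runBackup` driver are hypothetical additions for illustration, and the actual copy step is left out. The worker drains the queue with `get_nowait()` so no thread can end up blocked on an empty queue. Because the file read and hashing happen before `with lock:`, threads overlap on that work; CPython's `hashlib` also releases the GIL for updates larger than 2047 bytes, so the 4096-byte chunks can be hashed in parallel.

```python
import hashlib
import os
import queue
import tempfile
import threading


def hashFile(fileName):
    # Chunked SHA-256, same approach as hashFile in the question.
    sha256 = hashlib.sha256()
    with open(fileName, "rb") as f:
        while chunk := f.read(4096):
            sha256.update(chunk)
    return sha256.hexdigest()


def backupFile(q, folder, hashList, lock, results):
    while True:
        try:
            # get_nowait() never blocks: an empty queue simply ends this worker.
            fileName = q.get_nowait()
        except queue.Empty:
            return
        try:
            # Slow part (disk I/O + hashing) runs WITHOUT the lock, so threads overlap.
            fileHash = hashFile(os.path.join(folder, fileName))
            # Only the check-and-append on the shared state is serialized.
            with lock:
                alreadyBackedUp = fileHash in hashList
                if not alreadyBackedUp:
                    hashList.append(fileHash)
                results[fileName] = alreadyBackedUp
            # Copying a new file (omitted here) would also happen outside the lock.
        finally:
            q.task_done()


def runBackup(folder, numThreads=4):
    # Hypothetical driver: queue every file in folder, then run numThreads workers.
    q = queue.Queue()
    for name in sorted(os.listdir(folder)):
        q.put(name)
    hashList, lock, results = [], threading.Lock(), {}
    threads = [
        threading.Thread(target=backupFile, args=(q, folder, hashList, lock, results))
        for _ in range(numThreads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hashList, results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Two files with identical content and one with different content.
        for name, data in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]:
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        hashList, results = runBackup(d)
        print(len(hashList))  # 2
```

Which of the two identical files gets reported as "already backed up" depends on thread timing; only the count of distinct hashes is deterministic.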

The only place you need the lock is when accessing hashList.
Why are you using a list rather than a set?
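Following up on the set suggestion, here is a minimal sketch assuming the hashes are the hex-digest strings returned by hashFile. `HashRegistry` and `addIfNew` are hypothetical names, not from the original post. A set gives average O(1) membership tests versus O(n) for a list, which keeps the locked section short even after many files have been backed up. The test and the insert must happen inside the same `with` block; checking outside the lock and adding inside it would reintroduce the race.

```python
import threading


class HashRegistry:
    """Hypothetical helper: a thread-safe set of hashes that were already backed up."""

    def __init__(self):
        self._hashes = set()  # average O(1) membership test, versus O(n) for a list
        self._lock = threading.Lock()

    def addIfNew(self, fileHash):
        """Record fileHash and return True if it is new; return False if already seen."""
        # Test and insert happen under one lock acquisition,
        # otherwise two threads could both see the same hash as new.
        with self._lock:
            if fileHash in self._hashes:
                return False
            self._hashes.add(fileHash)
            return True

    def __len__(self):
        with self._lock:
            return len(self._hashes)


if __name__ == "__main__":
    registry = HashRegistry()
    print(registry.addIfNew("abc"))  # True
    print(registry.addIfNew("abc"))  # False
    print(len(registry))  # 1
```

In the worker, the locked block from the answer then reduces to `alreadyBackedUp = not registry.addIfNew(fileHash)`, with the hashing still done before the call.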

huangapple
  • Published on 13 April 2023 at 23:10:41
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76007071.html