Race Condition with Thread in Python
Question
I back up files with the `backupFile` function. Each backed-up file is hashed and its hash is added to `hashList`; before backing up another file, I check `hashList` to see whether it has already been backed up. I can back up multiple files at the same time using `threading` and `queue`, but I get a race condition because more than one thread works on the same `hashList`. I used a lock to solve this, but `lock = threading.Lock()` prevents parallelism: while one thread is running, the others wait. That defeats the purpose of using threads, since I am using them to save time.

I want to use threads and still check whether a file has been backed up before.

I may be asking a lot, but I need your ideas. Thanks.

My code:
import threading, hashlib, queue, os

def hashFile(fileName):
    # Hash the file in 4 KiB chunks so large files are not loaded into memory at once.
    with open(fileName, "rb") as f:
        sha256 = hashlib.sha256()
        while chunk := f.read(4096):
            sha256.update(chunk)
    return sha256.hexdigest()

def backupFile(q):
    while not q.empty():
        fileName = q.get()
        # The lock is held for the whole check, including both hashFile calls.
        with lock:
            if hashFile(filesToBackupPath+fileName) in hashList:
                # "daha once yedeklenmis": already backed up before
                print(f"\033[33m{fileName} daha once yedeklenmis\033[0m")
            else:
                # "yedeklendi": backed up
                print(f"\033[32m{fileName} yedeklendi\033[0m")
                hashList.append(hashFile(filesToBackupPath+fileName))
        q.task_done()

filesToBackupPath = "yedeklenecekDosyalar/"
fileList = os.listdir(filesToBackupPath)
hashList = []
q = queue.Queue()
for file in fileList:
    q.put(file)
lock = threading.Lock()
for i in range(20):
    t = threading.Thread(target=backupFile, args=(q,))
    t.start()
q.join()
print('\n',len(hashList))
Answer 1
Score: 1
There is no reason for you to be locking the call to `hashFile`. Compute the hash first, then take the lock only for the check and the append:

hash = hashFile(filesToBackupPath+fileName)
with lock:
    if hash in hashList:
        alreadyBackedUp = True
    else:
        alreadyBackedUp = False
        hashList.append(hash)

Everything else goes outside the lock. The only place you need the lock is when accessing `hashList`.

Why are you using a list rather than a set?
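
To make the suggestion concrete, here is a minimal sketch of the whole worker loop with hashing moved outside the lock and a set in place of the list. The names `hash_file`, `worker`, and `backup_unique` are made up for this illustration and are not from the question. It also swaps `while not q.empty(): q.get()` for `get_nowait()`, on the assumption that the empty-then-get pattern can block forever when another thread takes the last item in between.

```python
import hashlib
import queue
import threading


def hash_file(path):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(65536):
            sha256.update(chunk)
    return sha256.hexdigest()


def worker(q, seen_hashes, lock, unique_paths):
    """Drain the queue; hash without the lock, hold it only for the set check."""
    while True:
        try:
            path = q.get_nowait()  # raises queue.Empty instead of blocking
        except queue.Empty:
            return
        try:
            digest = hash_file(path)  # slow part, runs concurrently
            with lock:  # short critical section: membership test plus insert
                is_new = digest not in seen_hashes
                if is_new:
                    seen_hashes.add(digest)
                    unique_paths.append(path)
            # A real backup (copying the file) would go here, outside the lock,
            # and only when is_new is True.
        finally:
            q.task_done()


def backup_unique(paths, num_threads=4):
    """Return the paths whose content had not been seen before (hypothetical helper)."""
    q = queue.Queue()
    for p in paths:
        q.put(p)
    seen_hashes = set()  # O(1) average membership test, unlike a list
    lock = threading.Lock()
    unique_paths = []
    threads = [
        threading.Thread(target=worker, args=(q, seen_hashes, lock, unique_paths))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return unique_paths
```

The check and the insert have to stay inside the same `with lock:` block. If they were split, two threads hashing identical files could both see the hash as missing and both back the file up. How much speedup you actually get depends on the workload; as far as I know, `hashlib` releases the GIL while hashing buffers larger than about 2 KB, so larger read chunks should let the hashing itself overlap across threads.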