Race Condition with Thread in Python

Question

I back up files with the backupFile function. Each backed-up file is hashed and its hash is added to hashList; before backing up another file, I check hashList to see whether it has already been backed up. I can back up multiple files at the same time using threads and a queue, but I get a race condition because more than one thread works on the same hashList. I used a lock to solve this, but using lock = threading.Lock() prevents parallelism: while one thread is running, the other threads wait. That defeats the purpose of using threads, which was to save time.

I want to both use threads and check whether a file has been backed up before.

I may be asking a lot, but I need your ideas. Thanks.

My code:

    import threading, hashlib, queue, os

    # Return the SHA-256 hex digest of a file, reading it in 4 KiB chunks.
    def hashFile(fileName):
        with open(fileName, "rb") as f:
            sha256 = hashlib.sha256()
            while chunk := f.read(4096):
                sha256.update(chunk)
        return sha256.hexdigest()

    # Worker: take file names from the queue until it is empty.
    def backupFile(q):
        while not q.empty():
            fileName = q.get()
            # The lock is held while hashing, so only one thread hashes at a time.
            with lock:
                if hashFile(filesToBackupPath+fileName) in hashList:
                    # "daha once yedeklenmis" = "already backed up"
                    print(f"\033[33m{fileName} daha once yedeklenmis\033[0m")
                else:
                    # "yedeklendi" = "backed up"
                    print(f"\033[32m{fileName} yedeklendi\033[0m")
                    hashList.append(hashFile(filesToBackupPath+fileName))
            q.task_done()

    filesToBackupPath = "yedeklenecekDosyalar/"
    fileList = os.listdir(filesToBackupPath)
    hashList = []
    q = queue.Queue()
    for file in fileList:
        q.put(file)
    lock = threading.Lock()
    for i in range(20):
        t = threading.Thread(target=backupFile, args=(q,))
        t.start()

    q.join()
    print('\n',len(hashList))

Answer 1

Score: 1

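There is no reason for you to be locking the call to hashFile. Hashing only reads the file and touches no shared state, so it can safely run in several threads at once.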

    hash = hashFile(filesToBackupPath+fileName)
    with lock:
        if hash in hashList:
            alreadyBackedUp = True
        else:
            alreadyBackedUp = False
            hashList.append(hash)
    # Everything else outside the lock.

The only place you need the lock is when accessing hashList.
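
To make that concrete, here is a minimal end-to-end sketch of a worker that hashes outside the lock and holds the lock only for the check-and-append on the shared list. The names backup_worker, backup_folder and results are hypothetical and not from the question. It also swaps the q.empty()/q.get() pair for get_nowait(), on the assumption that you do not want a worker to block forever if another thread takes the last item between the two calls.

```python
import hashlib
import os
import queue
import threading


def hash_file(path):
    """Return the SHA-256 hex digest of a file, read in 4 KiB chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(4096):
            sha256.update(chunk)
    return sha256.hexdigest()


def backup_worker(q, folder, hash_list, lock, results):
    """Drain the queue; hash outside the lock, check-and-append inside it."""
    while True:
        try:
            # get_nowait() raises queue.Empty instead of blocking if another
            # thread emptied the queue first.
            file_name = q.get_nowait()
        except queue.Empty:
            return
        try:
            # Slow part: runs without the lock, so threads can overlap here.
            digest = hash_file(os.path.join(folder, file_name))
            with lock:
                # Short critical section: only the shared state is touched.
                already_backed_up = digest in hash_list
                if not already_backed_up:
                    hash_list.append(digest)
                results[file_name] = already_backed_up
            # The actual copy and any printing would go here, outside the lock.
        finally:
            q.task_done()


def backup_folder(folder, workers=20):
    """Hypothetical driver: returns (hash_list, results) for one folder."""
    q = queue.Queue()
    for name in os.listdir(folder):
        q.put(name)
    hash_list, results, lock = [], {}, threading.Lock()
    threads = [
        threading.Thread(
            target=backup_worker, args=(q, folder, hash_list, lock, results)
        )
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hash_list, results
```

Calling backup_folder("yedeklenecekDosyalar/") would return the list of unique hashes plus a dict mapping each file name to whether its content had already been seen. The sketch assumes the folder contains only regular files, as in the question.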
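Why are you using a list rather than a set? A membership test on a list scans every element, while a set lookup takes constant time on average, so the time spent holding the lock stays short as the number of hashes grows.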
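
If you do switch to a set, one way to keep the check and the insert atomic is to wrap both in a small helper. The sketch below is only an illustration: the class name SeenHashes and its method add_if_new are made up for this example and are not part of the question's code or of the standard library.

```python
import threading


class SeenHashes:
    """Hypothetical helper: a set of digests guarded by a lock."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def add_if_new(self, digest):
        """Atomically record digest; return True only for the first caller."""
        with self._lock:
            if digest in self._seen:
                return False
            self._seen.add(digest)
            return True

    def __len__(self):
        with self._lock:
            return len(self._seen)


seen = SeenHashes()
print(seen.add_if_new("abc"))  # True: first time this digest is seen
print(seen.add_if_new("abc"))  # False: duplicate
```

A worker would compute the hash first and then call add_if_new(hash); a True result means the file still needs to be backed up, and the lock is held only for the set lookup and insert.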

huangapple
  • This article was posted on April 13, 2023 at 23:10:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76007071.html