why does multiprocessing create a clone of base variables for each thread
Question
So I'm using a multiprocessing pool with 3 threads to run a function that does a certain job. I have a variable defined outside this function which equals 0, and every time the function does its job it should add 1 to that variable and print it. But every thread uses a separate variable.

Here is the code:
from multiprocessing import Pool

number_of_doe_jobs = 0

def thefunction():
    global number_of_doe_jobs
    # JOB CODE GOES HERE
    number_of_doe_jobs += 1

if __name__ == "__main__":
    p = Pool(3)
    p.map(checker, datalist)
The desired output is that it adds 1 to number_of_doe_jobs, but every thread adds 1 to its own number_of_doe_jobs, so there are 3 number_of_doe_jobs variables now.
Answer 1
Score: 1
You are not spawning 3 threads. You are spawning 3 processes. Each process has its own memory space, with its own copy of the interpreter and its own independent object space. Global variables are not shared across processes. There are ways to create shared variables (which communicate over sockets), but you might be better served by using a multiprocessing.Queue. Create it in the mainline code, and pass it as a parameter to the subprocesses. Have the jobs push a "complete" flag on the queue, and have the mainline code read the results.
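A minimal sketch of that first point, assuming a toy bump function and dummy inputs (the names and data are illustrative, not from the original post): each worker process increments its own copy of the global, and the parent's copy never changes.

from multiprocessing import Pool
import os

counter = 0

def bump(_):
    # each worker process mutates its own copy of the global
    global counter
    counter += 1
    return (os.getpid(), counter)

if __name__ == "__main__":
    with Pool(3) as p:
        # counts restart within each worker PID, not across the pool
        print(p.map(bump, range(6)))
    print("parent counter:", counter)  # still 0 in the parent process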
FOLLOWUP

The NUMBER of jobs will always be equal to len(datalist), so it's not clear why you would track that. Here, I create a multiprocessing queue and pass that to the function. Python implements that by creating a socket. The checker function sends a signal when it finishes, and the mainline code fetches each one and prints it. q.get will block until something is in the queue.
import multiprocessing
from multiprocessing import Pool
from functools import partial

def checker(q, item):
    # JOB CODE GOES HERE
    q.put("done")

if __name__ == "__main__":
    datalist = [1, 2, 3]  # placeholder data
    # A plain multiprocessing.Queue cannot be pickled through Pool.map,
    # so use a Manager-backed queue, which can be passed as an argument.
    q = multiprocessing.Manager().Queue()
    p = Pool(3)
    p.map(partial(checker, q), datalist)
    for _ in datalist:
        print(q.get())
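And if you really do want a single shared counter rather than a queue, here is a hedged sketch using multiprocessing.Value with a Pool initializer; the init/job names and the placeholder datalist are assumptions for illustration, not part of the original answer.

from multiprocessing import Pool, Value

def init(shared):
    # runs once in each worker; stash the shared counter in a global
    global number_of_done_jobs
    number_of_done_jobs = shared

def job(_):
    # get_lock() serializes increments across processes
    with number_of_done_jobs.get_lock():
        number_of_done_jobs.value += 1

if __name__ == "__main__":
    datalist = range(10)  # placeholder data
    counter = Value("i", 0)  # one int in shared memory
    with Pool(3, initializer=init, initargs=(counter,)) as p:
        p.map(job, datalist)
    print(counter.value)  # 10: every job incremented the same variable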