Why does multiprocessing create a clone of base variables for each thread?

Question

So I'm using a multiprocessing pool with 3 workers to run a function that does a certain job. I have a variable defined outside this function, initialized to 0; every time the function does its job it should add 1 to that variable and print it, but each worker ends up using a separate copy of the variable.

here is the code:

from multiprocessing import Pool

number_of_doe_jobs = 0
datalist = [1, 2, 3, 4, 5]

def checker(item):
    global number_of_doe_jobs
    # JOB CODE GOES HERE
    number_of_doe_jobs += 1
    print(number_of_doe_jobs)

if __name__ == "__main__":
    p = Pool(3)
    p.map(checker, datalist)

The desired output is that every job adds 1 to the same number_of_doe_jobs,
but every worker adds 1 to its own number_of_doe_jobs, so there are 3 separate number_of_doe_jobs variables now.

Answer 1

Score: 1

You are not spawning 3 threads. You are spawning 3 processes. Each process has its own memory space, with its own copy of the interpreter and its own independent object space. Global variables are not shared across processes. There are ways to create truly shared variables (for example multiprocessing.Value), but you might be better served by using a multiprocessing.Queue. Create it in the mainline code, and pass it to the subprocesses. Have the jobs push a "complete" flag onto the queue, and have the mainline code read the results.

FOLLOWUP

The NUMBER of jobs will always be equal to len(datalist), so it's not clear why you would track that. Here, I create a multiprocessing queue and pass it to each worker. Python implements the queue with a pipe under the hood. The checker function sends a signal when it finishes, and the mainline code fetches each one and prints it. q.get will block until something is in the queue.

import multiprocessing

def init(q):
    # Runs once in each worker process: stash the queue in a global,
    # since a multiprocessing.Queue cannot be pickled as a map() argument
    global queue
    queue = q

def checker(item):
    # JOB CODE GOES HERE
    queue.put("done")

if __name__ == "__main__":
    datalist = [1, 2, 3, 4, 5]
    q = multiprocessing.Queue()
    with multiprocessing.Pool(3, initializer=init, initargs=(q,)) as p:
        p.map(checker, datalist)

    for _ in datalist:
        print(q.get())

huangapple
  • Published on 2023-02-10 03:19:20
  • When reposting, please keep the original link: https://go.coder-hub.com/75403435.html