Call concurrent.futures from inside a function in another file






How can I run parallel computations from inside a function that lives in another file and is imported?

As an example, I created a file scratch_tests.py.

In another file I execute these lines:

from scratch_tests import g


It errors out with the exception below. How, in principle, does one handle this type of case, where the concurrent computations need to happen inside a function residing in some other file?

> BrokenProcessPool: A process in the process pool was terminated
> abruptly while the future was running or pending


You did not specify the platform you are running on but given the issue you are having it seems likely that it is one that uses the spawn method for creating child processes. I will attempt to provide an answer based on that assumption.

As mentioned by Timus, you need to make function f a global function by moving it to global scope; declaring it as global but leaving it nested within g accomplishes nothing. But you have another, major issue:

When spawn is used, each child process starts a new Python interpreter. The new process does not inherit a copy of the parent's memory, so its address space starts out empty. The interpreter therefore re-processes the Python modules, executing everything at global scope, which includes the function definitions and any other global data your child process needs. You can surround code at global scope (or at any level, although it is usually pointless anywhere except global scope) with a test of the current module name to execute that code conditionally. This only makes sense in the main script, whose __name__ value changes when it is executed in a child process as part of that process's initialization. For other modules __name__ never changes, so what is the point of testing it?
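The behaviour described above can be observed with a small script. This is only a sketch: the function name report is made up for illustration, and the spawn start method is forced explicitly so the result does not depend on the platform default. It has to be saved to a file and run as a script.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def report():
    """Return the module name this function sees and the current process name."""
    return __name__, mp.current_process().name


if __name__ == '__main__':
    # Force the spawn start method so the behaviour is the same on every platform.
    ctx = mp.get_context('spawn')
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        print('parent:', report())
        print('child: ', executor.submit(report).result())
```

Run this way, the parent line should show the main script's module name as seen by the parent, and the child line the name the same script gets when the child re-processes it.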

Let me elaborate:

The main script's module name is '__main__' in the parent process and '__mp_main__' in any child process. The module name for your scratch_tests.py file is always 'scratch_tests', regardless of whether it is running in the main process or a child process. Therefore the test you have in function g, namely if __name__ == 'scratch_tests':, always evaluates to True and accomplishes nothing. When g is initially called from the main script, it creates the pool and submits a task to it, which starts some number of pool child processes. For each of these processes your main script is re-interpreted and g is called again recursively, leading to a catastrophic error. Therefore:

  1. Remove the __name__ test from g since it accomplishes nothing.
  2. Add the following test for __name__ in your main script:
if __name__ == '__main__':
    from scratch_tests import g
    # Call g here, inside the guard, so that child processes never reach the call.


This will prevent g from being called recursively.
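Putting the two steps together, the layout might look like the sketch below. The original scratch_tests.py is not shown, so the bodies of f and g here are assumptions made up for illustration (f squares a number, g maps f over a list); only the structure matters: f lives at module level, g contains no __name__ test, and the guard sits in the main script.

```python
# scratch_tests.py (hypothetical contents; the original file is not shown)
from concurrent.futures import ProcessPoolExecutor


def f(x):
    # Defined at module level so worker processes can import it by name.
    return x * x


def g(values):
    # No __name__ test here: inside this module __name__ is always 'scratch_tests'.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(f, values))


# Main script (a separate file), shown as comments to keep this block self-contained:
#
# if __name__ == '__main__':
#     from scratch_tests import g
#     print(g([1, 2, 3]))
```

With this layout a child process that re-interprets the main script skips the guarded block, so it only imports scratch_tests to find f and never calls g itself.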


You must include an if __name__ == '__main__': test in your main script around any global code that creates a child process, whether directly or indirectly by executing other code that creates one.

Consequently, if your main script has a function foobar that creates a child process, don't put the __name__ test in foobar; put it around the code at global scope that directly or indirectly results in foobar being called. As a matter of efficiency, you can also put the __name__ test around any definitions at global scope in the main script that child processes do not need:

if __name__ == '__main__':
    # This definition is not referenced by worker (i.e. child processes):
    # a stand-in for some very large list that takes a lot of memory.
    my_list = list(range(1_000_000))

def worker(value):
    ...  # the actual work is elided

def foobar():
    from multiprocessing import Pool

    # Only runs in the main process, so my_list is defined here.
    with Pool() as pool:
        pool.map(worker, my_list)

def main():
    foobar()

# This is where the test should be:
if __name__ == '__main__':
    main()

