Call concurrent.futures from inside a function in another file

Question

How can I use parallel computing inside a function that lives in another file and is imported from there?

As an example, I created a file scratch_tests.py:

import concurrent.futures

def g():
    print(__name__)

    global f
    def f(x):
        return x**2

    if __name__=='scratch_tests':
        with concurrent.futures.ProcessPoolExecutor() as executor:
            t = executor.submit(f, 2)
        return t.result()

In another file I execute these lines:

from scratch_tests import g

g()

As expected, it errors out. How, in principle, does one handle this type of case, where the concurrent computations need to happen inside a function residing in some other file?

> BrokenProcessPool: A process in the process pool was terminated
> abruptly while the future was running or pending

Answer 1

Score: 1

You did not specify the platform you are running on, but given the issue you are having, it is likely one that uses the spawn method for creating child processes. I will attempt to provide an answer based on that assumption.
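
If you are unsure which start method your platform uses, a quick check along these lines can confirm the assumption. This is only a sketch: report_start_methods is a name made up here for illustration, while get_start_method and get_all_start_methods belong to the standard multiprocessing module.

```python
import multiprocessing


def report_start_methods():
    """Return (default start method, all methods available on this platform)."""
    # get_start_method() reports how child processes are created by default,
    # e.g. 'spawn', 'fork' or 'forkserver', depending on platform and version.
    default_method = multiprocessing.get_start_method()
    available = multiprocessing.get_all_start_methods()
    return default_method, available


if __name__ == '__main__':
    default_method, available = report_start_methods()
    print('default start method:', default_method)
    print('available start methods:', available)
```

If the default turns out to be spawn (as it is on Windows and on recent macOS versions of Python), the rest of this answer applies directly.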

As mentioned by Timus, you need to make function f a global function by moving it to global scope; declaring it as global while leaving it nested within g accomplishes nothing. But you have another, major issue:

When spawn is being used, a child process is initialized by launching a new Python interpreter in it. The new process does not inherit a copy of the parent's memory, so it starts out without any of the parent's state. Thus the Python interpreter re-processes the Python modules, executing all definitions at global scope, which include the function definitions and any other global data needed by your child process. You can surround code at global scope (or at any level, but it is usually pointless to do this anywhere except at global scope) with a test of the current module name to conditionally execute that code. This, however, only makes sense in the main script, whose __name__ value changes when it is executed in a child process as part of that process's initialization. For other modules, the __name__ variable is invariant, so what is the point of testing it?

Let me elaborate:

The main script has the module name '__main__' in the parent process and '__mp_main__' in any child process. The module name for your scratch_tests.py file will always be 'scratch_tests', regardless of whether it is imported in the main process or in a child process. Therefore, the test you have in function g, namely if __name__ == 'scratch_tests':, will always evaluate to True and accomplishes nothing. When g is initially called from the main script, it creates the pool and submits a task to it, which creates some number of pool child processes. For each of these processes your main script is re-interpreted and g is called again, recursively, leading to a catastrophic error. Therefore:

  1. Remove the __name__ test from g since it accomplishes nothing.
  2. Add the following test for __name__ in your main script:
if __name__ == '__main__':
    from scratch_tests import g

    g()

This will prevent g from being called recursively.
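
Putting both fixes together, scratch_tests.py might look roughly like the sketch below. It assumes the guarded main script shown above is the only caller; the function names f and g are taken from the question, everything else is just one possible arrangement.

```python
# scratch_tests.py (sketch of the corrected module)
import concurrent.futures


def f(x):
    # Defined at module level so that a worker process can find it by name
    # after importing this module.
    return x**2


def g():
    # No __name__ test here: inside this module it would always be true anyway.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        t = executor.submit(f, 2)
        return t.result()
```

Called from the guarded main script, g() should then return the result of f(2) instead of raising BrokenProcessPool.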

Summary

You must include an if __name__ == '__main__': test in your main script around any global code that creates a child process, either directly or indirectly by executing other code that creates one.

Consequently, if you have in your main script a function foobar that creates a child process, don't put the name test in foobar; put it around the code at global scope that directly or indirectly results in foobar being called. As a matter of efficiency, you can also put the name test around any definitions at global scope in the main script that are not required by child processes:

if __name__ == '__main__':
    # This definition is not referenced by worker (i.e. child processes):
    my_list = [...]  # placeholder for some very large list that takes a lot of memory

def worker(value):
    ...

def foobar():
    from multiprocessing import Pool

    with Pool() as pool:
        pool.map(worker, my_list)

def main():
    ...
    foobar()
    ...

# This is where the test should be:
if __name__ == '__main__':
    main()

huangapple
  • Posted on May 25, 2023, 04:28:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76327196.html