英文:
Python multiprocessing with function of functions
问题
我想使用Python的multiprocessing模块来并行化一个定制的函数,该函数本身包含其他定制的函数(请参阅下面的代码)。假设我有100个元素,我想每个批次计算10个元素,那么我将使用10个进程。然而,我注意到multiprocessing首先对这100个元素计算my_subfunction1
,然后再传递给my_subfunction2
,以及随后的函数。
为什么会发生这种情况,有没有办法阻止这种行为?我希望每个批次分别计算my_subfunction1
、2
和3
中的10个元素。
非常感谢您的帮助。
import multiprocessing as mp
with mp.get_context('spawn').Pool(processes=10) as pool:
try:
res = pool.starmap(my_main_function, args)
except Exception as e:
pool.terminate()
raise e
def my_main_function(my_args):
res1 = my_subfunction1(arg1)
res2 = my_subfunction2(res1)
res3 = my_subfunction3(res2)
return res3
英文:
I want to use multiprocessing python module to parallelize a customized function, which itself contains other customized functions (see the code below). Let's supose I have 100 elements and I want to compute 10 each batch, then I will use 10 processes. However I notice that the multiprocessing first calculate my_subfunction1
for the 100 elements before passing to my_subfunction2
, and subsequently.
Why is this happening and is there any way to prevent this behaviour? I would like to calculate my_subfunction1, 2, and 3
for 10 elements each batch.
Thanks so much in advance for the help.
import multiprocessing as mp
with mp.get_context('spawn').Pool(processes = 10) as pool:
try:
res = pool.starmap(my_main_function, args)
except Exception as e:
pool.terminate()
raise e
def my_main_function(my_args):
res1 = my_subfunction1(arg1)
res2 = my_subfunction2(res1)
res3 = my_subfunction3(res2)
return res3
答案1
得分: 1
你需要使用Python的multiprocessing库中的Pool对象。Pool对象:
提供了一种便捷的方式来并行执行函数,跨多个输入值分发输入数据到不同的进程中(数据并行)。
import multiprocessing as mp
def my_main_function(arg):
res1 = (lambda x: x + 1)(arg)
res2 = (lambda x: x * 2)(res1)
res3 = (lambda x: x ** 2)(res2)
return res3
if __name__ == '__main__':
args = range(100)
with mp.Pool(processes=10) as pool:
res = pool.map(my_main_function, args)
print(res)
结果:
[4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576, 676, 784, 900, 1024, 1156, 1296, 1444, 1600, 1764, 1936, 2116, 2304, 2500, 2704, 2916, 3136, 3364, 3600, 3844, 4096, 4356, 4624, 4900, 5184, 5476, 5776, 6084, 6400, 6724, 7056, 7396, 7744, 8100, 8464, 8836, 9216, 9604, 10000, 10404, 10816, 11236, 11664, 12100, 12544, 12996, 13456, 13924, 14400, 14884, 15376, 15876, 16384, 16900, 17424, 17956, 18496, 19044, 19600, 20164, 20736, 21316, 21904, 22500, 23104, 23716, 24336, 24964, 25600, 26244, 26896, 27556, 28224, 28900, 29584, 30276, 30976, 31684, 32400, 33124, 33856, 34596, 35344, 36100, 36864, 37636, 38416, 39204, 40000]
英文:
You need Python's Pool object from the multiprocessing library. The Pool object:
> offers a convenient means of parallelizing the execution of a function
> across multiple input values, distributing the input data across
> processes (data parallelism).
import multiprocessing as mp
def my_main_function(arg):
res1 = (lambda x: x + 1)(arg)
res2 = (lambda x: x * 2)(res1)
res3 = (lambda x: x ** 2)(res2)
return res3
if __name__ == '__main__':
args = range(100)
with mp.Pool(processes=10) as pool:
res = pool.map(my_main_function, args)
print(res)
Result:
[4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576, 676, 784, 900, 1024, 1156, 1296, 1444, 1600, 1764, 1936, 2116, 2304, 2500, 2704, 2916, 3136, 3364, 3600, 3844, 4096, 4356, 4624, 4900, 5184, 5476, 5776, 6084, 6400, 6724, 7056, 7396, 7744, 8100, 8464, 8836, 9216, 9604, 10000, 10404, 10816, 11236, 11664, 12100, 12544, 12996, 13456, 13924, 14400, 14884, 15376, 15876, 16384, 16900, 17424, 17956, 18496, 19044, 19600, 20164, 20736, 21316, 21904, 22500, 23104, 23716, 24336, 24964, 25600, 26244, 26896, 27556, 28224, 28900, 29584, 30276, 30976, 31684, 32400, 33124, 33856, 34596, 35344, 36100, 36864, 37636, 38416, 39204, 40000]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论