Multiprocessing Pool - too many file descriptors using Pytorch
Question
Hello, I'm trying to use a multiprocessing pool to play games simultaneously in a program I'm writing.
Here is a code snippet of what I'm doing:
self_play_pool = get_context("spawn").Pool(processes=2)
results = [self_play_pool.apply_async(self.play_game, (False,)) for _ in range(200)]
# self_play_pool.close()
for res in results:
    replay_buffer.save_game(res.get())
self_play_pool.terminate()
I have tried running it both with the ".close()" call on the third line and without it. I also tried the concurrent.futures pool, but it gave me similar errors.
What happens is that after roughly 100 games it crashes with either a "broken pipe" error or a "too many files open" error, like this one:
BrokenPipeError: [Errno 32] Broken pipe
From what I was able to gather online, this seems to happen when there are too many processes running, since each one requires a file descriptor... However, I only have 2 processes running simultaneously.
I also tried using the "with" context manager, but the same thing happened.
Any help would be appreciated.
Thank you.
Answer 1
Score: 1
I found the solution to my problem, in case anyone finds this post and wants to know... If you change PyTorch's sharing strategy with the function torch.multiprocessing.set_sharing_strategy('file_system'), it won't create extra file descriptors.
It is very important to note that you must run this function both inside the parent process and inside each child. I did it by simply putting it as the first line of my play_game function, but I think you can achieve the same thing with the "initializer" argument of multiprocessing.Pool.