Multiprocessing Pool - too many file descriptors using Pytorch

Question

Hello, I'm trying to use a multiprocessing pool to play games simultaneously in a program I'm writing.

Here is a code snippet of what I'm doing:

self_play_pool = get_context("spawn").Pool(processes=2)
results = [self_play_pool.apply_async(self.play_game, (False,)) for _ in range(200)]
#self_play_pool.close()

for res in results:
    replay_buffer.save_game(res.get())

self_play_pool.terminate()

I have tried running it with the ".close()" call on the third line and without it. I also tried the concurrent.futures pool, but it gave me similar errors.

What happens is that after roughly 100 games it crashes with either a "broken pipe" error or a "too many open files" error, like this one:
BrokenPipeError: [Errno 32] Broken pipe

From what I was able to gather online, this seems to happen when too many processes are running, since each one requires a file descriptor. However, I only have 2 processes running simultaneously.
I also tried using the "with" context manager, but the same thing happened.
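
One way to check whether descriptors are actually piling up in the parent is to log the open-descriptor count between games. Below is a minimal sketch, assuming Linux (it reads /proc/self/fd); the helper names open_fd_count and fd_limits are illustrative and not part of the original program:

```python
import os
import resource


def open_fd_count():
    """Return the number of file descriptors open in this process.

    Linux-only: lists /proc/self/fd. The listing itself briefly opens one
    descriptor, so the result can be one higher than the steady-state count.
    """
    return len(os.listdir("/proc/self/fd"))


def fd_limits():
    """Return the (soft, hard) limits on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"open descriptors: {open_fd_count()} (soft limit {soft}, hard limit {hard})")
```

Calling open_fd_count() after each res.get() would show whether the count climbs with every collected game rather than with the number of worker processes.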

Any help would be appreciated.
Thank you.

Answer 1

Score: 1

I found the solution to my problem, in case anyone finds this post and wants to know. If you change PyTorch's sharing strategy with torch.multiprocessing.set_sharing_strategy('file_system'), it won't create extra file descriptors.
It is very important to note that you must call the function both in the parent process and in each child process. I did it by simply putting it on the first line of my play_game function, but I think you can achieve the same with the "initializer" argument of multiprocessing.Pool.
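
The "initializer" variant mentioned above could look roughly like this. It is a sketch under assumptions, not the code from the original program: play_game here is a hypothetical module-level stand-in that returns a dummy tensor, and use_file_system_sharing and run_self_play are illustrative names.

```python
import multiprocessing as mp

SHARING_STRATEGY = "file_system"


def use_file_system_sharing():
    """Switch PyTorch's tensor sharing strategy in the calling process."""
    # Imported lazily so the module-level imports stay standard-library only.
    import torch.multiprocessing as torch_mp

    torch_mp.set_sharing_strategy(SHARING_STRATEGY)


def play_game(render):
    """Hypothetical stand-in for the real play_game method."""
    import torch

    return torch.zeros(8)  # placeholder for the real game data


def run_self_play(num_games=200, processes=2):
    # Set the strategy in the parent process first.
    use_file_system_sharing()
    ctx = mp.get_context("spawn")
    # The initializer runs once in every worker process as it starts.
    with ctx.Pool(processes=processes, initializer=use_file_system_sharing) as pool:
        pending = [pool.apply_async(play_game, (False,)) for _ in range(num_games)]
        # Collect everything before the with-block terminates the pool.
        return [res.get() for res in pending]


if __name__ == "__main__":
    games = run_self_play(num_games=4)
    print(f"collected {len(games)} games")
```

With the "spawn" context the worker function and the initializer have to be importable at module level, which is why both are plain functions here rather than methods.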

huangapple
  • Posted on May 22, 2023, 00:47:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76300957.html