2023年3月8日 16:42:48go评论99阅读模式

英文:

How to execute several python scripts in parallel that use 2 GPUs and avoid cuda out of memory?

问题

我有一个名为generateData.sh的bash文件，其中包含数百个Python脚本，例如：

python create_video.py input output --param1 --param2 --param3

每个Python脚本处理不同的输入视频，并在过程中使用我机器上的两个可用GPU之一（用于某些计算机视觉任务）。

我尝试使用GNU parallel（或xargs或&）并行处理bash文件，例如：

cat generateData.sh | parallel

这使我能够并行处理48个Python脚本。然而，由于GPU上的有限空间，只有其中的10个能够正确完成并生成好的输出视频。其他输入视频根本没有被处理，可能是因为遇到了一些cuda内存不足的错误。

我希望GPU并行处理等待某些作业完成，以便在GPU上有一些空闲空间。否则，GPU并行处理将只能处理我的bash文件的一部分。

谢谢你的回答！

英文:

I have a bash file (generateData.sh) that contains hundred of python scripts such as :

python create_video.py input output --param1 --param2 --param3

Each python script processes a different input video and uses in the process one of the two available GPU on my machine (for some computer vision tasks).

I tried to parallelize the bash file using GNU parallel (or xargs or &) with :

cat generateData.sh | parallel

This allows me to parallelize 48 python scripts. However, because of the limited space on GPU, only 10 of them correctly finish with a good output video. The other input videos are not handle at all, probably because it encountered some cuda out-of-memory errors.

I would like GPU parallel to wait that some jobs finish to have some free space on GPUs. Otherwise, GPU parallel will only process correctly a sub-part of my bash file.

Thanks for you answers !

答案1

得分: 1

如下所示：

cat python-lines.sh |
  parallel -j48 CUDA_VISIBLE_DEVICES='{%2}' {=uq=}

以在2个CUDA设备上每个GPU运行24个作业。

英文:

Something like this:

cat python-lines.sh |
  parallel -j48 CUDA_VISIBLE_DEVICES=&#39;$(({%} % 2))&#39; {=uq=}

to run 24 jobs per GPU on 2 CUDA devices.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何并行执行多个使用2个GPU的Python脚本，并避免CUDA内存不足问题？

问题

答案1

设置每个组的最后一行的列为前一行的列。

找到最新数据框中已更改的行。

如何在Python中将图像复制到剪贴板？

在Pandas列中替换数值，使其与另一列中所有唯一值相同。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论