March 21, 2023 00:39:36

TensorFlow using all of one GPU but little of the other

Question

After running into OOM errors using the TensorFlow Object Detection API with an NVIDIA 3080 (10GB), I bought a 4090 (24GB). I am currently running both together, but I noticed that in runs with a large batch size I'm using almost all of the 3080's memory but varying amounts of the 4090's. Ideally I'd like to use all of both cards to push the batch size as high as possible. I can't seem to find a way to change the strategy so that the GPUs can take different loads. The mirrored strategies seem to give each GPU the same amount of data to process at each split. Is there a way for one GPU to get more and the other less?
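To make the equal-split behaviour concrete, here is a minimal arithmetic sketch in plain Python (no TensorFlow calls). It assumes the global batch is divided evenly across replicas, as described above. The per-GPU limits are hypothetical placeholder numbers, not measurements, and the helper names are made up for illustration.

```python
def per_replica_batch(global_batch: int, num_replicas: int) -> int:
    """Batch size each GPU receives when the global batch is split evenly."""
    if global_batch % num_replicas:
        raise ValueError("global batch must be divisible by the replica count")
    return global_batch // num_replicas


def max_global_batch(per_gpu_limits: list[int]) -> int:
    """Largest global batch that fits when every GPU gets the same share.

    per_gpu_limits holds the (hypothetical) largest batch each GPU could
    hold on its own. With an even split, the smallest card sets the limit.
    """
    return min(per_gpu_limits) * len(per_gpu_limits)


# Hypothetical limits, chosen only to mirror the 24GB vs 10GB ratio.
limits = {"4090": 24, "3080": 10}

both_cards = max_global_batch(list(limits.values()))
only_4090 = max_global_batch([limits["4090"]])

print(per_replica_batch(both_cards, 2))  # 10 per GPU
print(both_cards, only_4090)             # 20 24
```

Under these assumed numbers the 3080 fills up at 10 samples while the 4090 is left partly idle, and the larger card on its own could hold a bigger batch than the mirrored pair.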

My machine and environment specs are as follows:

OS = Ubuntu 22.04
GPUs: [0: 4090, 1: 3080]
python: 3.10.9
cudatoolkit: 11.2.2 (installed through anaconda)
cudnn: 8.1.0.77 (installed through anaconda)

[Image: GPU memory usage during training]

I'm fairly new to this, so any help is appreciated. If I've left out any useful information, please let me know and I'll edit the post accordingly. Thanks in advance.

I've tried changing the distribution strategies from MultiWorkerMirroredStrategy to MirroredStrategy and experimental.CentralStorageStrategy with no real change. I was hoping that the central storage strategy would allow the CPU to more effectively distribute the data.
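One way to check whether the larger card alone would be enough is to hide the 3080 from the process before TensorFlow initializes. This is only a sketch: it assumes index 0 is the 4090, as in the GPU list above, and that this index follows PCI bus order (the order `nvidia-smi` reports), which may not hold on every machine.

```python
import os

# These variables must be set before TensorFlow (or any CUDA library)
# is imported, otherwise they have no effect on device discovery.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # enumerate GPUs by PCI bus order
os.environ["CUDA_VISIBLE_DEVICES"] = "0"        # assumption: index 0 is the 4090

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(visible)  # ['0']
```

With a single visible device there is nothing to mirror across, so the whole batch goes to that one card.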

Derek

Answer 1

Score: 0

I ended up solving this by downloading ncurses from conda-forge and setting it as the default channel. Here are the instructions for doing this:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install -c conda-forge ncurses
conda search ncurses --channel conda-forge

Hope this saves someone some time!

DB
