Low utilization of the A100 GPU with fastai
Question
I am currently using fastai to train computer vision models.
My development environment looks like this.
The machine has:
CPU: 16 cores
RAM: 64 GB
GPU: Nvidia A100
SSD: 200 GB
I develop in a JupyterLab container running on a 1-node Docker Swarm cluster.
The JupyterLab instance is built on this image:
nvcr.io/nvidia/pytorch:23.01-py3
When I launch a training run, the GPU is not used at 100%; it sits at roughly 20%, while GPU memory is heavily used, in line with my batch_size.
Here is a screenshot:
When I run a training with plain PyTorch using the same model, the same data, and similar hyperparameters, it uses 100% of the GPU.
I tried installing different versions of PyTorch, fastai, and CUDA, but nothing helps: with fastai my GPU utilization is always limited to about 20%.
Do you have any leads that could help me find a solution?
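When a framework-level training loop is slower than the equivalent plain-PyTorch loop, a common way to narrow it down is to time the same short run with each callback removed one at a time. A minimal, framework-agnostic sketch of that idea (`time_with_callbacks` and `fit_fn` are hypothetical names, not fastai API; `fit_fn` stands in for something like one epoch of `learn.fit`):

```python
import time

def time_with_callbacks(fit_fn, callbacks):
    """Time fit_fn once with all callbacks, then once per
    leave-one-out subset.

    fit_fn is a stand-in for a short training run that accepts the
    list of callbacks to use. Returns a dict mapping 'all' and each
    *removed* callback's class name to the elapsed wall-clock time.
    A large speed-up when a callback is removed points at that
    callback as the bottleneck.
    """
    timings = {}
    # Baseline: all callbacks enabled.
    start = time.perf_counter()
    fit_fn(callbacks)
    timings['all'] = time.perf_counter() - start
    # Leave each callback out in turn.
    for removed in callbacks:
        subset = [cb for cb in callbacks if cb is not removed]
        start = time.perf_counter()
        fit_fn(subset)
        timings[type(removed).__name__] = time.perf_counter() - start
    return timings
```

Running this with a one-epoch `fit_fn` and the learner's callback list would have singled out `ActivationStats` directly, as the answers below confirm.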
Answer 1
Score: 0
Thank you for your feedback.
After a few more hours of investigation, I found what was slowing down my GPU: the ActivationStats callback.
Here is the code of my learner:
from fastai.vision.all import *  # provides vision_learner and the callbacks below

learn = vision_learner(
    dls,
    'resnet18',
    metrics=[accuracy, error_rate],
    cbs=[
        CSVLogger(fname='PTO_ETIQUETTE.csv'),
        EarlyStoppingCallback(monitor='valid_loss', min_delta=0.3, patience=10),
        ActivationStats(with_hist=True)
    ],
    pretrained=True
)
I don't understand why this callback degrades GPU performance so much.
Answer 2
Score: 0
Adding cpu=False, i.e. ActivationStats(with_hist=True, cpu=False), should fix it, I believe.
It looks like, by default, the stats computation takes place on the CPU, as shown here in the callback's signature:
ActivationStats (with_hist=False, modules=None, every=None,
remove_end=True, is_forward=True, detach=True, cpu=True,
include_paramless=False, hook=None)