Low utilization of the A100 GPU with fastai
Question
I am currently using fastai to train computer vision models.
My development environment looks like this.
The machine has:
CPU: 16 cores
RAM: 64 GB
GPU: Nvidia A100
SSD: 200 GB
I develop in a JupyterLab container running on a 1-node Docker Swarm cluster.
The JupyterLab instance is built on this image:
nvcr.io/nvidia/pytorch:23.01-py3
When I launch a training run, the GPU is not used at 100%; it sits at roughly 20%, while GPU memory is heavily used, in line with my batch_size.
Here is a screenshot:
When I run a training with plain PyTorch using the same model, the same data, and similar hyperparameters, it uses 100% of the GPU.
I tried installing different versions of PyTorch, fastai, and CUDA, but nothing helps: with fastai my GPU utilization is always limited to about 20%.
Do you have any leads that could help me find a solution?
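When a framework-level training loop is slower than the equivalent plain-PyTorch loop, a common way to narrow it down is to time the same short run with each callback removed one at a time. A minimal, framework-agnostic sketch of that idea (`time_with_callbacks` and `fit_fn` are hypothetical names, not fastai API; `fit_fn` stands in for something like one epoch of `learn.fit`):

```python
import time

def time_with_callbacks(fit_fn, callbacks):
    """Time fit_fn once with all callbacks, then once per
    leave-one-out subset.

    fit_fn is a stand-in for a short training run that accepts the
    list of callbacks to use. Returns a dict mapping 'all' and each
    *removed* callback's class name to the elapsed wall-clock time.
    A large speed-up when a callback is removed points at that
    callback as the bottleneck.
    """
    timings = {}
    # Baseline: all callbacks enabled.
    start = time.perf_counter()
    fit_fn(callbacks)
    timings['all'] = time.perf_counter() - start
    # Leave each callback out in turn.
    for removed in callbacks:
        subset = [cb for cb in callbacks if cb is not removed]
        start = time.perf_counter()
        fit_fn(subset)
        timings[type(removed).__name__] = time.perf_counter() - start
    return timings
```

Running this with a one-epoch `fit_fn` and the learner's callback list would have singled out `ActivationStats` directly, as the answers below confirm.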
Answer 1
Score: 0
Thank you for your feedback.
After a few more hours of investigation, I found what was slowing down my GPU: the ActivationStats callback.
Here is the code of my learner:
from fastai.vision.all import *  # provides vision_learner and the callbacks below

learn = vision_learner(
    dls,
    'resnet18',
    metrics=[accuracy, error_rate],
    cbs=[
        CSVLogger(fname='PTO_ETIQUETTE.csv'),
        EarlyStoppingCallback(monitor='valid_loss', min_delta=0.3, patience=10),
        ActivationStats(with_hist=True)
    ],
    pretrained=True
)
I don't understand why this callback degrades GPU performance so much.
Answer 2
Score: 0
Adding cpu=False, i.e. ActivationStats(with_hist=True, cpu=False), should fix it, I believe.
It looks like, by default, the stats computation takes place on the CPU, as shown here in the callback's signature:
ActivationStats (with_hist=False, modules=None, every=None,
remove_end=True, is_forward=True, detach=True, cpu=True,
include_paramless=False, hook=None)