Why is PyTorch running out of memory on a trivial multiplication?

Question

I'm trying to get the [PyTorch MNIST tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) to run under WSL2/Ubuntu with an RTX 3060 Ti GPU. On the first training batch it slurps up all the Linux RAM until Ubuntu kills the process.

After paring down the tutorial, I see the same failure with tiny tensors in this simple repro case.

    import torch

    x0 = torch.tensor([[1.], [4.]], device='cuda')
    w0 = torch.tensor([[2.]], device='cuda')
    # The next line crashes; it should return tensor([[2.], [8.]])
    y0 = torch.nn.functional.linear(x0, w0)

[Jupyter kernel runs out of memory and dies](https://i.stack.imgur.com/27sfK.png)

What I've tried:

  1. Checking that the GPU is visible from the shell and that torch.cuda.is_available() == True.

  2. Creating the tensors on the CPU rather than on the CUDA device: this works.

  3. Running the code from the Python command line rather than Jupyter: still fails.

  4. Various NVIDIA Windows drivers for CUDA versions 11.4 to 12.0: doesn't seem to matter.

  5. Wiping and rebuilding the WSL Ubuntu instance: doesn't help.
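The first two checks in the list can be gathered into one small script. This is only a sketch: the function name `cuda_report` and the dictionary keys are made up for illustration, and it assumes nothing beyond a standard PyTorch install. It deliberately avoids running anything on the GPU, so it should not trigger the crash itself.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup as a dict (hypothetical helper)."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["gpu_name"] = props.name
        report["gpu_total_gib"] = round(props.total_memory / 2**30, 2)

    # The same multiplication on the CPU, which the question reports as working.
    x0 = torch.tensor([[1.0], [4.0]])
    w0 = torch.tensor([[2.0]])
    report["cpu_result"] = torch.nn.functional.linear(x0, w0).tolist()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is True and the CPU result is correct, the problem is probably not in the PyTorch install itself, which matches what the question describes.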

    $ conda list | grep torch
    pytorch 1.13.1 py3.10_cuda11.7_cudnn8.5.0_0
    pytorch-cuda 11.7 h67b0de4_1

    $ nvidia-smi
    Wed Feb 15 15:27:25 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.75       Driver Version: 517.40       CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
    |  0%   39C    P8    12W / 200W |    515MiB /  8192MiB |      2%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    $ ls -al /usr/lib/wsl/lib
    total 74192
    drwxr-xr-x 1 root root 40 Feb 15 15:23 .
    drwxr-xr-x 4 root root 4096 Feb 15 06:13 ..
    -r-xr-xr-x 1 root root 141464 Sep 12 16:54 libcuda.so
    -r-xr-xr-x 1 root root 141464 Sep 12 16:54 libcuda.so.1
    -r-xr-xr-x 1 root root 141464 Sep 12 16:54 libcuda.so.1.1
    -r-xr-xr-x 1 root root 800568 Oct 7 18:46 libd3d12.so
    -r-xr-xr-x 1 root root 6224608 Oct 7 18:46 libd3d12core.so
    -r-xr-xr-x 1 root root 829248 Oct 7 18:46 libdxcore.so
    -r-xr-xr-x 1 root root 5950624 Sep 12 16:54 libnvcuvid.so
    -r-xr-xr-x 1 root root 5950624 Sep 12 16:54 libnvcuvid.so.1
    -r-xr-xr-x 1 root root 7547400 Sep 12 16:54 libnvdxdlkernels.so
    -r-xr-xr-x 1 root root 424400 Sep 12 16:54 libnvidia-encode.so
    -r-xr-xr-x 1 root root 424400 Sep 12 16:54 libnvidia-encode.so.1
    -r-xr-xr-x 1 root root 212624 Sep 12 16:54 libnvidia-ml.so.1
    -r-xr-xr-x 1 root root 354768 Sep 12 16:54 libnvidia-opticalflow.so
    -r-xr-xr-x 1 root root 354768 Sep 12 16:54 libnvidia-opticalflow.so.1
    -r-xr-xr-x 1 root root 45845584 Sep 12 16:54 libnvwgf2umx.so
    -r-xr-xr-x 1 root root 600472 Sep 12 16:54 nvidia-smi
# Answer 1

**Score**: 3

I was able to get it working by making sure WSL is configured with more memory than the GPU has. It seems that NVIDIA's Unified Virtual Addressing (UVA) wants to map the RTX 3060 Ti's whole 8GB into Linux's memory space on the first call. When I increased my WSL memory from 2GB to 16GB (via %USERPROFILE%\.wslconfig), both my example and the PyTorch tutorial started working.
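To make the fix concrete, here is a rough sketch that compares the RAM visible inside WSL with the GPU's memory and prints a `.wslconfig` fragment to try. The helper names are invented for illustration, the 16GB value is simply the one the answer used, and the "RAM must exceed GPU memory" rule is the answer's own guess rather than documented behaviour. The `[wsl2]` section and `memory` key are the standard `.wslconfig` settings.

```python
import os


def mem_total_bytes(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo (reported in kB) into bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def ram_exceeds_gpu(ram_bytes, gpu_bytes):
    """Rule of thumb from the answer (an assumption): Linux RAM should exceed GPU memory."""
    return ram_bytes > gpu_bytes


def wslconfig_text(memory_gb):
    """Build the contents of %USERPROFILE%\\.wslconfig with a memory limit in GB."""
    return f"[wsl2]\nmemory={memory_gb}GB\n"


if __name__ == "__main__":
    gpu_bytes = 8192 * 2**20  # 8192 MiB, taken from the nvidia-smi output in the question
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            ram_bytes = mem_total_bytes(f.read())
        if not ram_exceeds_gpu(ram_bytes, gpu_bytes):
            print("Linux RAM does not exceed GPU memory; consider a .wslconfig like:")
            print(wslconfig_text(16))
```

After editing `.wslconfig` on the Windows side, WSL has to be restarted (for example with `wsl --shutdown`) before the new memory limit takes effect.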

huangapple
  • Posted on 2023-02-16 07:58:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75466538.html