Why is PyTorch running out of memory on a trivial multiplication?

# Question

I'm trying to get the [PyTorch MNIST tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) to run using WSL2/Ubuntu and an RTX 3060 Ti GPU. On the first training batch it slurps up all the Linux RAM until Ubuntu kills it.

After paring down the tutorial, I see the same failure with tiny tensors in this simple repro case:

```python
import torch

x0 = torch.tensor([[1.], [4.]], device='cuda')
w0 = torch.tensor([[2.]], device='cuda')
y0 = torch.nn.functional.linear(x0, w0)  # <-- crashes here, should return tensor([[2.], [8.]])
```

[The Jupyter kernel runs out of memory and dies](https://i.stack.imgur.com/27sfK.png)

What I've tried:

1. Checking that the GPU can be seen from the shell and that `torch.cuda.is_available()` returns `True` (see the sanity-check sketch after this list).
2. Creating the tensors on the CPU rather than on the CUDA device - this works.
3. Running the code through the Python command line rather than Jupyter - fails.
4. Various NVIDIA Windows drivers for CUDA versions 11.4 to 12.0 - doesn't seem to matter.
5. Wiping and rebuilding the WSL Ubuntu instance - doesn't help.
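A minimal sketch of what items 1 and 2 boil down to, assuming the same PyTorch 1.13 environment shown below; the printed values are illustrative:

```python
import torch

# Item 1: confirm PyTorch can actually see the GPU.
print(torch.cuda.is_available())            # True on this setup
print(torch.cuda.get_device_name(0))        # "NVIDIA GeForce RTX 3060 Ti"
print(torch.cuda.mem_get_info())            # (free_bytes, total_bytes) on the GPU

# Item 2: the same linear op succeeds when the tensors stay on the CPU,
# so the math itself is fine and the problem is CUDA-specific.
x0 = torch.tensor([[1.], [4.]])
w0 = torch.tensor([[2.]])
print(torch.nn.functional.linear(x0, w0))   # tensor([[2.], [8.]])
```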

```
$ conda list | grep torch
pytorch                   1.13.1          py3.10_cuda11.7_cudnn8.5.0_0
pytorch-cuda              11.7                 h67b0de4_1

$ nvidia-smi
Wed Feb 15 15:27:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.75       Driver Version: 517.40       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8    12W / 200W |    515MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

$ ls -al /usr/lib/wsl/lib
total 74192
drwxr-xr-x 1 root root       40 Feb 15 15:23 .
drwxr-xr-x 4 root root     4096 Feb 15 06:13 ..
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so.1
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so.1.1
-r-xr-xr-x 1 root root   800568 Oct  7 18:46 libd3d12.so
-r-xr-xr-x 1 root root  6224608 Oct  7 18:46 libd3d12core.so
-r-xr-xr-x 1 root root   829248 Oct  7 18:46 libdxcore.so
-r-xr-xr-x 1 root root  5950624 Sep 12 16:54 libnvcuvid.so
-r-xr-xr-x 1 root root  5950624 Sep 12 16:54 libnvcuvid.so.1
-r-xr-xr-x 1 root root  7547400 Sep 12 16:54 libnvdxdlkernels.so
-r-xr-xr-x 1 root root   424400 Sep 12 16:54 libnvidia-encode.so
-r-xr-xr-x 1 root root   424400 Sep 12 16:54 libnvidia-encode.so.1
-r-xr-xr-x 1 root root   212624 Sep 12 16:54 libnvidia-ml.so.1
-r-xr-xr-x 1 root root   354768 Sep 12 16:54 libnvidia-opticalflow.so
-r-xr-xr-x 1 root root   354768 Sep 12 16:54 libnvidia-opticalflow.so.1
-r-xr-xr-x 1 root root 45845584 Sep 12 16:54 libnvwgf2umx.so
-r-xr-xr-x 1 root root   600472 Sep 12 16:54 nvidia-smi
```



# Answer 1

**Score:** 3

I was able to get it working by making sure WSL is configured with more memory than the GPU. It seems NVIDIA's Unified Virtual Addressing (UVA) wants to map the RTX 3060 Ti's whole 8GB into Linux's memory space on the first call? When I increased my WSL memory from 2GB to 16GB (via %USERPROFILE%\.wslconfig), my example and the PyTorch tutorial started working.
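For anyone hitting the same wall, the fix amounts to something like the following in `%USERPROFILE%\.wslconfig` (16GB is just what worked here; the comment reflects my UVA guess above), followed by `wsl --shutdown` from a Windows prompt so the new limit takes effect:

```ini
# %USERPROFILE%\.wslconfig -- a minimal sketch; adjust memory to your machine
[wsl2]
# Give the WSL2 VM more RAM than the GPU's 8GB so the first CUDA call
# has room to map the device memory into the Linux address space.
memory=16GB
```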


