# Why is pytorch running out of memory on a trivial multiplication?

## Question
I'm trying to get the [pytorch MNIST tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) to run using WSL2/Ubuntu and an RTX 3060 Ti GPU. On the first training batch it slurps up all the Linux RAM until Ubuntu kills the process.
After paring down the tutorial, I see the same failure with tiny tensors in this simple repro case:
```python
import torch

x0 = torch.tensor([[1.], [4.]], device='cuda')
w0 = torch.tensor([[2.]], device='cuda')
y0 = torch.nn.functional.linear(x0, w0)  # <-- crashes here, should return tensor([[2.], [8.]])
```
[jupyter kernel runs out of memory and dies](https://i.stack.imgur.com/27sfK.png)
What I've tried:
1. Checking that the GPU can be seen from the shell and that `torch.cuda.is_available()` returns `True`.
2. Creating the tensors locally (on the CPU) rather than on the CUDA device - this works (see the sketch after this list).
3. Running the code from the Python command line rather than Jupyter - fails the same way.
4. Various NVIDIA Windows drivers, covering CUDA versions 11.4 to 12.0 - doesn't seem to matter.
5. Wiping and rebuilding the WSL Ubuntu instance - doesn't help.
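For reference, a minimal sketch of the CPU-only variant from item 2, which runs fine on the same machine (`torch.nn.functional.linear(x, w)` computes `x @ w.T`, so the expected result is `tensor([[2.], [8.]])`):

```python
import torch

# Same tiny example as above, but with the tensors on the CPU
# instead of the CUDA device - this runs without issue.
x0 = torch.tensor([[1.], [4.]])
w0 = torch.tensor([[2.]])

y0 = torch.nn.functional.linear(x0, w0)  # computes x0 @ w0.T (no bias)
print(y0)  # tensor([[2.], [8.]])
```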
```
$ conda list | grep torch
pytorch                   1.13.1    py3.10_cuda11.7_cudnn8.5.0_0
pytorch-cuda              11.7      h67b0de4_1

$ nvidia-smi
Wed Feb 15 15:27:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.75       Driver Version: 517.40       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8    12W / 200W |    515MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```
```
$ ls -al /usr/lib/wsl/lib
total 74192
drwxr-xr-x 1 root root       40 Feb 15 15:23 .
drwxr-xr-x 4 root root     4096 Feb 15 06:13 ..
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so.1
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so.1.1
-r-xr-xr-x 1 root root   800568 Oct  7 18:46 libd3d12.so
-r-xr-xr-x 1 root root  6224608 Oct  7 18:46 libd3d12core.so
-r-xr-xr-x 1 root root   829248 Oct  7 18:46 libdxcore.so
-r-xr-xr-x 1 root root  5950624 Sep 12 16:54 libnvcuvid.so
-r-xr-xr-x 1 root root  5950624 Sep 12 16:54 libnvcuvid.so.1
-r-xr-xr-x 1 root root  7547400 Sep 12 16:54 libnvdxdlkernels.so
-r-xr-xr-x 1 root root   424400 Sep 12 16:54 libnvidia-encode.so
-r-xr-xr-x 1 root root   424400 Sep 12 16:54 libnvidia-encode.so.1
-r-xr-xr-x 1 root root   212624 Sep 12 16:54 libnvidia-ml.so.1
-r-xr-xr-x 1 root root   354768 Sep 12 16:54 libnvidia-opticalflow.so
-r-xr-xr-x 1 root root   354768 Sep 12 16:54 libnvidia-opticalflow.so.1
-r-xr-xr-x 1 root root 45845584 Sep 12 16:54 libnvwgf2umx.so
-r-xr-xr-x 1 root root   600472 Sep 12 16:54 nvidia-smi
```
## Answer 1

**Score:** 3

I was able to get it working by making sure WSL is configured with more memory than the GPU has. It seems NVIDIA's Unified Virtual Addressing (UVA) wants to map the RTX 3060 Ti's whole 8GB into Linux's memory space on the first call? When I increased my WSL memory limit from 2GB to 16GB (via `%USERPROFILE%\.wslconfig`), my example and the pytorch tutorial started working.
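For anyone hitting the same issue, a minimal sketch of the relevant `.wslconfig` entry (the 16GB figure is the value from the answer; pick what fits your machine):

```ini
# %USERPROFILE%\.wslconfig
[wsl2]
# Give the WSL2 VM more RAM than the GPU's 8GB so the first CUDA call can map it
memory=16GB
```

After editing the file, run `wsl --shutdown` from Windows and reopen the distro for the new limit to take effect; `free -h` inside Ubuntu should then report the increased memory.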