# Why is pytorch running out of memory on a trivial multiplication?

## Question
I'm trying to get the [pytorch MNIST tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) to run using WSL2/Ubuntu and an RTX 3060 Ti GPU. On the first training batch it slurps up all the Linux RAM until Ubuntu kills the process.
After paring down the tutorial, I see the same failure with tiny tensors in this simple repro case:
```python
import torch

x0 = torch.tensor([[1.], [4.]], device='cuda')
w0 = torch.tensor([[2.]], device='cuda')
y0 = torch.nn.functional.linear(x0, w0)  # <-- crashes here, should return tensor([[2.], [8.]])
```
[jupyter kernel runs out of memory and dies](https://i.stack.imgur.com/27sfK.png)
What I've tried:
1. Checking that the GPU can be seen from the shell and that `torch.cuda.is_available()` returns `True`.
2. Creating the tensors locally (on the CPU) rather than on the CUDA device - this works (see the sketch after this list).
3. Running the code from the Python command line rather than Jupyter - fails the same way.
4. Various NVIDIA Windows drivers, covering CUDA versions 11.4 to 12.0 - doesn't seem to matter.
5. Wiping and rebuilding the WSL Ubuntu instance - doesn't help.
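For reference, a minimal sketch of the CPU-only variant from item 2, which runs fine on the same machine (`torch.nn.functional.linear(x, w)` computes `x @ w.T`, so the expected result is `tensor([[2.], [8.]])`):

```python
import torch

# Same tiny example as above, but with the tensors on the CPU
# instead of the CUDA device - this runs without issue.
x0 = torch.tensor([[1.], [4.]])
w0 = torch.tensor([[2.]])

y0 = torch.nn.functional.linear(x0, w0)  # computes x0 @ w0.T (no bias)
print(y0)  # tensor([[2.], [8.]])
```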
```
$ conda list | grep torch
pytorch                   1.13.1    py3.10_cuda11.7_cudnn8.5.0_0
pytorch-cuda              11.7      h67b0de4_1

$ nvidia-smi
Wed Feb 15 15:27:25 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.75       Driver Version: 517.40       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   39C    P8    12W / 200W |    515MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```
```
$ ls -al /usr/lib/wsl/lib
total 74192
drwxr-xr-x 1 root root       40 Feb 15 15:23 .
drwxr-xr-x 4 root root     4096 Feb 15 06:13 ..
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so.1
-r-xr-xr-x 1 root root   141464 Sep 12 16:54 libcuda.so.1.1
-r-xr-xr-x 1 root root   800568 Oct  7 18:46 libd3d12.so
-r-xr-xr-x 1 root root  6224608 Oct  7 18:46 libd3d12core.so
-r-xr-xr-x 1 root root   829248 Oct  7 18:46 libdxcore.so
-r-xr-xr-x 1 root root  5950624 Sep 12 16:54 libnvcuvid.so
-r-xr-xr-x 1 root root  5950624 Sep 12 16:54 libnvcuvid.so.1
-r-xr-xr-x 1 root root  7547400 Sep 12 16:54 libnvdxdlkernels.so
-r-xr-xr-x 1 root root   424400 Sep 12 16:54 libnvidia-encode.so
-r-xr-xr-x 1 root root   424400 Sep 12 16:54 libnvidia-encode.so.1
-r-xr-xr-x 1 root root   212624 Sep 12 16:54 libnvidia-ml.so.1
-r-xr-xr-x 1 root root   354768 Sep 12 16:54 libnvidia-opticalflow.so
-r-xr-xr-x 1 root root   354768 Sep 12 16:54 libnvidia-opticalflow.so.1
-r-xr-xr-x 1 root root 45845584 Sep 12 16:54 libnvwgf2umx.so
-r-xr-xr-x 1 root root   600472 Sep 12 16:54 nvidia-smi
```
## Answer 1

**Score:** 3

I was able to get it working by making sure WSL is configured with more memory than the GPU has. It seems NVIDIA's Unified Virtual Addressing (UVA) wants to map the RTX 3060 Ti's whole 8GB into Linux's memory space on the first call? When I increased my WSL memory limit from 2GB to 16GB (via `%USERPROFILE%\.wslconfig`), my example and the pytorch tutorial started working.
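For anyone hitting the same issue, a minimal sketch of the relevant `.wslconfig` entry (the 16GB figure is the value from the answer; pick what fits your machine):

```ini
# %USERPROFILE%\.wslconfig
[wsl2]
# Give the WSL2 VM more RAM than the GPU's 8GB so the first CUDA call can map it
memory=16GB
```

After editing the file, run `wsl --shutdown` from Windows and reopen the distro for the new limit to take effect; `free -h` inside Ubuntu should then report the increased memory.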