June 9, 2023, 01:22:34

Pytorch-Scarf package RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu

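Question

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This error occurs when I run the example notebook from [this](https://github.com/clabrugere/pytorch-scarf) GitHub repo.

Here is the code: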

    batch_size = 128
    epochs = 1000
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)

    model = SCARF(
        input_dim=train_ds.shape[1],
        emb_dim=16,
        corruption_rate=0.6,
    ).to(device)
    optimizer = Adam(model.parameters(), lr=0.001)
    ntxent_loss = NTXent()

    loss_history = []

    for epoch in range(1, epochs + 1):
        epoch_loss = train_epoch(model, ntxent_loss, train_loader, optimizer, device, epoch)
        loss_history.append(epoch_loss)

Here is the exact error:

    RuntimeError                              Traceback (most recent call last)
    Cell In [7], line 7
          4 loss_history = []
          6 for epoch in range(1, epochs + 1):
    ----> 7     epoch_loss = train_epoch(model, ntxent_loss, train_loader, optimizer, device, epoch)
          8     loss_history.append(epoch_loss)

    File ~/pytorch-scarf/example/../example/utils.py:23, in train_epoch(model, criterion, train_loader, optimizer, device, epoch)
         20 emb_anchor, emb_positive = model(anchor, positive)
         22 # compute loss
    ---> 23 loss = criterion(emb_anchor, emb_positive)
         24 loss.backward()
         26 # update model weights

    File /opt/tljh/user/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
       1126 # If we don't have any hooks, we want to skip the rest of the logic in
       1127 # this function, and just call forward.
       1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1129         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1130     return forward_call(*input, **kwargs)
       1131 # Do not call functions when jit is used
       1132 full_backward_hooks, non_full_backward_hooks = [], []

    File ~/pytorch-scarf/example/../scarf/loss.py:39, in NTXent.forward(self, z_i, z_j)
         37 mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=torch.bool)).float()
         38 numerator = torch.exp(positives / self.temperature)
    ---> 39 denominator = mask * torch.exp(similarity / self.temperature)
         41 all_losses = -torch.log(numerator / torch.sum(denominator, dim=1))
         42 loss = torch.sum(all_losses) / (2 * batch_size)

    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

When I run the code on a CPU-only machine, I don't get this error. Because of how the data is created, I am not able to confirm what tensor type it is (maybe this is the problem). I've confirmed that both emb_anchor and emb_positive are CUDA tensors before they are passed into criterion() (as suggested by [this](https://stackoverflow.com/questions/66091226/runtimeerror-expected-all-tensors-to-be-on-the-same-device-but-found-at-least) post as a possible solution).
# Answer 1
**Score**: 1
The problem is in the `scarf/loss.py` file. You should replace the line:
```python
mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=torch.bool)).float()
```
with
```python
mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=torch.bool)).float().to(z_i.device)
```
The author forgot to move the `mask` tensor to `z_i.device`. `torch.eye` allocates its result on the CPU by default, so on a GPU machine the CPU `mask` gets multiplied with `similarity`, which lives on `cuda:0`. On a CPU-only machine both tensors are on the CPU, which is why the error does not show up there.
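As a variation on the fix above, the mask can be created on the right device in the first place by passing `device=` to `torch.eye`, which avoids the extra copy from CPU. Below is a minimal, self-contained sketch of an NT-Xent-style loss written that way. The function name `ntxent_loss`, the cosine-similarity and `positives` steps, and the default temperature are assumptions for illustration; only the `mask`, `numerator`, `denominator` and final loss lines mirror what the traceback shows from `scarf/loss.py`.

```python
import torch
import torch.nn.functional as F


def ntxent_loss(z_i: torch.Tensor, z_j: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """NT-Xent-style loss sketch; every intermediate tensor follows z_i's device."""
    batch_size = z_i.size(0)

    # Assumed step: pairwise cosine similarity between all 2N embeddings.
    z = torch.cat([z_i, z_j], dim=0)
    similarity = F.cosine_similarity(z.unsqueeze(1), z.unsqueeze(0), dim=2)

    # Assumed step: similarity of each anchor with its positive, both directions.
    sim_ij = torch.diag(similarity, batch_size)
    sim_ji = torch.diag(similarity, -batch_size)
    positives = torch.cat([sim_ij, sim_ji], dim=0)

    # Build the mask directly on z_i's device instead of the default (CPU).
    mask = (~torch.eye(batch_size * 2, dtype=torch.bool, device=z_i.device)).float()

    numerator = torch.exp(positives / temperature)
    denominator = mask * torch.exp(similarity / temperature)

    all_losses = -torch.log(numerator / torch.sum(denominator, dim=1))
    return torch.sum(all_losses) / (2 * batch_size)


# Usage: runs on the GPU when one is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
z_i = torch.randn(8, 16, device=device)
z_j = torch.randn(8, 16, device=device)
loss = ntxent_loss(z_i, z_j)
print(loss.device, loss.item())
```

Because the mask takes its device from the input, the same code should work unchanged on CPU-only and GPU machines, and no module-level `device` variable has to be passed into the loss.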
