RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same


The problem

This error occurs because the dtype of your input data does not match the dtype of the model's parameters. The model is defined with the default parameter dtype (float32), while the inputs in your training loop are DoubleTensors (float64). To resolve this, make the input dtype consistent with the model.

The fix below casts the input at the beginning of the forward method, so the input always matches the dtype of the model's weights and biases. Here is how to modify your code:

```python
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        # convolution 1
        self.c1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(5,5), stride=1, padding=0)
        self.relu1 = nn.ReLU()
        # maxpool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=(2,2))
        # dropout 1
        self.dropout1 = nn.Dropout(0.25)
        # convolution 2
        self.c2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3,3), stride=1, padding=0)
        self.relu2 = nn.ReLU()
        # maxpool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=(2,2))
        # dropout 2
        self.dropout2 = nn.Dropout(0.25)
        # linear 1
        self.fc1 = nn.Linear(32*5*5, 256)
        # dropout 3
        self.dropout3 = nn.Dropout(0.25)
        # linear 2
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        # Cast the input to FloatTensor so it matches the model's weights
        x = x.float()
        out = self.c1(x)                 # [BATCH_SIZE, 16, 24, 24]
        out = self.relu1(out)
        out = self.maxpool1(out)         # [BATCH_SIZE, 16, 12, 12]
        out = self.dropout1(out)
        out = self.c2(out)               # [BATCH_SIZE, 32, 10, 10]
        out = self.relu2(out)
        out = self.maxpool2(out)         # [BATCH_SIZE, 32, 5, 5]
        out = self.dropout2(out)
        out = out.view(out.size(0), -1)  # [BATCH_SIZE, 32*5*5 = 800]
        out = self.fc1(out)              # [BATCH_SIZE, 256]
        out = self.dropout3(out)
        out = self.fc2(out)              # [BATCH_SIZE, 10]
        return out
```

With this change, the model's weights and biases and the input share the same dtype, and the mismatch error is no longer raised. Also make sure the inputs you feed in your training loop are FloatTensors.
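For completeness, the Double input usually originates at tensor creation: NumPy floating-point arrays default to float64, and `torch.from_numpy` preserves that dtype. A minimal sketch of casting at creation time instead of in `forward` (the random array here is just a stand-in for the question's features):

```python
import numpy as np
import torch

features = np.random.rand(4, 1, 144, 192)  # NumPy float arrays default to float64
data = torch.from_numpy(features)          # from_numpy preserves the dtype
print(data.dtype)                          # torch.float64 -> torch.cuda.DoubleTensor once on GPU

data = torch.from_numpy(features).float()  # cast once at creation instead of in forward()
print(data.dtype)                          # torch.float32, matching nn.Conv2d's default weights
```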

Question

I am training a CNN to classify some images into two classes. I already ran the same code on Windows with an RTX 3070; now I am trying to do exactly the same on Ubuntu with an Nvidia A100-40GB. The code I am using is this:

```python
import warnings
warnings.filterwarnings('ignore')
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import plotly
import plotly.graph_objects as go
%matplotlib inline
import os
from sklearn.calibration import calibration_curve
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.optim import lr_scheduler

if torch.cuda.is_available():
    print("CUDA available. Using GPU acceleration.")
    device = "cuda"
else:
    print("CUDA is NOT available. Using CPU for training.")
    device = "cpu"

import pickle

def save_var(var, filename):
    with open(filename, 'wb') as f:
        pickle.dump(var, f)

def recover_var(filename):
    with open(filename, 'rb') as f:
        var = pickle.load(f)
    return var

df = recover_var('dataframe_cnn.pickle')  # my dataset
df = df.sample(frac=1).reset_index(drop=True)
df.columns = ['label'] + list(range(1, 27649))
train = df[:int(0.7*len(df))]
test = df[int(0.7*len(df)):]

def preprocessing(train, test, split_train_size=0.2):
    # Split data into features (pixels) and labels
    targets = train.label.values
    features = train.drop(["label"], axis=1).values
    # Normalization
    features = features/255.
    X_test = test.values/255.
    # Train test split. Size of train data is (1-split_train_size)*100% and size of test data is split_train_size%.
    X_train, X_val, y_train, y_val = train_test_split(features,
                                                      targets,
                                                      test_size=split_train_size,
                                                      random_state=42)
    # Create feature and targets tensor for train set. I need variable to accumulate gradients.
    # Therefore first I create tensor, then I will create variable
    X_train = torch.from_numpy(X_train)
    y_train = torch.from_numpy(y_train).type(torch.LongTensor)  # data type is long
    # Create feature and targets tensor for validation set.
    X_val = torch.from_numpy(X_val)
    y_val = torch.from_numpy(y_val).type(torch.LongTensor)  # data type is long
    # Create feature tensor for test set.
    X_test = torch.from_numpy(X_test)
    return X_train, y_train, X_val, y_val, X_test

X_train, y_train, X_val, y_val, X_test = preprocessing(train, test)

print(f'Shape of training data: {X_train.shape}')
print(f'Shape of training labels: {y_train.shape}')
print(f'Shape of validation data: {X_val.shape}')
print(f'Shape of validation labels: {y_val.shape}')
print(f'Shape of testing data: {X_test.shape}')

# batch_size, epoch and iteration
BATCH_SIZE = 100
N_ITER = 2500
EPOCHS = 5
# I will be training the model on another 10 epochs to show the flexibility of pytorch
EXTRA_EPOCHS = 10

# Pytorch train and test sets
train_tensor = torch.utils.data.TensorDataset(X_train, y_train)
val_tensor = torch.utils.data.TensorDataset(X_val, y_val)
test_tensor = torch.utils.data.TensorDataset(X_test)

# data loaders
train_loader = torch.utils.data.DataLoader(train_tensor,
                                           batch_size=BATCH_SIZE,
                                           shuffle=True)
val_loader = torch.utils.data.DataLoader(val_tensor,
                                         batch_size=BATCH_SIZE,
                                         shuffle=False)
test_loader = torch.utils.data.DataLoader(test_tensor,
                                          batch_size=BATCH_SIZE,
                                          shuffle=False)

class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        # convolution 1
        self.c1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(5,5), stride=1, padding=0)
        self.relu1 = nn.ReLU()
        # maxpool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=(2,2))
        # dropout 1
        self.dropout1 = nn.Dropout(0.25)
        # convolution 2
        self.c2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3,3), stride=1, padding=0)
        self.relu2 = nn.ReLU()
        # maxpool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=(2,2))
        # dropout 2
        self.dropout2 = nn.Dropout(0.25)
        # linear 1
        self.fc1 = nn.Linear(32*5*5, 256)
        # dropout 3
        self.dropout3 = nn.Dropout(0.25)
        # linear 2
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        out = self.c1(x)                 # [BATCH_SIZE, 16, 24, 24]
        out = self.relu1(out)
        out = self.maxpool1(out)         # [BATCH_SIZE, 16, 12, 12]
        out = self.dropout1(out)
        out = self.c2(out)               # [BATCH_SIZE, 32, 10, 10]
        out = self.relu2(out)
        out = self.maxpool2(out)         # [BATCH_SIZE, 32, 5, 5]
        out = self.dropout2(out)
        out = out.view(out.size(0), -1)  # [BATCH_SIZE, 32*5*5 = 800]
        out = self.fc1(out)              # [BATCH_SIZE, 256]
        out = self.dropout3(out)
        out = self.fc2(out)              # [BATCH_SIZE, 10]
        return out

# Create CNN
model = CNNModel()
# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.003)
# Cross Entropy Loss
criterion = nn.CrossEntropyLoss()
# LR scheduler
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

# On GPU if possible
if torch.cuda.is_available():
    print("Model will be training on GPU")
    model = model.cuda()
    criterion = criterion.cuda()
else:
    print("Model will be training on CPU")

def fit(epoch):
    print("Training...")
    # Set model on training mode
    model.train()
    # Update lr parameter
    exp_lr_scheduler.step()
    # Initialize train loss and train accuracy
    train_running_loss = 0.0
    train_running_correct = 0
    train_running_lr = optimizer.param_groups[0]['lr']
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data.view(BATCH_SIZE, 1, 144, 192)), Variable(target)
        if torch.cuda.is_available():
            data = data.cuda()
            target = target.cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        train_running_loss += loss.item()
        _, preds = torch.max(output.data, 1)
        train_running_correct += (preds == target).sum().item()
        loss.backward()
        optimizer.step()
        if (batch_idx + 1) % 50 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch + 1,
                (batch_idx + 1) * len(data),
                len(train_loader.dataset),
                BATCH_SIZE * (batch_idx + 1) / len(train_loader),
                loss.cpu().detach().numpy())
            )
    train_loss = train_running_loss / len(train_loader.dataset)
    train_accuracy = 100. * train_running_correct / len(train_loader.dataset)
    return train_loss, train_accuracy, train_running_lr

def validate(data_loader):
    print("Validating...")
    # Set model on validating mode
    model.eval()
    val_preds = torch.LongTensor().cuda()
    val_proba = torch.LongTensor().cuda()
    # Initialize validation loss and validation accuracy
    val_running_loss = 0.0
    val_running_correct = 0
    for data, target in data_loader:
        # Regarding volatile argument, check the note below
        data, target = Variable(data.view(BATCH_SIZE, 1, 144, 192), volatile=True), Variable(target)
        if torch.cuda.is_available():
            data = data.cuda()
            target = target.cuda()
        output = model(data)
        loss = criterion(output, target)
        val_running_loss += loss.item()
        pred = output.data.max(1, keepdim=True)[1]
        proba = torch.nn.functional.softmax(output.data)
        val_running_correct += pred.eq(target.data.view_as(pred)).cpu().sum()
        # Store val predictions with probas for confusion matrix calculations & best errors made
        val_preds = torch.cat((val_preds.float(), pred), dim=0).float()
        val_proba = torch.cat((val_proba.float(), proba)).float()
    val_loss = val_running_loss / len(data_loader.dataset)
    val_accuracy = 100. * val_running_correct / len(data_loader.dataset)
    return val_loss, val_accuracy, val_preds, val_proba

train_loss, train_accuracy = [], []
val_loss, val_accuracy = [], []
val_preds, val_proba = [], []
train_lr = []

for epoch in range(EPOCHS):
    print(f"Epoch {epoch+1} of {EPOCHS}\n")
    train_epoch_loss, train_epoch_accuracy, train_epoch_lr = fit(epoch)
    val_epoch_loss, val_epoch_accuracy, val_epoch_preds, val_epoch_proba = validate(val_loader)
    train_loss.append(train_epoch_loss)
    train_accuracy.append(train_epoch_accuracy)
    train_lr.append(train_epoch_lr)
    val_loss.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)
    val_preds.append(val_epoch_preds)
    val_proba.append(val_epoch_proba)
    print(f"Train Loss: {train_epoch_loss:.4f}, Train Acc: {train_epoch_accuracy:.2f}")
    print(f'Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_accuracy:.2f}\n')
```
However, it is returning the following error:

```
Cell In[149], line 285
    281 for epoch in range(EPOCHS):
    283     print(f"Epoch {epoch+1} of {EPOCHS}\n")
--> 285     train_epoch_loss, train_epoch_accuracy, train_epoch_lr = fit(epoch)
    286     val_epoch_loss, val_epoch_accuracy, val_epoch_preds, val_epoch_proba = validate(val_loader)
    288     train_loss.append(train_epoch_loss)

Cell In[149], line 213, in fit(epoch)
    210     target = target.cuda()
    212 optimizer.zero_grad()
--> 213 output = model(data)
    214 loss = criterion(output, target)
    216 train_running_loss += loss.item()

File /opt/miniconda3/envs/mlgpu/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
->  1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[149], line 152, in CNNModel.forward(self, x)
    150 def forward(self, x):
--> 152     out = self.c1(x)          # [BATCH_SIZE, 16, 24, 24]
    153     out = self.relu1(out)
    154     out = self.maxpool1(out)  # [BATCH_SIZE, 16, 12, 12]

File /opt/miniconda3/envs/mlgpu/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
->  1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/miniconda3/envs/mlgpu/lib/python3.9/site-packages/torch/nn/modules/conv.py:457, in Conv2d.forward(self, input)
    456 def forward(self, input: Tensor) -> Tensor:
--> 457     return self._conv_forward(input, self.weight, self.bias)

File /opt/miniconda3/envs/mlgpu/lib/python3.9/site-packages/torch/nn/modules/conv.py:453, in Conv2d._conv_forward(self, input, weight, bias)
    449 if self.padding_mode != 'zeros':
    450     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    451                     weight, bias, self.stride,
    452                     _pair(0), self.dilation, self.groups)
--> 453 return F.conv2d(input, weight, bias, self.stride,
    454                 self.padding, self.dilation, self.groups)

RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same
```

I already tried to typecast the output to float, torch.float, and np.float32, but it still returned the same error. Moreover, I tried changing the type of the variable x on line 152, without results. How can I solve this?

Answer 1

Score: 1


It looks like your model parameters are Float but your data is Double. I'm not sure exactly how you attempted to cast your tensor, but the following should work:

```python
optimizer.zero_grad()
output = model(data.float())
```

Alternatively, you can convert the model parameters to Double by the following:

```python
# Create CNN
model = CNNModel()
model.double()
```

Try either of them and it should tackle the tensor type mismatch issue.
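Of the two, casting the data to float is usually the better choice: float32 is PyTorch's default parameter dtype, needs half the memory of float64, and double-precision arithmetic is markedly slower on most GPUs.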

NOTE: Do not use Variable(tensor) as you did in the following:

```python
for batch_idx, (data, target) in enumerate(train_loader):
    data, target = Variable(data.view(BATCH_SIZE, 1, 144, 192)), Variable(target)
```

as well as here:

```python
for data, target in data_loader:
    # Regarding volatile argument, check the note below
    data, target = Variable(data.view(BATCH_SIZE, 1, 144, 192), volatile=True), Variable(target)
```

The Variable API has been deprecated by PyTorch.
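A minimal sketch of the modern pattern for the validation loop, assuming the same `model`, `criterion`, and data loader as above: plain tensors replace `Variable`, and `torch.no_grad()` replaces `volatile=True` for disabling gradient tracking:

```python
def validate(data_loader):
    model.eval()
    val_running_loss = 0.0
    with torch.no_grad():  # replaces volatile=True: no gradients are tracked here
        for data, target in data_loader:
            # view(-1, ...) instead of BATCH_SIZE also handles a smaller final batch
            data = data.view(-1, 1, 144, 192).float()  # plain tensor, no Variable wrapper
            if torch.cuda.is_available():
                data, target = data.cuda(), target.cuda()
            output = model(data)
            val_running_loss += criterion(output, target).item()
    return val_running_loss / len(data_loader.dataset)
```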
