PyTorch: mat1 and mat2 shapes cannot be multiplied (64x8192 and 12800x10)
Question
I'm trying to implement a CNN that classifies digits from images. The following error appears when I train the network:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x8192 and 12800x10)
Here is the architecture of my CNN. It has two convolutional layers, each with a different kernel size and number of kernels:
classnet = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 512, kernel_size=7),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(512 * 5 * 5, 10),
    nn.LogSoftmax(dim=1)
)
Here is the main class for the CNN:
class ClassifierNeuralNet(nn.Module):
    def __init__(self, classnet):
        super(ClassifierNeuralNet, self).__init__()
        # We provide a sequential module with layers and activations
        self.classnet = classnet
        # The loss function (the negative log-likelihood)
        self.nll = nn.NLLLoss(reduction="none")  # it requires log-softmax as input!!

    # This function classifies an image x to a class.
    # The output must be a class label (long).
    def classify(self, x):
        # using classnet to perform a forward pass on the image
        out = self.classnet(x)
        # using argmax to gain the class with the maximum probability
        y_pred = out.argmax(dim=1)
        return y_pred

    # This function is crucial for a module in PyTorch.
    # In our framework, this class outputs a value of the loss function.
    def forward(self, x, y, reduction="avg"):
        # using classnet to perform a forward pass on the image
        out = self.classnet(x)
        print(out.shape)
        # passing the result of the forward pass to the NLL loss function
        loss = self.nll(out, y)
        # return the result based on the reduction parameter
        if reduction == "sum":
            return loss.sum()
        else:
            return loss.mean()
The main class worked on another task (recognizing digits from 8x8 images), and now I'm trying to apply it to a larger CNN that recognizes digits from 32x32 images.
Here are my "old" architecture and running/evaluating section:
names = ["classifier_mlp", "classifier_cnn"]
# loop over models
for name in names:
    print("\n-> START {}".format(name))
    # Create a folder (REMEMBER: You must mount your drive if you use Colab!)
    if name == "classifier_mlp":
        name = name + "_M_" + str(M)
    elif name == "classifier_cnn":
        name = name + "_M_" + str(M) + "_kernels_" + str(num_kernels)
    # Create a folder if necessary
    result_dir = os.path.join(results_dir, "results", name + "/")
    # =========
    # MAKE SURE THAT "result_dir" IS A PATH TO A LOCAL FOLDER OR A GOOGLE COLAB FOLDER (DEFINED IN CELL 3)
    result_dir = "./"  # (current folder)
    # =========
    if not (os.path.exists(result_dir)):
        os.mkdir(result_dir)
    # MLP
    if name[0:14] == "classifier_mlp":
        classnet = nn.Sequential(
            nn.Linear(D, M),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(M, M),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(M, K),
            nn.LogSoftmax(dim=1))
        # You are asked here to propose your own architecture
        # NOTE: Please remember that the output must be LogSoftmax!
        # ------
        pass
    # CNN
    elif name[0:14] == "classifier_cnn":
        classnet = nn.Sequential(
            Reshape(size=(1, 8, 8)),
            nn.Conv2d(in_channels=1, out_channels=num_kernels, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(in_channels=num_kernels, out_channels=num_kernels*2, kernel_size=3),
            nn.ReLU(),
            Flatten(),
            nn.Linear(num_kernels*2*4*4, M),
            nn.ReLU(),
            nn.Linear(M, K),
            nn.LogSoftmax(dim=1)
        )
        pass
    # Init ClassifierNN
    model = ClassifierNeuralNet(classnet)
    # Init OPTIMIZER (here we use ADAMAX)
    optimizer = torch.optim.Adamax(
        model.parameters(),  # assumption: this argument is missing from the post; the model's parameters are the usual choice
        lr=lr,
        weight_decay=wd,
    )
    # Training procedure
    nll_val, error_val = training(
        name=result_dir + name,
        max_patience=max_patience,
        num_epochs=num_epochs,
        model=model,
        optimizer=optimizer,
        training_loader=training_loader,
        val_loader=val_loader,
    )
    # The final evaluation (on the test set)
    test_loss, test_error = evaluation(name=result_dir + name, test_loader=test_loader)
    # write the results to a file
    f = open(result_dir + name + "_test_loss.txt", "w")
    f.write("NLL: " + str(test_loss) + "\nCE: " + str(test_error))
    f.close()
    # create curves
    plot_curve(
        result_dir + name,
        nll_val,
        file_name="_nll_val_curve.pdf",
        ylabel="nll",
        test_eval=test_loss,
    )
    plot_curve(
        result_dir + name,
        error_val,
        file_name="_ca_val_curve.pdf",
        ylabel="ce",
        color="r-",
        test_eval=test_error,
    )
Full code here:
import os

import numpy as np
import torch
import torch.nn as nn

# PLEASE DO NOT REMOVE!
# Here are two auxiliary functions that can be used for a convolutional NN (CNN).
# This module reshapes an input (matrix -> tensor).
class Reshape(nn.Module):
    def __init__(self, size):
        super(Reshape, self).__init__()
        self.size = size  # a list

    def forward(self, x):
        assert x.shape[1] == np.prod(self.size)
        return x.view(x.shape[0], *self.size)

# This module flattens an input (tensor -> matrix) by blending dimensions
# beyond the batch size.
class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()

    def forward(self, x):
        return x.view(x.shape[0], -1)
# =========
# GRADING:
# 0
# 0.5 pt if code works but it is explained badly
# 1.0 pt if code works and it is explained well
# =========
# Implement a neural network (NN) classifier.
class ClassifierNeuralNet(nn.Module):
    def __init__(self, classnet):
        super(ClassifierNeuralNet, self).__init__()
        # We provide a sequential module with layers and activations
        self.classnet = classnet
        # The loss function (the negative log-likelihood)
        self.nll = nn.NLLLoss(reduction="none")  # it requires log-softmax as input!!

    # This function classifies an image x to a class.
    # The output must be a class label (long).
    def classify(self, x):
        # using classnet to perform a forward pass on the image
        out = self.classnet(x)
        # using argmax to gain the class with the maximum probability
        y_pred = out.argmax(dim=1)
        return y_pred

    # This function is crucial for a module in PyTorch.
    # In our framework, this class outputs a value of the loss function.
    def forward(self, x, y, reduction="avg"):
        # using classnet to perform a forward pass on the image
        out = self.classnet(x)
        print(out.shape)
        # passing the result of the forward pass to the NLL loss function
        loss = self.nll(out, y)
        # return the result based on the reduction parameter
        if reduction == "sum":
            return loss.sum()
        else:
            return loss.mean()
# Initialize hyperparameters
# Hyperparameters
# -> data hyperparams
D = 3072 # input dimension
# -> model hyperparams
M = 256 # the number of neurons in scale (s) and translation (t) nets
K = 10 # the number of labels
# -> training hyperparams
lr = 1e-3 # learning rate
wd = 1e-5 # weight decay
num_epochs = 1000 # max. number of epochs
max_patience = 20 # an early stopping is used, if training doesn't improve for longer than 20 epochs, it is stopped
name = 'New_CNN' + "_M_" + str(M) + "_kernels_"
# Create a folder if necessary
result_dir = os.path.join(results_dir, "results", name + "/")
# =========
# MAKE SURE THAT "result_dir" IS A PATH TO A LOCAL FOLDER OR A GOOGLE COLAB FOLDER (DEFINED IN CELL 3)
result_dir = "./" # (current folder)
# =========
if not (os.path.exists(result_dir)):
    os.mkdir(result_dir)
classnet = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 512, kernel_size=7),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(512 * 5 * 5, 10),
    nn.LogSoftmax(dim=1)
)
# Init ClassifierNN
model = ClassifierNeuralNet(classnet)
# Init OPTIMIZER (here we use ADAMAX)
optimizer = torch.optim.Adamax(
    model.parameters(),  # assumption: this argument is missing from the post; the model's parameters are the usual choice
    lr=lr,
    weight_decay=wd,
)
# Training procedure
nll_val, error_val = training(
    name=result_dir + name,
    max_patience=max_patience,
    num_epochs=num_epochs,
    model=model,
    optimizer=optimizer,
    training_loader=training_loader,
    val_loader=val_loader,
)
# The final evaluation (on the test set)
test_loss, test_error = evaluation(name=result_dir + name, test_loader=test_loader)
# write the results to a file
f = open(result_dir + name + "_test_loss.txt", "w")
f.write("NLL: " + str(test_loss) + "\nCE: " + str(test_error))
f.close()
# create curves
plot_curve(
    result_dir + name,
    nll_val,
    file_name="_nll_val_curve.pdf",
    ylabel="nll",
    test_eval=test_loss,
)
plot_curve(
    result_dir + name,
    error_val,
    file_name="_ca_val_curve.pdf",
    ylabel="ce",
    color="r-",
    test_eval=test_error,
)
And this is the full error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-25-4846604cd120> in <cell line: 36>()
34
35 # Training procedure
---> 36 nll_val, error_val = training(
37 name=result_dir + name,
38 max_patience=max_patience,
5 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x8192 and 12800x10)
Answer 1
Score: 2
The error is stated in the last line of the traceback:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x8192 and 12800x10)
The issue is in the last two lines of your classnet Sequential model:
nn.Flatten(),
nn.Linear(512 * 5 * 5, 10),
The linear layer expects 512 * 5 * 5 = 12800 input features, which is the 12800x10 in your error message, while 64x8192 is the output shape of your nn.Flatten(): a batch of 64 images with 8192 features each.
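To see where 8192 comes from, the spatial size can be followed layer by layer with the standard output-size formula for convolution and pooling (no padding, dilation 1). The sketch below is only an illustration of that arithmetic; the helper name conv_out is made up for this example and is not part of PyTorch, and the 32x32 input size is taken from the question.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension for Conv2d / MaxPool2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                   # input images are 32x32 (from the question)
size = conv_out(size, kernel=5)             # Conv2d(3, 16, kernel_size=5)   -> 28
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2)                -> 14
size = conv_out(size, kernel=7)             # Conv2d(16, 512, kernel_size=7) -> 8
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2, 2)                -> 4

# 512 channels, each a 4x4 feature map, are flattened into one vector per image.
flat_features = 512 * size * size
print(size, flat_features)  # 4 8192
```

So the feature maps reaching nn.Flatten() are 4x4 rather than 5x5, which is why the flattened size is 512 * 4 * 4 = 8192 and not 512 * 5 * 5 = 12800.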
I can reproduce the same error with the following input:
input = torch.ones([64,3,32,32])
output = classnet(input)
which gives:
RuntimeError Traceback (most recent call last)
<ipython-input-11-401645f0193a> in <cell line: 2>()
1 input = torch.ones([64,3,32,32])
----> 2 output = classnet(input)
3 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x8192 and 12800x10)
Changing your last two layers to the following will resolve the issue:
nn.Flatten(),
nn.Linear(8192, 10),
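If you would rather not hard-code 8192, one possible approach is to push a dummy batch through the convolutional part and read the flattened size from its output. This is a minimal sketch, assuming 3-channel 32x32 inputs as in the question; the names features and n_flat are introduced here for illustration and do not appear in the original code.

```python
import torch
import torch.nn as nn

# Convolutional part of the network from the question, up to and including Flatten.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 512, kernel_size=7),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
)

# Run one dummy image through it to measure the number of flattened features.
with torch.no_grad():
    n_flat = features(torch.zeros(1, 3, 32, 32)).shape[1]

# Size the linear layer from the measured value instead of a hand-computed constant.
classnet = nn.Sequential(
    features,
    nn.Linear(n_flat, 10),
    nn.LogSoftmax(dim=1),
)

out = classnet(torch.ones(64, 3, 32, 32))
print(n_flat, tuple(out.shape))
```

The trade-off is a single extra forward pass at construction time in exchange for a linear layer that stays consistent if the kernel sizes or the input resolution change later.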