How to specify the batch dimension in a Conv2d layer with PyTorch
Question
I have a dataset of 600x600 grayscale images, grouped into batches of 50 images by a dataloader.

My network has a convolution layer with 16 filters, followed by max pooling with a 6x6 kernel, and then a dense layer. The output of the Conv2d, after pooling, should be out_channels*width*height/maxpool_kernel_W/maxpool_kernel_H = 16*600*600/6/6 = 160000 features per image, multiplied by the batch size, 50.

However, when I try to do a forward pass I get the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000). I verified that the data is formatted correctly as [batch, n_channels, width, height] (so [50, 1, 600, 600] in my case).

Logically the output should be a 50x160000 matrix, but apparently it is formatted as an 80000x100 matrix. It seems like torch is multiplying the matrices along the wrong dimensions. If anyone understands why, please help me understand too.
# get data (using a fake dataset generator)
dataset = FakeData(size=500, image_size= (1, 600, 600), transform=ToTensor())
training_data, test_data = random_split(dataset,[400,100])
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=50, shuffle=True)
net = nn.Sequential(
nn.Conv2d(
in_channels=1,
out_channels=16,
kernel_size=5,
padding=2,
),
nn.ReLU(),
nn.MaxPool2d(kernel_size=6),
nn.Linear(160000, 1000),
nn.ReLU(),
)
optimizer = optim.Adam(net.parameters(), lr=1e-3,)
epochs = 10
for i in range(epochs):
for (x, _) in train_dataloader:
optimizer.zero_grad()
# make sure the data is in the right shape
print(x.shape) # returns torch.Size([50, 1, 600, 600])
# error happens here, at the first forward pass
output = net(x)
criterion = nn.MSELoss()
loss = criterion(output, x)
loss.backward()
optimizer.step()
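A note on where the 80000x100 shape comes from: nn.Linear only transforms the last dimension of its input and treats every leading dimension as a batch dimension. Below is a minimal sketch (using the sizes from the post above, not code from it) that reproduces the error in isolation:

import torch
import torch.nn as nn

# After MaxPool2d the activation is (50, 16, 100, 100). nn.Linear acts on
# the last dimension only, so internally mat1 becomes
# (50*16*100, 100) = (80000, 100), which cannot be multiplied with the
# layer's weight, used as a 160000x1000 matrix in the matmul.
pooled = torch.randn(50, 16, 100, 100)
fc = nn.Linear(160000, 1000)  # same layer as in the post; allocates ~640 MB
try:
    fc(pooled)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000)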
Answer 1
Score: 1
If you inspect your model's inference layer by layer, you will notice that nn.MaxPool2d returns a 4D tensor shaped (50, 16, 100, 100). There are different ways to reduce the spatial dimensions (flattening, average pooling, max pooling). For instance, if you want to flatten the spatial dimensions, this results in a tensor of shape (50, 16*100*100), i.e. (50, 160_000), as you expected. To do so, insert a nn.Flatten layer between the pooling and linear layers:
net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Flatten(),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)
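A quick way to verify the fix (a sketch using a tiny dummy batch rather than the real data) is to push a tensor through the model one layer at a time and print the shape after each step:

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Flatten(),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)

x = torch.randn(2, 1, 600, 600)  # dummy batch of 2 to keep memory low
for layer in net:
    x = layer(x)
    print(f"{layer.__class__.__name__}: {tuple(x.shape)}")
# Conv2d: (2, 16, 600, 600)
# ReLU: (2, 16, 600, 600)
# MaxPool2d: (2, 16, 100, 100)
# Flatten: (2, 160000)
# Linear: (2, 1000)
# ReLU: (2, 1000)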