How to specify the batch dimension in a Conv2d layer with PyTorch

Question

I have a dataset of 600x600 grayscale images, grouped into batches of 50 images by a dataloader.

My network has a convolution layer with 16 filters, followed by max pooling with a 6x6 kernel, and then a dense layer. After pooling, each image should yield out_channels*width*height/maxpool_kernel_W/maxpool_kernel_H = 16*600*600/6/6 = 160000 features, for each of the 50 images in the batch.

However, when I try to do a forward pass I get the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000). I verified that the data is formatted correctly as [batch, n_channels, height, width] (so [50, 1, 600, 600] in my case).

Logically, the input to the dense layer should be a 50x160000 matrix, but apparently it is treated as an 80000x100 matrix. It seems like torch is multiplying the matrices along the wrong dimensions. If anyone understands why, please help me understand too.

from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import FakeData
from torchvision.transforms import ToTensor

# get data (using a fake dataset generator)
dataset = FakeData(size=500, image_size=(1, 600, 600), transform=ToTensor())
training_data, test_data = random_split(dataset, [400, 100])
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=50, shuffle=True)

net = nn.Sequential(
    nn.Conv2d(
        in_channels=1,
        out_channels=16,
        kernel_size=5,
        padding=2,
    ),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)

optimizer = optim.Adam(net.parameters(), lr=1e-3)

epochs = 10
for i in range(epochs):
    for (x, _) in train_dataloader:
        optimizer.zero_grad()

        # make sure the data is in the right shape
        print(x.shape)  # prints torch.Size([50, 1, 600, 600])

        # error happens here, at the first forward pass
        output = net(x)

        criterion = nn.MSELoss()
        loss = criterion(output, x)
        loss.backward()
        optimizer.step()

Answer 1

Score: 1

If you inspect the model's forward pass layer by layer, you will notice that nn.MaxPool2d returns a 4D tensor of shape (50, 16, 100, 100). nn.Linear only acts on the last dimension and treats all leading dimensions as rows, so it sees a (50*16*100) x 100 = 80000x100 matrix, which is the shape reported in the error. There are several ways to reduce the spatial dimensions (flattening, average pooling, max pooling). For instance, flattening the channel and spatial dimensions gives a tensor of shape (50, 16*100*100), i.e. (50, 160_000), as you expected. To get that, add an nn.Flatten layer before nn.Linear:
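One way to do the layer-by-layer inspection is sketched below. The helper name trace_shapes is hypothetical, and the batch size is reduced to 2 (an assumption made only to keep the example cheap); the per-image shapes do not depend on the batch size.

```python
import torch
from torch import nn

# The feature-extraction part of the network from the question.
layers = [
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
]


def trace_shapes(layers, x):
    """Hypothetical helper: run x through each layer and record the output shape."""
    shapes = []
    with torch.no_grad():
        for layer in layers:
            x = layer(x)
            shapes.append((type(layer).__name__, tuple(x.shape)))
    return shapes


x = torch.zeros(2, 1, 600, 600)  # assumption: batch of 2 instead of 50
shapes = trace_shapes(layers, x)
for name, shape in shapes:
    print(name, shape)

# nn.Linear acts on the last dimension only; every leading dimension becomes a row.
_, channels, height, width = shapes[-1][1]
rows_for_batch_50 = 50 * channels * height  # rows nn.Linear would see for a batch of 50
cols = width                                # columns nn.Linear would see
print(rows_for_batch_50, cols)
```

The last two values correspond to the mat1 shape in the error message: the unflattened 4D tensor is read as a stack of rows whose length is the final (width) dimension.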

net = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
                    nn.ReLU(),  
                    nn.MaxPool2d(kernel_size=6),
                    nn.Flatten(),
                    nn.Linear(160000, 1000),
                    nn.ReLU())
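As a quick check, the sketch below runs a dummy batch through two of the options mentioned above: flattening, and global average pooling via nn.AdaptiveAvgPool2d. The names OUT_FEATURES, flatten_net and gap_net are illustrative, and both the batch size (2) and the output width (10 rather than 1000) are assumptions chosen only to keep the weight matrix and memory use small.

```python
import torch
from torch import nn

OUT_FEATURES = 10  # assumption: the original uses 1000; reduced to keep the sketch light

# Shared convolution + pooling stage, same hyperparameters as in the question.
features = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
)

# Option 1: flatten channels and spatial dims into one feature axis of 16*100*100.
flatten_net = nn.Sequential(
    features,
    nn.Flatten(),
    nn.Linear(16 * 100 * 100, OUT_FEATURES),
)

# Option 2: global average pooling leaves one value per channel, so 16 features.
gap_net = nn.Sequential(
    features,
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, OUT_FEATURES),
)

x = torch.zeros(2, 1, 600, 600)  # assumption: dummy batch of 2 images
with torch.no_grad():
    flat_out = flatten_net(x)
    gap_out = gap_net(x)

print(flat_out.shape, gap_out.shape)
```

Both variants keep the batch dimension first. Flattening preserves all spatial information at the cost of a very large linear layer, while global average pooling discards spatial layout but needs far fewer parameters.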

huangapple
  • Posted on 2023-02-06 12:19:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75357298.html