How to specify the batch dimension in a Conv2d layer with PyTorch
Question
I have a dataset of 600x600 grayscale images, grouped into batches of 50 images by a dataloader.

My network has a convolution layer with 16 filters, followed by max pooling with a 6x6 kernel, and then a dense layer. The output of the Conv2d, after pooling, should be out_channels*width*height/maxpool_kernel_W/maxpool_kernel_H = 16*600*600/6/6 = 160000 features per image, multiplied by the batch size, 50.

However, when I try to do a forward pass I get the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000). I verified that the data is formatted correctly as [batch, n_channels, width, height] (so [50, 1, 600, 600] in my case).

Logically the output should be a 50x160000 matrix, but apparently it is formatted as an 80000x100 matrix. It seems like torch is multiplying the matrices along the wrong dimensions. If anyone understands why, please help me understand too.
# get data (using a fake dataset generator)
dataset = FakeData(size=500, image_size= (1, 600, 600), transform=ToTensor())
training_data, test_data = random_split(dataset,[400,100])
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=50, shuffle=True)
net = nn.Sequential(
nn.Conv2d(
in_channels=1,
out_channels=16,
kernel_size=5,
padding=2,
),
nn.ReLU(),
nn.MaxPool2d(kernel_size=6),
nn.Linear(160000, 1000),
nn.ReLU(),
)
optimizer = optim.Adam(net.parameters(), lr=1e-3,)
epochs = 10
for i in range(epochs):
for (x, _) in train_dataloader:
optimizer.zero_grad()
# make sure the data is in the right shape
print(x.shape) # returns torch.Size([50, 1, 600, 600])
# error happens here, at the first forward pass
output = net(x)
criterion = nn.MSELoss()
loss = criterion(output, x)
loss.backward()
optimizer.step()
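A note on where the 80000x100 shape comes from: nn.Linear only transforms the last dimension of its input and treats every leading dimension as a batch dimension. Below is a minimal sketch (using the sizes from the post above, not code from it) that reproduces the error in isolation:

import torch
import torch.nn as nn

# After MaxPool2d the activation is (50, 16, 100, 100). nn.Linear acts on
# the last dimension only, so internally mat1 becomes
# (50*16*100, 100) = (80000, 100), which cannot be multiplied with the
# layer's weight, used as a 160000x1000 matrix in the matmul.
pooled = torch.randn(50, 16, 100, 100)
fc = nn.Linear(160000, 1000)  # same layer as in the post; allocates ~640 MB
try:
    fc(pooled)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000)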
Answer 1
Score: 1
If you inspect your model's inference layer by layer, you will notice that nn.MaxPool2d returns a 4D tensor shaped (50, 16, 100, 100). There are different ways to reduce the spatial dimensions (flattening, average pooling, max pooling). For instance, if you want to flatten the spatial dimensions, this results in a tensor of shape (50, 16*100*100), i.e. (50, 160_000), as you expected. To do so, insert a nn.Flatten layer between the pooling and linear layers:
net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Flatten(),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)
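A quick way to verify the fix (a sketch using a tiny dummy batch rather than the real data) is to push a tensor through the model one layer at a time and print the shape after each step:

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Flatten(),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)

x = torch.randn(2, 1, 600, 600)  # dummy batch of 2 to keep memory low
for layer in net:
    x = layer(x)
    print(f"{layer.__class__.__name__}: {tuple(x.shape)}")
# Conv2d: (2, 16, 600, 600)
# ReLU: (2, 16, 600, 600)
# MaxPool2d: (2, 16, 100, 100)
# Flatten: (2, 160000)
# Linear: (2, 1000)
# ReLU: (2, 1000)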