How to Fix Slight Mismatch Between Dimensions of CNN Output Data and the Target?

Question


I am trying to create a version of the UNet CNN which will take in a certain type of MRI image volume as the source and use the corresponding MRI image volume as the target.

After quite a bit of trial and error, I am still getting a small mismatch between the size of the CNN's output and the dimensions of the target. The CNN output is 208x224x160, but the source/target data are both 210x224x160. This causes a runtime error during the calculation of the loss. What's strange is that the dimension mismatch doesn't occur when I put in randomly generated data: the output has the same dimensions as the input.
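One quick way to narrow this down, sketched against the UNet class defined in the code below (the variable names here are hypothetical), is to feed randomly generated data with the MRI volume's exact shape through the network; if the mismatch follows the shape rather than the data, the down/upsampling arithmetic is the cause:

  # Sketch: random data shaped like the MRI volumes.
  # If this also comes back as 208x224x160, the shape is the
  # culprit, not the MRI data itself.
  model = UNet(1, 1)
  x = torch.randn(1, 1, 210, 224, 160)
  print(model(x).shape)  # torch.Size([1, 1, 208, 224, 160])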

What could be causing this error and how should I go about fixing it?

Here is the code:

import nibabel as nib
import os
import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F

# function using nibabel to load a single volume from the disk
def load_Volume(filepath):
    img = nib.load(filepath)
    data = img.get_fdata()
    return data

def preprocess_mri_data(data):
    # Normalize the data, other pre-processing can be added
    mean = np.mean(data)
    std = np.std(data)
    data = (data - mean) / std
    return data

# Dataset class to use with the data loader. Pairs sources with targets.
class MRISource_Target(Dataset):
    def __init__(self, source_dir, target_dir, transform=None):
        self.source_dir = source_dir
        self.target_dir = target_dir
        self.source_filenames = os.listdir(source_dir)
        self.target_filenames = os.listdir(target_dir)
        self.transform = transform

    def __len__(self):
        return len(self.source_filenames)

    def __getitem__(self, idx):
        source_filepath = os.path.join(self.source_dir, self.source_filenames[idx])
        target_filepath = os.path.join(self.target_dir, self.target_filenames[idx])
        source_data = load_Volume(source_filepath)
        target_data = load_Volume(target_filepath)
        source_data = preprocess_mri_data(source_data)
        target_data = preprocess_mri_data(target_data)
        if self.transform:
            source_data = self.transform(source_data)
            target_data = self.transform(target_data)
        return {'source': source_data, 'target': target_data}

# directories for the training and testing data
train_source_dir = '/content/drive/MyDrive/qsmData/Train/Source'
train_target_dir = '/content/drive/MyDrive/qsmData/Train/Target/'
test_source_dir = '/content/drive/MyDrive/qsmData/Test/Source/'
test_target_dir = '/content/drive/MyDrive/qsmData/Test/Target/'

# create the paired datasets
train_dataset = MRISource_Target(train_source_dir, train_target_dir)
test_dataset = MRISource_Target(test_source_dir, test_target_dir)

# make the datasets iterable for training
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False)

# visualize an arbitrary slice
def plot_mri_slice(volume, slice_num):
    plt.imshow(volume[:, :, slice_num], cmap='gray')
    plt.axis('off')
    plt.show()

# Define the U-Net architecture
class UNet(nn.Module):
    def __init__(self, input_channels, output_channels):
        super(UNet, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(input_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2, stride=2)
        )
        self.middle = nn.Sequential(
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2, stride=2)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(128, 64, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 32, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, output_channels, kernel_size=3, padding=1),
            # nn.Tanh()  # Assuming magnetic susceptibility values are in a specific range
        )

    def forward(self, x):
        x1 = self.encoder(x)
        x2 = self.middle(x1)
        x3 = self.decoder(x2)
        return x3

# Example usage:
batch_size = 1
input_channels = 1   # Number of input channels (MRI phase)
output_channels = 1  # Number of output channels (magnetic susceptibility)
depth = 64           # Updated depth to match cropped data
height = 64
width = 64

# Create the U-Net model
generator = UNet(input_channels, output_channels)

# Example input data
input_data = torch.randn(batch_size, input_channels, depth, height, width)

# Generate output
output = generator(input_data)

# Print the generated output shape
print("Generated Output Shape:", output.shape)

def get_data_dimensions(filepath):
    img = nib.load(filepath)
    data = img.get_fdata()
    return data.shape

source_filepath = '/content/drive/MyDrive/qsmData/Train/Source/normPhaseSubj1.nii'
target_filepath = '/content/drive/MyDrive/qsmData/Train/Target/cosmos1.nii.gz'
source_dimensions = get_data_dimensions(source_filepath)
target_dimensions = get_data_dimensions(target_filepath)
print("Source data dimensions:", source_dimensions)
print("Target data dimensions:", target_dimensions)

# Define the loss function and optimizer
criterion = nn.MSELoss(reduce=None)
optimizer = torch.optim.Adam(generator.parameters(), lr=0.001)

# Move the model to the device (CPU or GPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
generator.to(device)

num_epochs = 5
print_interval = 10

for epoch in range(num_epochs):
    generator.train()
    running_loss = 0.0
    for i, batch in enumerate(train_loader, 1):  # Enumerate to track batch index
        source_data = batch['source'].to(device).unsqueeze(1).float()  # Add the channel dimension
        target_data = batch['target'].to(device).unsqueeze(1).float()  # Add the channel dimension

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = generator(source_data)
        print(outputs.shape)
        print("Target shape:", target_data.shape)

        # Compute loss
        loss = criterion(outputs, target_data)

        # Backpropagation and optimization
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # Print average loss for the epoch
        if i % print_interval == 0:
            avg_loss = running_loss / print_interval
            print(f'Epoch [{epoch + 1}/{num_epochs}], Batch [{i}/{len(train_loader)}], Loss: {avg_loss:.4f}')
            running_loss = 0.0

predictions = []
generator.eval()  # Set the model to evaluation mode
with torch.no_grad():
    for batch in test_loader:
        source_patches = batch['source'].to(device).unsqueeze(1).float()  # Add the channel dimension

        # Forward pass and get the predictions
        outputs = generator(source_patches)

        # Store the predictions in the list
        predictions.append(outputs.cpu().squeeze().numpy())

I tried making a simpler architecture and still got dimension errors; in fact, they were even larger. When I wasn't getting dimension errors, I would just get out-of-memory errors. I have also tried verifying the dimensions of the data throughout different stages of the program, and even though the randomly generated data doesn't have a mismatch between input and output, my MRI data still does once it's put through the network.
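For reference, a minimal stage-by-stage shape trace of this kind, sketched against the UNet class above (names here are hypothetical), makes the divergence visible:

  # Hypothetical trace for a 210x224x160 volume.
  model = UNet(1, 1)
  x = torch.randn(1, 1, 210, 224, 160)
  x1 = model.encoder(x)   # torch.Size([1, 64, 105, 112, 80])
  x2 = model.middle(x1)   # torch.Size([1, 128, 52, 56, 40])  105 floors to 52
  x3 = model.decoder(x2)  # torch.Size([1, 1, 208, 224, 160])  52 * 4 = 208
  print(x1.shape, x2.shape, x3.shape)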

Answer 1

Score: 0


It is because the downsampling dimensions are not equal to the upsampling dimensions. Based on the data you provided, I think it is because the stride-2 pooling shrinks odd sizes by more than the stride-2 upsampling in your decoder restores: max pooling floors odd dimensions, so a 210-voxel axis never maps back exactly under stride-2 upsampling.
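To make the arithmetic concrete (a quick check, assuming the two stride-2 poolings and two stride-2 transposed convolutions of the original model):

  # A 210-voxel axis floors twice under stride-2 pooling,
  # then doubles twice on the way back up.
  print(((210 // 2) // 2) * 2 * 2)  # 208, not 210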

Modify the architecture to the code below and see if this one works:

def __init__(self, input_channels, output_channels):
    super(UNet, self).__init__()
    self.encoder = nn.Sequential(
        nn.Conv3d(input_channels, 32, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv3d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )
    self.pool = nn.MaxPool3d(kernel_size=2, stride=2)
    self.middle = nn.Sequential(
        nn.Conv3d(64, 128, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv3d(128, 128, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )
    self.decoder = nn.Sequential(
        nn.ConvTranspose3d(128, 64, kernel_size=2, stride=2),
        nn.ReLU(inplace=True),
        nn.ConvTranspose3d(64, 32, kernel_size=2, stride=2),
        nn.ReLU(inplace=True),
        nn.Conv3d(32, output_channels, kernel_size=3, padding=1),
    )
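Note that forward would also need to call self.pool after the encoder and middle blocks; as posted, nothing downsamples while the decoder still upsamples by a factor of 4. And even with the pooling applied, a 210-voxel axis still gets floored. An alternative workaround (an editor's sketch assuming the original architecture and the generator variable from the question, not part of this answer) is to pad each input to a multiple of the total downsampling factor and crop the output back:

  import torch.nn.functional as F

  def pad_to_multiple(x, multiple=4):
      # Pad depth/height/width up to the next multiple so that
      # stride-2 pooling never floors an odd size.
      d, h, w = x.shape[-3:]
      pd, ph, pw = (-d) % multiple, (-h) % multiple, (-w) % multiple
      # F.pad pads the last dimensions first: (w_l, w_r, h_l, h_r, d_l, d_r)
      return F.pad(x, (0, pw, 0, ph, 0, pd)), (d, h, w)

  padded, (d, h, w) = pad_to_multiple(torch.randn(1, 1, 210, 224, 160))
  outputs = generator(padded)          # 212x224x160 in -> 212x224x160 out
  outputs = outputs[..., :d, :h, :w]   # crop back to 210x224x160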

Answer 2

Score: 0


I was able to fix the dimension errors by applying max pooling after the middle layer and the decoder. I am not sure why this works, but now the output and input sizes are consistent. The prediction results look bad right now, but I'm pretty sure that's because I set the number of channels down to between 2 and 8 and only trained for 1 epoch. It will be interesting to see how this architecture performs as I apply normal hyperparameters. I'm just glad that there are no more runtime errors or out-of-memory issues.
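A plausible reason this works: pooling halves each dimension once after the middle block and once after the decoder, while the decoder's two stride-2 transposed convolutions quadruple them, for a net scale factor of 1/2 x 4 x 1/2 = 1, and none of these volume dimensions hits an odd size along the way. A sketch of what that forward pass might look like (hypothetical, assuming a separate self.pool module as in the previous answer; the post does not show the final code):

  def forward(self, x):
      x1 = self.encoder(x)              # 210 -> 210 (convolutions only)
      x2 = self.pool(self.middle(x1))   # 210 -> 105
      x3 = self.pool(self.decoder(x2))  # 105 -> 420 -> 210
      return x3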
