July 31, 2023, 20:21:29

Deep Learning binary case: visualization of masks predicted by the model + BinaryIoU

Question

<details>
<summary>English:</summary>
I have a question regarding the visualization of masks predicted by the model in a case where we have only one class of object (binary situation). My objective is to import the trained model and visualize the predicted masks on the first images of my training dataset to understand what the model is capable of doing.
However, the model's predictions are currently logits, which are continuous values. To interpret them correctly, I understand that they need to be converted into probabilities (perhaps with the sigmoid function?). So I'm looking to create an "inference" function that transforms the logits into probabilities and decides that a pixel belongs to the target class (object) if its probability is greater than 0.5, and to the background otherwise (I believe that's how it generally works).
Here is my code that allows training the model:

# https://keras.io/examples/vision/deeplabv3_plus/

import os
import cv2
import random
import numpy as np
from glob import glob
from scipy.io import loadmat
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.metrics import BinaryIoU

# Creating a TensorFlow Dataset

# Configuration Variables

IMAGE_SIZE = 512  # resize the images to IMAGE_SIZE x IMAGE_SIZE
BATCH_SIZE = 5  # train the network on batches of 5 samples
NUM_CLASSES = 1
DATA_DIR = r"C:\Users\lscamill\Desktop\SEMANTIC_IA\instance-level-human-parsing\instance-level_human_parsing\instance-level_human_parsing\Training"
NUM_TRAIN_IMAGES = 1382  # for training
NUM_VAL_IMAGES = 346  # for validation

# Data Loading

image_paths = sorted(glob(os.path.join(DATA_DIR, "Images2/*")))
mask_paths = sorted(glob(os.path.join(DATA_DIR, "mask_final/*")))

# shuffle images and masks together so that each image stays paired with its mask
combined = list(zip(image_paths, mask_paths))
random.shuffle(combined)
image_paths[:], mask_paths[:] = zip(*combined)

train_images = image_paths[:NUM_TRAIN_IMAGES]
train_masks = mask_paths[:NUM_TRAIN_IMAGES]
val_images = image_paths[NUM_TRAIN_IMAGES:NUM_TRAIN_IMAGES + NUM_VAL_IMAGES]
val_masks = mask_paths[NUM_TRAIN_IMAGES:NUM_TRAIN_IMAGES + NUM_VAL_IMAGES]

# Image Reading and Preprocessing

def read_image(image_path, mask=False):
    image = tf.io.read_file(image_path)
    if mask:  # when mask is True, the file is treated as a mask and decoded with a single channel
        image = tf.image.decode_png(image, channels=1)  # or decode_jpeg (grayscale image)
        image.set_shape([None, None, 1])
        image = tf.image.resize(images=image, size=[IMAGE_SIZE, IMAGE_SIZE])
    else:
        image = tf.image.decode_png(image, channels=3)  # or decode_jpeg (channels=3, RGB)
        image.set_shape([None, None, 3])
        image = tf.image.resize(images=image, size=[IMAGE_SIZE, IMAGE_SIZE])
        # preprocessing step applied to the non-mask (regular) images
        # according to the requirements of the ResNet50 model
        image = tf.keras.applications.resnet50.preprocess_input(image)
    return image

# Data Loading and Processing

# loads an image and its corresponding mask
def load_data(image_list, mask_list):
    image = read_image(image_list)
    mask = read_image(mask_list, mask=True)
    return image, mask

# creates a TensorFlow dataset from the image and mask lists, with parallel processing
def data_generator(image_list, mask_list):
    dataset = tf.data.Dataset.from_tensor_slices((image_list, mask_list))
    dataset = dataset.map(load_data, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
    return dataset

train_dataset = data_generator(train_images, train_masks)
val_dataset = data_generator(val_images, val_masks)

# Building the DeepLabV3+ model

# combines a convolutional layer, batch normalization, and ReLU activation
# to apply non-linear transformations to the input tensor
def convolution_block(
    block_input,  # the input tensor to the convolutional block
    num_filters=256,
    kernel_size=3,  # the size of the convolutional kernel; default is 3x3
    dilation_rate=1,  # default 1 means no dilation
    padding="same",
    use_bias=False,
):
    # 2D convolution on the input tensor using the specified parameters
    x = layers.Conv2D(
        num_filters,
        kernel_size=kernel_size,
        dilation_rate=dilation_rate,
        padding=padding,
        use_bias=use_bias,
        kernel_initializer=keras.initializers.HeNormal(),
    )(block_input)
    x = layers.BatchNormalization()(x)  # normalizes the activations of the previous layer
    return tf.nn.relu(x)  # applied to the normalized tensor

# dilated spatial pyramid pooling by applying average pooling, upsampling, and concatenation operations
def DilatedSpatialPyramidPooling(dspp_input):
    dims = dspp_input.shape  # dimensions of dspp_input
    # average pooling over the full spatial extent of the input tensor
    x = layers.AveragePooling2D(pool_size=(dims[-3], dims[-2]))(dspp_input)
    x = convolution_block(x, kernel_size=1, use_bias=True)
    # upsample the tensor x to the original spatial dimensions of the input tensor
    out_pool = layers.UpSampling2D(
        size=(dims[-3] // x.shape[1], dims[-2] // x.shape[2]), interpolation="bilinear",
    )(x)

    out_1 = convolution_block(dspp_input, kernel_size=1, dilation_rate=1)
    out_6 = convolution_block(dspp_input, kernel_size=3, dilation_rate=6)
    out_12 = convolution_block(dspp_input, kernel_size=3, dilation_rate=12)
    out_18 = convolution_block(dspp_input, kernel_size=3, dilation_rate=18)
    x = layers.Concatenate(axis=-1)([out_pool, out_1, out_6, out_12, out_18])
    output = convolution_block(x, kernel_size=1)
    return output

# constructs the DeepLabV3+ model by combining a modified ResNet50 backbone,
# dilated spatial pyramid pooling, and multi-scale feature fusion
def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    resnet50 = keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_tensor=model_input
    )
    x = resnet50.get_layer("conv4_block6_2_relu").output
    x = DilatedSpatialPyramidPooling(x)

    input_a = layers.UpSampling2D(
        size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]),
        interpolation="bilinear",
    )(x)
    input_b = resnet50.get_layer("conv2_block3_2_relu").output
    input_b = convolution_block(input_b, num_filters=48, kernel_size=1)
    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(
        size=(image_size // x.shape[1], image_size // x.shape[2]),
        interpolation="bilinear",
    )(x)
    # final 1x1 convolution without activation: the model outputs logits
    model_output = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    return keras.Model(inputs=model_input, outputs=model_output)

model = DeeplabV3Plus(image_size=IMAGE_SIZE, num_classes=NUM_CLASSES)

# Training

loss = keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss=loss, metrics=[BinaryIoU()])

history = model.fit(train_dataset, validation_data=val_dataset, epochs=25)
model.save(r"C:\Users\lscamill\Desktop\new")

plt.plot(history.history["loss"])
plt.title("Training Loss new")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.show()

plt.plot(history.history["binary_io_u"])  # Update the key to 'binary_io_u'
plt.title("Training IoU new")
plt.ylabel("IoU")
plt.xlabel("Epoch")
plt.show()

plt.plot(history.history["val_loss"])
plt.title("Validation Loss new")
plt.ylabel("val_loss")
plt.xlabel("epoch")
plt.show()

plt.plot(history.history["val_binary_io_u"])  # Update the key to 'val_binary_io_u'
plt.title("Validation IoU new")
plt.ylabel("IoU")
plt.xlabel("Epoch")
plt.show()

I tried the following, but I only obtain values between 0.001 and 0.009 (very low probabilities, which may be normal since my model does not have many images to train on):

model = keras.models.load_model(r"C:\Users\lscamill\Desktop\new")
model.layers[-1].activation = tf.keras.activations.sigmoid

def infer(model, image_tensor):
    predictions = model.predict(np.expand_dims(image_tensor, axis=0))
    predictions = np.squeeze(predictions)  # removes the batch and channel dimensions of size 1
    prediction_binary = predictions > 0.5
    return prediction_binary  # array that contains the predicted segmentation mask

Moreover, when I look at my IoU plots for training and validation, I see values around 0.49 (so it's acceptable), but when I look at the predictions generated for the same images, the probabilities are low. Is there a problem somewhere?
</details>
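
One possible reading of the 0.49 figure, offered as an assumption rather than a diagnosis: with its default arguments, `BinaryIoU` averages the IoU of class 0 (background) and class 1 (object), and it thresholds the raw model output at 0.5, even when that output is logits. If the object covers only a small part of each image, a model that predicts background everywhere can still score close to 0.5. The NumPy sketch below mirrors that averaging on made-up masks; `mean_binary_iou` and the toy arrays are hypothetical and not part of the question's code.

```python
import numpy as np

def mean_binary_iou(y_true, y_pred):
    """Mean of the IoU for class 0 (background) and class 1 (object)."""
    ious = []
    for cls in (0, 1):
        true_c = y_true == cls
        pred_c = y_pred == cls
        union = np.logical_or(true_c, pred_c).sum()
        if union == 0:
            continue  # class absent from both masks: skip it
        intersection = np.logical_and(true_c, pred_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Hypothetical 100x100 ground truth where the object covers 2% of the pixels
y_true = np.zeros((100, 100), dtype=np.int64)
y_true[:10, :20] = 1

# A model that predicts background everywhere
y_pred = np.zeros_like(y_true)

# Background IoU is 9800 / 10000 = 0.98, object IoU is 0 / 200 = 0.0
print(mean_binary_iou(y_true, y_pred))  # 0.49
```

If this is what is happening, plotting the IoU of the object class alone (for example `BinaryIoU(target_class_ids=[1])`) would give a clearer picture than the two-class mean.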
# Answer 1
**Score**: 1

I think that to convert your output logits to a probability for each class (for each pixel), you need to apply a softmax function across the channels dimension. Softmax rescales the values so that they are positive and add up to 1 across the channels. This assumes the model has one output channel per class (two channels for background and object). With a single output channel, as with NUM_CLASSES = 1 in the question, softmax over that one channel returns 1 for every pixel; in that case the sigmoid of the logit is the probability of the object class.

If you choose any pixel and move along the channels dimension for that pixel, the softmax values along that axis tell you the probability of each channel. For each pixel, you can identify the channel that scored the highest probability by applying an argmax function along the channel dimension (for the binary case this is equivalent to a probability > 0.5). This new map might correct the IoU behaviour you're seeing.

# logits shape: [batch, h, w, channels]
logits = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
# probabilities shape: [batch, h, w, channels]
probabilities = layers.Softmax(axis=-1)(logits)
# Segmentation map shape: [batch, h, w]
# Each pixel has a value identifying the channel index that scored highest
# For the binary (two-channel) case this is equivalent to scoring > 0.5
highest_scoring_channels = tf.argmax(probabilities, axis=-1)
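
To make the relation between softmax, argmax and sigmoid concrete, here is a small standard-library sketch for a single pixel. The logit values are made up, and the helper names `softmax` and `sigmoid` are defined here for illustration only. It shows that with two channels the object probability equals the sigmoid of the logit difference, and that softmax over a single channel is always 1.

```python
import math

def softmax(logits):
    """Softmax over the channel logits of one pixel."""
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Two-channel pixel: [background logit, object logit] (made-up values)
two_channel = [0.3, 1.7]
probs = softmax(two_channel)
predicted_class = probs.index(max(probs))  # argmax over the channels

print(probs, predicted_class)
# The object probability equals the sigmoid of the logit difference
print(math.isclose(probs[1], sigmoid(two_channel[1] - two_channel[0])))  # True

# Single-channel pixel: softmax over one value is always 1.0
print(softmax([-4.2]))  # [1.0]
```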

Adapted for your infer method, the code would look like this:

model = keras.models.load_model(r"C:\Users\lscamill\Desktop\new")
# deleted the sigmoid line
def infer(model, image_tensor):
    # Your model returns logits, as done originally, i.e. the model hasn't changed.
    model_output = model.predict(np.expand_dims(image_tensor, axis=0))
    # Convert model_output (which are logits) to probabilities
    probabilities = layers.Softmax(axis=-1)(model_output)
    # Convert to a binary segmentation mask per class
    # (indexing channel 1 assumes the model has two output channels)
    predicted_class0 = probabilities[:, :, :, 0] > 0.5
    predicted_class1 = probabilities[:, :, :, 1] > 0.5

    # Return the desired segmentation mask
    return predicted_class0
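
For a model with a single output channel, such as the one in the question (NUM_CLASSES = 1), a sigmoid-based variant may be the better fit. The following is a minimal sketch under that assumption, not a definitive implementation: `logits_to_mask`, `infer_binary` and `fake_predict` are hypothetical names, and `fake_predict` only stands in for `model.predict` so that the example runs without a trained model.

```python
import numpy as np

def logits_to_mask(logits, threshold=0.5):
    """Turn single-channel logits of shape (H, W, 1) into a boolean (H, W) mask."""
    probabilities = 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid
    return np.squeeze(probabilities, axis=-1) > threshold

def infer_binary(predict_fn, image, threshold=0.5):
    """Run predict_fn on one image and return its boolean object mask.

    predict_fn is assumed to map a batch of shape (1, H, W, 3) to logits of
    shape (1, H, W, 1), like the predict method of the model in the question.
    """
    logits = predict_fn(np.expand_dims(image, axis=0))[0]  # drop the batch axis
    return logits_to_mask(logits, threshold)

# Stand-in for model.predict: positive logits in the top half of the image,
# negative logits in the bottom half
def fake_predict(batch):
    _, h, w, _ = batch.shape
    logits = np.full((1, h, w, 1), -3.0)
    logits[:, : h // 2] = 3.0
    return logits

mask = infer_binary(fake_predict, np.zeros((4, 4, 3), dtype=np.float32))
print(mask.astype(int))
```

With the real model this would be called as `infer_binary(model.predict, image)`, provided the model still outputs raw logits. Thresholding the probability at 0.5 is the same as thresholding the logit at 0.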
