Loss function for a Siamese neural network

Question

I'm trying to train Siamese neural networks for face recognition. Many resources use this function as a loss function:

def contrastive_loss(y_true, y_pred):
    margin = 1
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

I have trained several neural networks with different architectures, and for some of them this function does not work correctly (it returns NaN). Because of this, the network does not train at all.

My code:

#Models.py
from keras.models import Sequential, Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dense, Dropout, Flatten, Lambda, BatchNormalization, Activation
from keras.optimizers import RMSprop
from keras import backend as K


def euclidean_distance(vects):
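    # Euclidean distance between the two embedding vectors of each pair.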
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))


def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)


def contrastive_loss(y_true, y_pred):
    margin = 1
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))


def accuracy(y_true, y_pred):
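    # Predict "same" (label 1) when the distance between embeddings is below 0.5.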
    return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))


def TestModel(input_shape):
    model = Sequential()
    model.add(Conv2D(filters=96, kernel_size=3, strides=3, activation='relu', input_shape=input_shape, padding='valid'))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(.25))
    model.add(Conv2D(filters=256, kernel_size=3, strides=3, activation='relu', padding='valid'))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(128, activation='relu'))
    return model


def Net_Definition(input_shape):
    model = Sequential()
    model.add(Conv2D(filters=96, kernel_size=7, strides=4, activation='relu', padding='valid', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=3, strides=2, padding='valid'))
    model.add(BatchNormalization())
    model.add(Conv2D(filters=256, kernel_size=5, strides=1, activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=3, strides=2, padding='valid'))
    model.add(BatchNormalization())
    model.add(Conv2D(filters=384, kernel_size=3, strides=1, activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=3, strides=2, padding='valid'))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='softmax'))
    return model


def CreateModel(name, input_shape):
    global network
    if name == 'test':
        network = TestModel(input_shape)
    elif name == 'net_definition':
        network = Net_Definition(input_shape)
    else:
        print('Invalid model name!')
        exit(0)

    input_a = Input(shape=input_shape)
    input_b = Input(shape=input_shape)
    processed_a = network(input_a)
    processed_b = network(input_b)

    distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([processed_a, processed_b])
    model = Model(inputs=[input_a, input_b], outputs=distance)

    opt = RMSprop()
    model.compile(loss=contrastive_loss, optimizer=opt, metrics=[accuracy])
    return model
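
# Training script (a second file; its name is not given in the post)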
from keras.utils import Sequence
import numpy as np
import Models
from keras.callbacks import CSVLogger


class MyGenerator(Sequence):
    def __init__(self, filenames, labels, batch_size):
        self.filenames = filenames
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        return (np.ceil(len(self.filenames) / float(self.batch_size))).astype(np.int32)

    def __getitem__(self, item):
        batch_x = self.filenames[item * self.batch_size:(item + 1) * self.batch_size]
        batch_y = self.labels[item * self.batch_size:(item + 1) * self.batch_size]
        x1 = []
        x2 = []
        for i, files in enumerate(batch_x):
            pair = np.load(files).astype(np.float32)
            x1.append(pair[0]/255)
            x2.append(pair[1]/255)
        x1 = np.asarray(x1)
        x2 = np.asarray(x2)
        return (x1, x2), np.array(batch_y).astype(np.float32)


# path_to_folder = 'Datasets/test/pairs/224/'
path_to_folder = 'Datasets/6. Pairs/224/'
input_shape = (224, 224, 3)
batch_size = 128

x_train_file = open(path_to_folder + 'X_Train.txt', 'r')
y_train_file = open(path_to_folder + 'Y_Train.txt', 'r')
x_val_file = open(path_to_folder + 'X_Val.txt', 'r')
y_val_file = open(path_to_folder + 'Y_Val.txt', 'r')
x_train = x_train_file.read().splitlines()
y_train = y_train_file.read().splitlines()
x_val = x_val_file.read().splitlines()
y_val = y_val_file.read().splitlines()

csv_logger = CSVLogger('logs.log')

train_generator = MyGenerator(x_train, y_train, batch_size)
val_generator = MyGenerator(x_val, y_val, batch_size)

model = Models.CreateModel('test', input_shape)
history = model.fit(train_generator, epochs=10, verbose=1, validation_data=val_generator, callbacks=[csv_logger])
model.save_weights('my_checkpoint')

For TestModel everything works fine, but for Net_Definition the loss returns NaN.

How can the problem be solved? Maybe there is another loss function that could be used for this?

Answer 1

Score: 0

I can see a couple of errors here:

  1. The y_true and 1 - y_true terms in the contrastive function should be exchanged.

You can draw inspiration from this implementation:

import tensorflow as tf


def loss(margin=1):
    """Provides 'contrastive_loss' an enclosing scope with variable 'margin'.

    Arguments:
        margin: Integer, defines the baseline for distance for which pairs
                should be classified as dissimilar. - (default is 1).

    Returns:
        'contrastive_loss' function with data ('margin') attached.
    """

    # Contrastive loss = mean( (1 - true_value) * square(prediction) +
    #                          true_value * square( max(margin - prediction, 0) ))
    def contrastive_loss(y_true, y_pred):
        """Calculates the contrastive loss.

        Arguments:
            y_true: List of labels, each label is of type float32.
            y_pred: List of predictions of same length as of y_true,
                    each label is of type float32.

        Returns:
            A tensor containing contrastive loss as floating point value.
        """

        square_pred = tf.math.square(y_pred)
        margin_square = tf.math.square(tf.math.maximum(margin - (y_pred), 0))
        return tf.math.reduce_mean(
            (1 - y_true) * square_pred + (y_true) * margin_square
        )

    return contrastive_loss

source
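
With the closure form above, the compile call in the asker's CreateModel would become (a minimal sketch, assuming the imports from Models.py are unchanged):

opt = RMSprop()
model.compile(loss=loss(margin=1), optimizer=opt, metrics=[accuracy])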

  2. The output of the Siamese network in this example should be a probability (a value between 0 and 1), because y_true is either 0 or 1. In this case the CreateModel function builds a Siamese network whose output is the euclidean_distance between two vectors, which is not a probability; a Euclidean distance can be greater than 1. It is better to add an activation like sigmoid in the final layer of the Siamese model.
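
A minimal sketch of that suggestion, adapting the distance layer in the asker's CreateModel (this is one possible reading of the advice, not the only one):

distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([processed_a, processed_b])
# Squash the unbounded distance into (0, 1) so it is comparable to the 0/1 labels.
output = Dense(1, activation='sigmoid')(distance)
model = Model(inputs=[input_a, input_b], outputs=output)

Note that once the output is a probability rather than a distance, the contrastive loss above may no longer be the natural fit; binary cross-entropy is a common pairing with a sigmoid output.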
