Different results when evaluating the model performance on test data using model.evaluate and model.predict

Question

I have a question regarding the model.evaluate() and model.predict() functions in Keras. I built a simple LSTM model in Keras and want to test the model performance on the test dataset. I considered the following two ways to compute the metric on the test dataset:

  • Use the model.evaluate() method
  • Use the model.predict() method to obtain the predictions and compute the metric manually

However, I ended up with different results. In addition, the results of the model.evaluate() method also depend on the value of the batch_size argument. Based on my understanding and this post, they should produce the same results. Here is the code that reproduces the results:

import tensorflow as tf
from keras.models import Model
from keras.layers import Dense, LSTM, Activation, Input
import numpy as np
from tqdm.notebook import tqdm
import keras.backend as K
from keras.callbacks import ModelCheckpoint, EarlyStopping

class VLSTM:
    def __init__(self, input_shape=(6, 1), nb_output_units=1, nb_hidden_units=128, dropout=0.0, 
                 recurrent_dropout=0.0, nb_layers=1):
        self.input_shape = input_shape
        self.nb_output_units = nb_output_units
        self.nb_hidden_units = nb_hidden_units
        self.nb_layers = nb_layers
        self.dropout = dropout
        self.recurrent_dropout = recurrent_dropout

    def build(self):
        inputs = Input(shape=self.input_shape)
        outputs = LSTM(self.nb_hidden_units)(inputs)
        outputs = Dense(1, activation=None)(outputs)
        return Model(inputs=[inputs], outputs=[outputs])
    
def RMSE(output, target):
    return K.sqrt(K.mean((output - target) ** 2))

n_train = 500
n_val = 100
n_test = 250 

X_train = np.random.rand(n_train, 6, 1)
Y_train = np.random.rand(n_train, 1)
X_val = np.random.rand(n_val, 6, 1)
Y_val = np.random.rand(n_val, 1)
X_test = np.random.rand(n_test, 6, 1)
Y_test = np.random.rand(n_test, 1)

input_shape = (X_train.shape[1], X_train.shape[2])
model = VLSTM(input_shape=input_shape)
m = model.build()
m.compile(loss=RMSE,
              optimizer='adam',
              metrics=[RMSE])

callbacks = []
callbacks.append(EarlyStopping(patience=30))


# train model
hist = m.fit(X_train, Y_train, \
             batch_size=32, epochs=10, shuffle=True, \
             validation_data=(X_val, Y_val), callbacks=callbacks)

# Use evaluate method with default batch size
test_mse = m.evaluate(X_test, Y_test)[1]
print("Mse is {} using evaluate method with default batch size".format(test_mse))

# Use evaluate method with batch size 1
test_mse = m.evaluate(X_test, Y_test, batch_size=1)[1]
print("Mse is {} using evaluate method with batch size = 1".format(test_mse))

# Use evaluate method with batch size = n_test
test_mse = m.evaluate(X_test, Y_test, batch_size=n_test)[1]
print("Mse is {} using evaluate method with batch size = n_test".format(test_mse))

# Use predict method and compute RMSE manually
Y_test_pred = m.predict(X_test)
test_mse = np.sqrt(((Y_test_pred - Y_test) ** 2).mean())
print("Mse is {} using predict method".format(test_mse))

After running the code, here are the results:

Mse is 0.3068242073059082 using evaluate method with default batch size

Mse is 0.26647186279296875 using evaluate method with batch size = 1

Mse is 0.30763307213783264 using evaluate method with batch size = n_test

Mse is 0.3076330596820157 using predict method

It looks like model.predict() and model.evaluate() with batch size = n_test give the same results. Can anyone explain this? Thanks in advance!

Answer 1

Score: 0

Yes, your guess is correct: the value calculated from predict is indeed equal to the one from evaluate with batch_size=len(dataset).
It is easy to understand: with predict you do not split the dataset into batches, you compute the metric over all samples at once, whereas evaluate computes the metric for each batch and then averages the per-batch values. Because the metric here is an RMSE and the square root is non-linear, the average of per-batch RMSEs is generally not equal to the RMSE over the whole dataset, which is why the result depends on batch_size.

Of course, you can also compute the metric from predict in batches, like this:

# NumPy RMSE helper, same formula as the RMSE metric defined above
def rms(output, target):
    return np.sqrt(((output - target) ** 2).mean())

Y_test_pred_batches = np.split(Y_test_pred, 5, axis=0)  # batch_size = 250 / 5 = 50
Y_test_batches = np.split(Y_test, 5, axis=0)
batch_rmss = []
for y_pred, y_true in zip(Y_test_pred_batches, Y_test_batches):
    batch_rmss.append(rms(y_pred, y_true))
np.mean(batch_rmss)

The output of this is: 0.28436336682976376
Now with evaluate:

test_mse = m.evaluate(X_test, Y_test, batch_size=50)[1]
test_mse

The output of this is: 0.28436335921287537
So basically they are the same.

If you try np.split(Y_test_pred, 250, axis=0), which makes the batch size 1, the output in my case is 0.24441334738835332. And with evaluate and batch_size=1 the output is 0.244413360953331. So you can see they are the same.
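
For intuition, here is a minimal standalone sketch (the numbers are made up purely for illustration, and it ignores that evaluate weights batches by their size) showing why averaging per-batch RMSEs differs from computing the RMSE over all samples at once:

import numpy as np

# toy targets and predictions, hypothetical values chosen only to illustrate the effect
y_true = np.array([0.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.1, 0.1, 1.0, 1.0])

# RMSE over the whole dataset at once (what predict + manual computation gives)
rmse_full = np.sqrt(np.mean((y_pred - y_true) ** 2))

# mean of per-batch RMSEs with batch_size=2 (roughly what evaluate does)
rmse_batched = np.mean([
    np.sqrt(np.mean((y_pred[i:i + 2] - y_true[i:i + 2]) ** 2))
    for i in range(0, len(y_true), 2)
])

print(rmse_full)     # about 0.711
print(rmse_batched)  # 0.55

The two values differ because in the batched case the square root is applied to each batch before averaging across batches.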
