Different results when evaluating model performance on test data using model.evaluate and model.predict
Question
I have a question regarding the model.evaluate() and model.predict() functions in Keras. I built a simple LSTM model in Keras and want to test the model's performance on the test dataset. I considered the following two ways to compute the metric on the test dataset:
- Use the model.evaluate() method
- Use the model.predict() method to obtain the fitted values and compute the metric manually
However, I ended up with different results. In addition, the result of the model.evaluate() method also depends on the value of the batch_size argument. Based on my understanding and this post, they should give the same results. Here is the code that replicates the results:
import tensorflow as tf
from keras.models import Model
from keras.layers import Dense, LSTM, Activation, Input
import numpy as np
from tqdm.notebook import tqdm
import keras.backend as K
from keras.callbacks import ModelCheckpoint, EarlyStopping


class VLSTM:
    def __init__(self, input_shape=(6, 1), nb_output_units=1, nb_hidden_units=128,
                 dropout=0.0, recurrent_dropout=0.0, nb_layers=1):
        self.input_shape = input_shape
        self.nb_output_units = nb_output_units
        self.nb_hidden_units = nb_hidden_units
        self.nb_layers = nb_layers
        self.dropout = dropout
        self.recurrent_dropout = recurrent_dropout

    def build(self):
        inputs = Input(shape=self.input_shape)
        outputs = LSTM(self.nb_hidden_units)(inputs)
        outputs = Dense(1, activation=None)(outputs)
        return Model(inputs=[inputs], outputs=[outputs])


def RMSE(output, target):
    return K.sqrt(K.mean((output - target) ** 2))


n_train = 500
n_val = 100
n_test = 250

X_train = np.random.rand(n_train, 6, 1)
Y_train = np.random.rand(n_train, 1)
X_val = np.random.rand(n_val, 6, 1)
Y_val = np.random.rand(n_val, 1)
X_test = np.random.rand(n_test, 6, 1)
Y_test = np.random.rand(n_test, 1)

input_shape = (X_train.shape[1], X_train.shape[2])
model = VLSTM(input_shape=input_shape)
m = model.build()
m.compile(loss=RMSE, optimizer='adam', metrics=[RMSE])

callbacks = [EarlyStopping(patience=30)]

# Train the model
hist = m.fit(X_train, Y_train,
             batch_size=32, epochs=10, shuffle=True,
             validation_data=(X_val, Y_val), callbacks=callbacks)

# Use evaluate method with default batch size
test_mse = m.evaluate(X_test, Y_test)[1]
print("Mse is {} using evaluate method with default batch size".format(test_mse))

# Use evaluate method with batch size = 1
test_mse = m.evaluate(X_test, Y_test, batch_size=1)[1]
print("Mse is {} using evaluate method with batch size = 1".format(test_mse))

# Use evaluate method with batch size = n_test
test_mse = m.evaluate(X_test, Y_test, batch_size=n_test)[1]
print("Mse is {} using evaluate method with batch size = n_test".format(test_mse))

# Use predict method and compute the RMSE manually
Y_test_pred = m.predict(X_test)
test_mse = np.sqrt(((Y_test_pred - Y_test) ** 2).mean())
print("Mse is {} using predict method".format(test_mse))
After running the code, here are the results:
Mse is 0.3068242073059082 using evaluate method with default batch size
Mse is 0.26647186279296875 using evaluate method with batch size = 1
Mse is 0.30763307213783264 using evaluate method with batch size = n_test
Mse is 0.3076330596820157 using predict method
It looks like using model.predict() and model.evaluate() with batch size = n_test gives the same result. Can anyone explain it? Thanks in advance!
Answer 1
Score: 0
Yes, your guess is correct: the MSE calculated with predict is indeed equal to the one from evaluate with batch_size=len(dataset).
It is easy to understand why: when you compute the metric from the output of predict, you do not divide the dataset into batches; you compute it over all samples at once. With smaller batches, evaluate computes the RMSE of each batch and then averages those per-batch values, and because the square root is nonlinear, the mean of per-batch RMSEs is generally not the RMSE of the whole set.
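To see that this is a property of the square root rather than of the model, here is a minimal NumPy sketch with made-up squared errors (the numbers are hypothetical, purely for illustration):

import numpy as np

# Squared errors for 4 hypothetical samples, split into two batches of 2
sq_err = np.array([0.01, 0.09, 0.25, 0.49])

rmse_whole = np.sqrt(sq_err.mean())          # RMSE over all samples at once
rmse_batches = [np.sqrt(sq_err[:2].mean()),  # RMSE of batch 1
                np.sqrt(sq_err[2:].mean())]  # RMSE of batch 2

print(rmse_whole)             # ~0.4583
print(np.mean(rmse_batches))  # ~0.4159 -- differs, because sqrt is nonlinear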
Obviously, you can also compute the metric from predict in batches, like this:
def rms(output, target):
    # NumPy version of the RMSE metric defined in the question
    return np.sqrt(((output - target) ** 2).mean())

Y_test_pred_batches = np.split(Y_test_pred, 5, axis=0)  # batch_size = 250/5 = 50
Y_test_batches = np.split(Y_test, 5, axis=0)
batch_rmss = []
for y_pred, y_true in zip(Y_test_pred_batches, Y_test_batches):
    batch_rmss.append(rms(y_pred, y_true))
np.mean(batch_rmss)
The output of this is: 0.28436336682976376
Now with evaluate:
test_mse = m.evaluate(X_test, Y_test, batch_size=50)[1]
test_mse
The output of this is: 0.28436335921287537
So basically they are the same.
If you try np.split(Y_test_pred, 250, axis=0), which makes the batch size 1, the output in my case is 0.24441334738835332. And with evaluate and batch_size=1 the output is 0.244413360953331. So you can see it's the same.
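If you want evaluate to report a value that does not depend on batch_size, one option is the stateful metric tf.keras.metrics.RootMeanSquaredError, which accumulates squared errors across all batches and takes the square root only once at the end. A minimal sketch, reusing m, X_test, Y_test, and n_test from the question:

import tensorflow as tf

# Recompile with a stateful metric: it accumulates squared errors over all
# batches and takes the square root once at the end, so the reported RMSE
# should not depend on batch_size.
m.compile(loss='mse',
          optimizer='adam',
          metrics=[tf.keras.metrics.RootMeanSquaredError()])

print(m.evaluate(X_test, Y_test, batch_size=1)[1])
print(m.evaluate(X_test, Y_test, batch_size=50)[1])
print(m.evaluate(X_test, Y_test, batch_size=n_test)[1])  # all three should match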