model.fit calculates validation only once after validation_freq train epochs and then never again

Question

First: I'm not able to provide any code. If that's enough reason to close this, so be it.

The codebase I was provided is confidential and I was not able to reproduce this behavior in a standalone example after multiple hours of work. It's too much custom code that I barely understand which makes isolating single parts basically impossible.

That's why I'm now trying the opposite approach: looking at the tensorflow code to find out what variable combination has to happen to cause this behaviour and then work from there.

I'm asking for advice on how to debug this issue myself / or if anyone has seen something like this:


The actual Issue:

I'm training a TensorFlow model with model.fit and set the validation_freq argument to e.g. 10. The behavior I get is that validation is performed on epoch 10 and that's it. On epochs 20, 30 and so on, no validation is performed. To summarise: for each model.fit call I get only one evaluation in total. TensorFlow version: 2.10.0

I tried going directly into site-packages/tensorflow/python/keras/engine/training.py to add print statements inside model.fit itself: here and here but I got no output so they were either not executed or print from this file doesn't reach stdout for some reason... or... I misunderstood the code flow of TensorFlow ...

Update: I was changing site-packages/tensorflow/python/keras/engine/training.py while the code was actually using site-packages/keras/engine/training.py, so my print statements were in a file that was never executed.
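One way to avoid patching the wrong copy is to ask Python which file each class of an object was loaded from before editing anything. The following is a minimal sketch using only the standard inspect module. The helper name class_source_files is hypothetical, and json.JSONDecoder stands in for the model so that the snippet runs without TensorFlow installed.

```python
import inspect
import json


def class_source_files(obj):
    """Map each class in obj's MRO to the source file it was loaded from."""
    files = {}
    for cls in type(obj).__mro__:
        try:
            path = inspect.getsourcefile(cls)
        except (TypeError, OSError):
            # Built-in classes such as `object` have no Python source file.
            continue
        files[f"{cls.__module__}.{cls.__qualname__}"] = path
    return files


# Stand-in demo with a pure-Python standard-library class.
for name, path in class_source_files(json.JSONDecoder()).items():
    print(name, "->", path)

# Assumed usage for the situation in the question (not run here):
#   for name, path in class_source_files(model).items():
#       print(name, "->", path)
```

For a Keras model, the printed paths should show whether the base Model class comes from site-packages/keras or from site-packages/tensorflow/python/keras, which is the file worth instrumenting.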

What could cause this / how could I debug this?

Answer 1

Score: 1

It's hard to tell exactly what the issue is without examples, but I would try the following.

  1. Make sure you are providing validation data to the fit method.
  2. Try manually setting the epochs after which to validate: model.fit(..., validation_freq=[10, 20, 30])
  3. If you're editing the source code and working in a Jupyter notebook, you probably have to restart the kernel for the changes to take effect.
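For reference, the Keras documentation describes validation_freq as either an integer (validate every N epochs) or a container of 1-indexed epoch numbers. The sketch below re-implements that documented rule in plain Python so the expected schedule can be compared with what the training run actually does. should_validate is a hypothetical helper written for illustration, not the library's own function.

```python
from collections.abc import Container


def should_validate(epoch_index, validation_freq):
    """Return True if validation is expected after the given epoch.

    `epoch_index` is zero-based, as passed to Keras callbacks.
    """
    epoch = epoch_index + 1  # the user-facing epoch number is one-based
    if isinstance(validation_freq, int):
        return epoch % validation_freq == 0
    if isinstance(validation_freq, Container):
        return epoch in validation_freq
    raise ValueError("validation_freq must be an int or a container of epochs")


# Expected validation epochs over a 30-epoch run, for both argument forms.
expected_int = [e + 1 for e in range(30) if should_validate(e, 10)]
expected_list = [e + 1 for e in range(30) if should_validate(e, [10, 20, 30])]
print(expected_int)   # [10, 20, 30]
print(expected_list)  # [10, 20, 30]
```

Comparing this expected schedule with the epochs whose logs actually contain validation metrics shows where the run diverges from the documented behavior.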

huangapple
  • Posted on May 26, 2023, 00:05:05
  • When reposting, please keep the link to this post: https://go.coder-hub.com/76334289.html