model.fit calculates validation only once after validation_freq train epochs and then never again
Question
First: I'm not able to provide any code - if that's enough reason to close this - so be it.
The codebase I was provided is confidential and I was not able to reproduce this behavior in a standalone example after multiple hours of work. It's too much custom code that I barely understand which makes isolating single parts basically impossible.
That's why I'm now trying the opposite approach: looking at the tensorflow code to find out what variable combination has to happen to cause this behaviour and then work from there.
I'm asking for advice on how to debug this issue myself / or if anyone has seen something like this:
The actual Issue:
I'm training a TensorFlow model with model.fit and set the validation_freq argument to e.g. 10. The behavior I get is that validation is performed on epoch 10 and that's it. On epochs 20, 30, and so on, no validation is performed. To summarise: for each model.fit call I get only one evaluation in total.
TensorFlow version: 2.10.0
<s>I tried going directly into site-packages/tensorflow/python/keras/engine/training.py to add print statements inside model.fit itself: here and here, but I got no output, so they were either not executed or print from this file doesn't reach stdout for some reason... or... I misunderstood the code flow of TensorFlow ...</s>
I was changing site-packages/tensorflow/python/keras/engine/training.py while the code was actually using site-packages/keras/engine/training.py.
What could cause this / how could I debug this?
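One way to avoid editing the wrong copy of training.py is to ask Python which file actually defines the class in use. The following is only a minimal sketch: source_file_of is a hypothetical helper name, and json.JSONDecoder is a stand-in so the snippet runs without TensorFlow installed.

```python
import inspect
import json  # stand-in module so the sketch runs without TensorFlow


def source_file_of(obj):
    """Return the path of the Python file that defines obj, or None."""
    try:
        return inspect.getsourcefile(obj)
    except (TypeError, OSError):
        # Built-in / C-implemented objects have no Python source file.
        return None


# Stand-in demonstration with a standard-library class.
print(source_file_of(json.JSONDecoder))

# In the situation described above, one would instead run this in the
# same environment that performs the training (assumes TensorFlow is
# importable there):
#   import tensorflow as tf
#   print(source_file_of(tf.keras.Model))
```

Passing the class rather than the fit method is a deliberate choice here: a decorated method may report the file of its wrapper instead of the file that contains the method body.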
Answer 1
Score: 1
It's hard to tell exactly what the issue is if you're unable to provide examples, but I would try the following:
- Make sure you are providing validation data to the fit method.
- Try manually setting the epochs after which to validate: model.fit(..., validation_freq=[10, 20, 30])
- If you're editing the source code and working out of a Jupyter notebook, you probably have to restart the kernel for the changes to take effect.
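To make the difference between the integer and list forms of validation_freq concrete, here is a minimal sketch of the per-epoch check as I understand it from the documented behavior. It is not a copy of the Keras source; should_validate is a hypothetical name, and the assumption is that the training loop counts epochs from zero while validation_freq refers to epochs counted from one.

```python
def should_validate(epoch, validation_freq):
    """Decide whether to validate after the given zero-based epoch.

    Assumption: validation_freq refers to one-based epoch numbers.
    """
    user_epoch = epoch + 1  # convert to the one-based, user-facing epoch
    if isinstance(validation_freq, int):
        # Integer form: validate every validation_freq epochs.
        return user_epoch % validation_freq == 0
    if isinstance(validation_freq, (list, tuple, set)):
        # Container form: validate exactly on the listed epochs.
        return user_epoch in validation_freq
    raise ValueError("validation_freq must be an int or a container of ints")


epochs = 30
print([e + 1 for e in range(epochs) if should_validate(e, 10)])
print([e + 1 for e in range(epochs) if should_validate(e, [10, 20, 30])])
```

Under these assumptions both forms select epochs 10, 20 and 30, so if validation still runs only once with the explicit list, the cause is more likely in the surrounding custom code or the validation data than in the frequency setting itself.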