Retraining a Keras time-series model on only the newly arrived data (not training from scratch)

Question

Introduction: I have lots of time-series models that I train with Keras weekly.

Problem: Training all of these models is getting harder and harder because they require more and more time and resources on AWS, so I am looking for ways to avoid training from scratch.

What I know: I can save models as .h5 and resume training for the newly arrived time-series data.

What I don't know: Is it safe to do this? I am worried about hidden risks to model integrity if I just resume training instead of starting from scratch.

Answer 1

Score: 1

If you train only on the new data, the model will quickly lose what it learned before. It will fit the new data and forget the old data, and it may also overfit badly. Don't do this.
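
To make the forgetting effect concrete, here is a minimal sketch in plain Python rather than Keras. Everything in it is an assumption for illustration: a two-weight linear model, full-batch gradient descent, and synthetic data where the true relation is `y = 2*x1 + 3*x2`. The old data only exercises `x1`, and the new data has `x1 == x2`, so the new data alone cannot pin down both weights.

```python
def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, epochs=2000, lr=0.1):
    """Full-batch gradient descent on squared error, starting from weights w."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = predict(w, x) - y
            g0 += 2 * err * x[0] / n
            g1 += 2 * err * x[1] / n
        w[0] -= lr * g0
        w[1] -= lr * g1
    return w

# Hypothetical data generated from y = 2*x1 + 3*x2.
old_data = [((i / 10, 0.0), 2 * (i / 10)) for i in range(1, 11)]     # x2 is always 0
new_data = [((i / 10, i / 10), 5 * (i / 10)) for i in range(1, 11)]  # x1 == x2

w_old = train([0.0, 0.0], old_data)              # the previously trained model
w_new_only = train(w_old, new_data)              # resume on new data only
w_combined = train(w_old, old_data + new_data)   # resume on old + new data

print("old-data error, new-only :", mse(w_new_only, old_data))
print("old-data error, combined :", mse(w_combined, old_data))
print("new-data error, new-only :", mse(w_new_only, new_data))
print("new-data error, combined :", mse(w_combined, new_data))
```

In this toy setup both runs fit the new data, but only the run that keeps the old data still fits the old data. A real Keras model will not behave this cleanly, so treat it as an illustration of the mechanism only.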

If, on the other hand, you resume from the last checkpoint and train on the new data together with the old data, the model may already be too biased toward the old data and may not learn the new data as well as it could. This depends on many factors, such as how different the new data is and how much new data there is relative to the old data.
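
A rough sketch of this second approach in Keras might look like the following. The sine-wave series, the window size, the small dense network, the epoch counts, and the `make_windows` helper are all made up for illustration. The sketch also loads with `compile=False` and recompiles, which means the optimizer starts fresh instead of continuing from its saved state; that is a choice made here to keep the example simple, not something the question requires.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

def make_windows(series, window=8):
    """Hypothetical helper: sliding windows as inputs, next value as target."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:].reshape(-1, 1)
    return X, y

# Synthetic stand-ins for the existing history and this week's arrivals.
series = np.sin(np.arange(400, dtype="float32") / 10.0)
old_series, new_series = series[:300], series[300:]

# Earlier run: train on the data available at the time and save as .h5.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss=keras.losses.MeanSquaredError())
X_old, y_old = make_windows(old_series)
model.fit(X_old, y_old, epochs=5, batch_size=32, verbose=0)

path = os.path.join(tempfile.mkdtemp(), "model.h5")
model.save(path)

# Later run: reload the weights and continue on old + new data together.
resumed = keras.models.load_model(path, compile=False)
resumed.compile(optimizer=keras.optimizers.Adam(1e-3),
                loss=keras.losses.MeanSquaredError())
X_all, y_all = make_windows(np.concatenate([old_series, new_series]))
history = resumed.fit(X_all, y_all, epochs=3, batch_size=32, verbose=0)
print(history.history["loss"])
```

Resuming usually needs fewer epochs than a fresh run, which is where the time saving would come from, but how many is something to tune per model.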

So the second case is actually hard to answer in general. You might run a few experiments to see whether starting from scratch gives better results than resuming. In any case, do not remove the old data from the training set (unless you think that data is no longer relevant to your project).
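
One way to run that kind of experiment is to hold out the most recent slice of the new data and compare both candidates on it. The sketch below uses a one-variable linear model in plain Python as a stand-in for a Keras model; the data, the drift between periods, the split, and the epoch counts are all assumptions chosen for illustration.

```python
def fit_line(data, w=0.0, b=0.0, epochs=200, lr=0.1):
    """Gradient descent for y = w*x + b, starting from the given w and b."""
    n = len(data)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

def mse(params, data):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Hypothetical data: the relation drifts slightly between the two periods.
old = [(i / 50, 1.0 * (i / 50) + 0.5) for i in range(50)]
new = [(1 + i / 50, 1.2 * (1 + i / 50) + 0.3) for i in range(50)]

# Keep the most recent points aside; neither candidate trains on them.
new_train, holdout = new[:35], new[35:]
train_set = old + new_train

previous = fit_line(old, epochs=500)       # the model saved last time
scratch = fit_line(train_set, epochs=500)  # candidate A: retrain from scratch
resumed = fit_line(train_set, w=previous[0], b=previous[1],
                   epochs=50)              # candidate B: resume, fewer epochs

results = {"scratch": mse(scratch, holdout), "resume": mse(resumed, holdout)}
best = min(results, key=results.get)
print(results, "->", best)
```

Repeating the comparison over several weeks, rather than once, gives a better idea of whether resuming is consistently good enough for a given model.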

huangapple
  • Posted on January 3, 2020, 22:12:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59580039.html