Performing a single fit of, say, 20 epochs is not equivalent to performing two fits of 10 epochs each: even though the network weights are preserved, the state of the adaptive optimizer (Adam) is not preserved between the two fits. Do you know if there is any way to preserve the optimizer state so that two consecutive fits of 10 epochs are equivalent to a single fit of 20 epochs?
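To illustrate the effect described above, here is a minimal sketch in plain PyTorch (not NeuralForecast's API). The toy linear model, the synthetic data and the choice of one full-batch step per "epoch" are illustrative assumptions. Re-creating Adam between two 10-epoch runs resets its moment estimates, while saving and reloading the optimizer's `state_dict` makes the two runs reproduce a single 20-epoch run.

```python
import copy
import torch

def make_problem(seed: int = 0):
    # Small synthetic regression problem (illustrative data, not from the thread).
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(64, 3, generator=g)
    y = x @ torch.tensor([[1.5], [-2.0], [0.5]]) + 0.1 * torch.randn(64, 1, generator=g)
    return x, y

def make_model(seed: int = 0) -> torch.nn.Module:
    # Same seed -> identical initial weights for every variant.
    torch.manual_seed(seed)
    return torch.nn.Linear(3, 1)

def train(model, optimizer, x, y, epochs):
    # One full-batch gradient step per "epoch" (simplifying assumption).
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

def params_close(m1, m2) -> bool:
    return all(torch.allclose(p1, p2) for p1, p2 in zip(m1.parameters(), m2.parameters()))

x, y = make_problem()

# (a) One fit of 20 epochs.
m_a = make_model()
train(m_a, torch.optim.Adam(m_a.parameters(), lr=1e-2), x, y, 20)

# (b) Two fits of 10 epochs, each with a freshly created Adam (moments reset).
m_b = make_model()
train(m_b, torch.optim.Adam(m_b.parameters(), lr=1e-2), x, y, 10)
train(m_b, torch.optim.Adam(m_b.parameters(), lr=1e-2), x, y, 10)

# (c) Two fits of 10 epochs, saving and restoring the optimizer state in between.
m_c = make_model()
opt_c = torch.optim.Adam(m_c.parameters(), lr=1e-2)
train(m_c, opt_c, x, y, 10)
saved_state = copy.deepcopy(opt_c.state_dict())  # e.g. what you would checkpoint
opt_c2 = torch.optim.Adam(m_c.parameters(), lr=1e-2)
opt_c2.load_state_dict(saved_state)              # moments and step counts restored
train(m_c, opt_c2, x, y, 10)

same_as_single_fit_fresh = params_close(m_a, m_b)
same_as_single_fit_restored = params_close(m_a, m_c)
print(same_as_single_fit_fresh, same_as_single_fit_restored)
```

Variant (b) mirrors what the thread describes (the optimizer is re-defined on every fit); variant (c) shows what "preserving the optimizer state" would mean at the PyTorch level.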
08/02/2023, 5:47 PM
Hi @Manuel! Not currently; the optimizers are re-defined every time fit is called. What specifically do you want to preserve across runs? You can modify the learning rate directly, e.g. nf.models[0].learning_rate = new_value (nf.models is the list of models passed to NeuralForecast).
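For contrast, a small plain-PyTorch sketch (an illustration, not NeuralForecast code) of the difference between lowering the learning rate on a live optimizer and re-creating it: changing `lr` in `param_groups` keeps Adam's accumulated moments, while a newly constructed optimizer starts with empty state. The toy model and random input are assumptions.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Take one step so that Adam allocates its moment buffers.
model(torch.randn(8, 3)).pow(2).mean().backward()
optimizer.step()

exp_avg_before = optimizer.state[model.weight]["exp_avg"].clone()
exp_avg_sq_before = optimizer.state[model.weight]["exp_avg_sq"].clone()

# Lower the learning rate in place: the optimizer state is left untouched.
for group in optimizer.param_groups:
    group["lr"] = 1e-4

moments_kept = torch.equal(exp_avg_before, optimizer.state[model.weight]["exp_avg"]) and torch.equal(
    exp_avg_sq_before, optimizer.state[model.weight]["exp_avg_sq"]
)

# A freshly constructed optimizer, by contrast, has no accumulated state.
fresh = torch.optim.Adam(model.parameters(), lr=1e-4)
print(optimizer.param_groups[0]["lr"], moments_kept, len(fresh.state))
```

This is why lowering the learning rate on a re-created optimizer does not recover the adaptivity from the previous fit.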
08/02/2023, 7:07 PM
Hi @Cristian (Nixtla). I use a custom evaluation function (not only a custom metric: it uses a specific subset of the data with a special ground truth), so I cannot use the built-in validation feature for early stopping. My model is TFT-based and training is very slow, even on a GPU, so tuning the number of epochs during hyperparameter search (without early stopping) is very expensive: every epoch/step value requires training from scratch (for example, trying 200 and 210 epochs takes two separate runs). If I could fit incrementally, I could write a training loop with an early-stopping mechanism based on my custom evaluation function. However, when I try this, the optimizer state is not preserved and the results are poor; even if I decrease the learning rate, I still lose the adaptive state accumulated during the previous fit.
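The incremental-fit loop described above can be sketched in plain PyTorch (not NeuralForecast's API) by creating the optimizer once and reusing it across rounds of epochs, scoring the model with a custom function after each round. `custom_eval`, `train_with_custom_early_stopping`, the round size, the patience and the synthetic data are all hypothetical stand-ins; the real evaluation would use your special subset and ground truth.

```python
import copy
import torch

def custom_eval(model, x_eval, y_eval) -> float:
    # Hypothetical stand-in for a user-defined evaluation (lower is better).
    with torch.no_grad():
        return torch.nn.functional.l1_loss(model(x_eval), y_eval).item()

def train_with_custom_early_stopping(model, x, y, x_eval, y_eval,
                                     epochs_per_round=10, max_rounds=50,
                                     patience=3, lr=1e-2):
    # The optimizer is created ONCE, so Adam's moments carry over between rounds.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    best_score = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    best_epoch = 0
    rounds_without_improvement = 0

    for round_idx in range(1, max_rounds + 1):
        model.train()
        for _ in range(epochs_per_round):  # one full-batch step per "epoch"
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

        model.eval()
        score = custom_eval(model, x_eval, y_eval)
        if score < best_score:
            best_score = score
            best_state = copy.deepcopy(model.state_dict())
            best_epoch = round_idx * epochs_per_round
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break  # early stop on the custom criterion

    model.load_state_dict(best_state)  # restore the best checkpoint
    return best_epoch, best_score

# Usage on synthetic data (illustrative only).
torch.manual_seed(0)
w_true = torch.tensor([[1.5], [-2.0], [0.5]])
x, x_eval = torch.randn(256, 3), torch.randn(64, 3)
y, y_eval = x @ w_true, x_eval @ w_true
model = torch.nn.Linear(3, 1)
best_epoch, best_score = train_with_custom_early_stopping(model, x, y, x_eval, y_eval)
print(best_epoch, round(best_score, 4))
```

A single run of this loop explores 10, 20, 30, ... epochs, instead of one full training run per candidate epoch count.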