# neural-forecast
a
Hi Team, I wanted to update my earlier question with some additional information and code. For some of the neuralforecast models, I am noticing that training during cross validation is stopping impossibly early. My understanding is that early_stop_patience_steps and val_check_steps combine to set a minimum number of epochs a model should train before early stopping can be triggered. In my example, with early_stop_patience_steps=10 and val_check_steps=10, early stopping during cross validation shouldn't trigger until after epoch 100 (10 x 10) at the earliest. This has held true during model fitting, but for some models the cross validation stops after epoch 9 (see the code example below). I have experimented with various values of early_stop_patience_steps and val_check_steps, and the issue always occurs at the first validation check (e.g., with early_stop_patience_steps=10 and val_check_steps=4, cross validation stops at epoch 3). Is this a known issue, or have I just set something up wrong during cross validation? Please let me know if there is any other information I should provide.
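(For reference, data in the snippet below is a single monthly series in the long format neuralforecast expects, i.e. one row per date with unique_id, ds and y columns. A minimal, purely hypothetical stand-in would look like this:)

import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: one monthly series in the long format
# NeuralForecast expects (columns: unique_id, ds, y). Values are made up.
data = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2004-01-01", periods=240, freq="MS"),
    "y": np.random.default_rng(0).normal(100, 10, 240),
})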
from neuralforecast.models import TFT
from neuralforecast.core import NeuralForecast
from neuralforecast.losses.pytorch import MAPE

horizon = 6
input_size = 6

models = [
    TFT(h=horizon,
        input_size=input_size,
        early_stop_patience_steps=10,   # stop after 10 non-improving validation checks
        val_check_steps=10,             # run validation every 10 training steps
        max_steps=1000,
        scaler_type='revin',
        loss=MAPE(),
    ),
]

nf = NeuralForecast(models=models, freq='MS')

# Plain fit with a 12-month validation set: early stopping behaves as expected here.
nf.fit(df=data, val_size=12)

fcst_df = nf.predict(df=data)

# Cross validation over 15 rolling windows, stepping 6 months at a time;
# this is where training stops much earlier than expected.
Y_hat_df = nf.cross_validation(df=data,
                               n_windows=15,
                               step_size=6,
                               val_size=12,
                               verbose=True,
                               ).reset_index()
Seed set to 1
Epoch 109: 100% 1/1 [00:00<00:00, 7.97it/s, v_num=1247, train_loss_step=1.110, train_loss_epoch=1.110, valid_loss=0.297]
Epoch 9: 100% 1/1 [00:00<00:00, 6.74it/s, v_num=1248, train_loss_step=1.080, train_loss_epoch=1.080, valid_loss=0.345]
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 95.30it/s]
j
Hey. It's not epoch 100, it's the 100th step (batch). Setting val_check_steps=10 means "compute the validation loss every 10 steps (batches)", and early_stop_patience_steps=10 means "stop if the validation loss doesn't decrease for 10 consecutive evaluations" (i.e. 100 steps). The number of steps per epoch depends on the batch size and the size of your dataset.
a
Hi Jose, thank you for the response. I think that makes sense. I re-ran with fewer cross validation windows and the epochs increased. I'll let you know if anything else comes up.
Quick follow up: does this output make sense? If I set the batch size and the valid batch size to 1, why would training stop after 4 epochs?
models = [
    TFT(h=horizon,
        input_size=input_size,
        early_stop_patience_steps=5,
        val_check_steps=5,
        max_steps=1000,
        scaler_type='robust',
        loss=MAPE(),
        batch_size=1,
        valid_batch_size=1,
    ),
]

# NeuralForecast is constructed from this models list as before (freq='MS').
Y_hat_df = nf.cross_validation(df=prius_data,
                               n_windows=5,
                               step_size=6,
                               val_size=18,
                               verbose=True,
                               refit=True,   # refit the model for every window
                               ).reset_index()
Epoch 34: 100% 1/1 [00:00<00:00, 6.59it/s, v_num=1323, train_loss_step=0.941, train_loss_epoch=0.941, valid_loss=0.268]
Epoch 29: 100% 1/1 [00:00<00:00, 6.69it/s, v_num=1324, train_loss_step=0.926, train_loss_epoch=0.926, valid_loss=0.192]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 142.49it/s]
Epoch 29: 100% 1/1 [00:00<00:00, 6.48it/s, v_num=1326, train_loss_step=0.936, train_loss_epoch=0.936, valid_loss=0.182]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 135.75it/s]
Epoch 4: 100% 1/1 [00:00<00:00, 6.56it/s, v_num=1328, train_loss_step=1.020, train_loss_epoch=1.020, valid_loss=1.250]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 145.98it/s]
Epoch 4: 100% 1/1 [00:00<00:00, 6.69it/s, v_num=1330, train_loss_step=0.955, train_loss_epoch=0.955, valid_loss=3.730]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 142.67it/s]
Epoch 4: 100% 1/1 [00:00<00:00, 6.54it/s, v_num=1332, train_loss_step=0.953, train_loss_epoch=0.953, valid_loss=3.390]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 139.84it/s]
j
How many series do you have? Maybe it's reaching the 1000-step limit from max_steps.
a
I only have one series. I have also run this with start_padding_enabled=True, but I still see this behavior.
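For what it's worth, counting optimizer steps straight off the progress bars above (each shows 1/1 batches per epoch, and I'm assuming one optimizer step per batch), the epoch-4 runs come out to:

# Back-of-the-envelope step count for the runs above that stopped at "Epoch 4".
# Assumes one optimizer step per batch; batches per epoch read from the 1/1 bars.
batches_per_epoch = 1
last_epoch = 4                     # Lightning epochs are 0-indexed
val_check_steps = 5
max_steps = 1000

total_steps = (last_epoch + 1) * batches_per_epoch    # 5 optimizer steps
validation_checks = total_steps // val_check_steps    # 1 validation check

print(total_steps, validation_checks, total_steps >= max_steps)   # 5 1 False

If that accounting is right, those runs stop after a single validation check, well before the 1000-step max_steps cap.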