# neural-forecast
a
Hi Team, I wanted to update my earlier question with some additional information and code. For some of the neuralforecast models, I am noticing that training during cross validation is stopping impossibly early. My understanding is that early_stop_patience_steps and val_check_steps combine to set a minimum number of epochs a model should train before early stopping can be triggered. In my example, with early_stop_patience_steps=10 and val_check_steps=10, early stopping during cross validation shouldn't trigger until after epoch 100 (10 x 10) at the earliest. This has held true during model fitting, but for some models the cross validation stops after epoch 9 (see the code example below). I have experimented with various values of early_stop_patience_steps and val_check_steps, and the issue always occurs at the first validation check (e.g., with early_stop_patience_steps=10 and val_check_steps=4, cross validation stops at epoch 3). Is this a known issue, or have I just set something up wrong during cross validation? Please let me know if there is any other information I should provide.
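(For reference, data in the snippet below is a single monthly series in the long format neuralforecast expects, i.e. one row per date with unique_id, ds and y columns. A minimal, purely hypothetical stand-in would look like this:)

import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: one monthly series in the long format
# NeuralForecast expects (columns: unique_id, ds, y). Values are made up.
data = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2004-01-01", periods=240, freq="MS"),
    "y": np.random.default_rng(0).normal(100, 10, 240),
})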
from neuralforecast.models import TFT
from neuralforecast.core import NeuralForecast
from neuralforecast.losses.pytorch import MAPE

horizon = 6
input_size = 6

models = [
    TFT(h=horizon,
        input_size=input_size,
        early_stop_patience_steps=10,   # stop after 10 non-improving validation checks
        val_check_steps=10,             # run validation every 10 training steps
        max_steps=1000,
        scaler_type='revin',
        loss=MAPE(),
    ),
]

nf = NeuralForecast(models=models, freq='MS')

# Plain fit with a 12-month validation set: early stopping behaves as expected here.
nf.fit(df=data, val_size=12)

fcst_df = nf.predict(df=data)

# Cross validation over 15 rolling windows, stepping 6 months at a time;
# this is where training stops much earlier than expected.
Y_hat_df = nf.cross_validation(df=data,
                               n_windows=15,
                               step_size=6,
                               val_size=12,
                               verbose=True,
                               ).reset_index()
Seed set to 1
Epoch 109: 100% 1/1 [00:00<00:00, 7.97it/s, v_num=1247, train_loss_step=1.110, train_loss_epoch=1.110, valid_loss=0.297]
Epoch 9: 100% 1/1 [00:00<00:00, 6.74it/s, v_num=1248, train_loss_step=1.080, train_loss_epoch=1.080, valid_loss=0.345]
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 95.30it/s]
j
Hey. It's not epoch 100, it's the 100th step (batch). Setting val_check_steps=10 means "compute the validation loss every 10 steps (batches)", and early_stop_patience_steps=10 means "stop if the validation loss doesn't decrease for 10 consecutive evaluations" (i.e. 100 steps). The number of steps per epoch depends on the batch size and the size of your dataset.
a
Hi Jose, thank you for the response. I think that makes sense. I re-ran with fewer cross validation windows and the epochs increased. I'll let you know if anything else comes up.
Quick follow up: does this output make sense? If I set the batch size and the valid batch size to 1, why would training stop after 4 epochs?
models = [
    TFT(h=horizon,
        input_size=input_size,
        early_stop_patience_steps=5,
        val_check_steps=5,
        max_steps=1000,
        scaler_type='robust',
        loss=MAPE(),
        batch_size=1,
        valid_batch_size=1,
    ),
]

# NeuralForecast is constructed from this models list as before (freq='MS').
Y_hat_df = nf.cross_validation(df=prius_data,
                               n_windows=5,
                               step_size=6,
                               val_size=18,
                               verbose=True,
                               refit=True,   # refit the model for every window
                               ).reset_index()
Epoch 34: 100% 1/1 [00:00<00:00, 6.59it/s, v_num=1323, train_loss_step=0.941, train_loss_epoch=0.941, valid_loss=0.268]
Epoch 29: 100% 1/1 [00:00<00:00, 6.69it/s, v_num=1324, train_loss_step=0.926, train_loss_epoch=0.926, valid_loss=0.192]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 142.49it/s]
Epoch 29: 100% 1/1 [00:00<00:00, 6.48it/s, v_num=1326, train_loss_step=0.936, train_loss_epoch=0.936, valid_loss=0.182]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 135.75it/s]
Epoch 4: 100% 1/1 [00:00<00:00, 6.56it/s, v_num=1328, train_loss_step=1.020, train_loss_epoch=1.020, valid_loss=1.250]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 145.98it/s]
Epoch 4: 100% 1/1 [00:00<00:00, 6.69it/s, v_num=1330, train_loss_step=0.955, train_loss_epoch=0.955, valid_loss=3.730]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 142.67it/s]
Epoch 4: 100% 1/1 [00:00<00:00, 6.54it/s, v_num=1332, train_loss_step=0.953, train_loss_epoch=0.953, valid_loss=3.390]
Using stored dataset.
Predicting DataLoader 0: 100% 1/1 [00:00<00:00, 139.84it/s]
j
How many series do you have? Maybe it's reaching the 1000-step limit from max_steps.
a
I only have one series. I have also run this with start_padding_enabled=True, but I still see this behavior.
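For what it's worth, counting optimizer steps straight off the progress bars above (each shows 1/1 batches per epoch, and I'm assuming one optimizer step per batch), the epoch-4 runs come out to:

# Back-of-the-envelope step count for the runs above that stopped at "Epoch 4".
# Assumes one optimizer step per batch; batches per epoch read from the 1/1 bars.
batches_per_epoch = 1
last_epoch = 4                     # Lightning epochs are 0-indexed
val_check_steps = 5
max_steps = 1000

total_steps = (last_epoch + 1) * batches_per_epoch    # 5 optimizer steps
validation_checks = total_steps // val_check_steps    # 1 validation check

print(total_steps, validation_checks, total_steps >= max_steps)   # 5 1 False

If that accounting is right, those runs stop after a single validation check, well before the 1000-step max_steps cap.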