Phil 08/15/2023, 9:33 PM
The validation loss is way off, and I can't tell what is going on. I'm observing some really weird behavior out of the NBEATS model and wanted to see if I am out to lunch or what.

To get a sense of the model landscape, I started small. Out of my dataset of 100 timeseries, I wanted to find a basic set of hyper-parameters where I could overfit 10 timeseries really well. Once I felt comfortable with the complexity and relative size of the model, the plan was to reduce the complexity, add regularization, and try to get a model that performs well on a validation set. After some trial and error I found I could overfit my data very well with the configuration below; the first plot shows the training loss.
Next, I wanted to check the performance on a validation set. The main issue: no matter how I reduce the complexity (for example, changing the number of stacks from 2 -> 1, the number of blocks from 15 -> 1, or even the mlp_units from 16 -> 2) or add dropout, the validation loss (MAPE, or any other loss for that matter) does not decrease significantly at all! See the second plot. The scale makes it look like it's decreasing, but it only gets to about 1 - 1e-6.
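(Aside: a MAPE that plateaus just under 1 is usually the signature of a forecast collapsing toward zero relative to the scale of the actuals, since mean(|y - y_hat| / |y|) -> 1 as y_hat -> 0. A quick sketch in plain Python, for intuition only, not neuralforecast's implementation:)

```python
# Plain-Python MAPE, for intuition only (not neuralforecast's implementation).
def mape(y, y_hat):
    """Mean absolute percentage error: mean(|y - y_hat| / |y|)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(y, y_hat)) / len(y)

y = [120.0, 95.0, 210.0]        # actuals, well away from zero
near_zero = [0.1, -0.2, 0.05]   # a forecast collapsed toward zero
print(mape(y, near_zero))       # hovers around 1.0
print(mape(y, y))               # 0.0 for a perfect forecast
```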
model_name = "NBEATSx"
stacks = 2
params = dict(
    h=180,
    input_size=360,
    loss=HuberLoss(),
    max_steps=2000,
    dropout_prob_theta=0.0,
    stack_types=stacks * ["identity"],
    n_blocks=stacks * [15],  # list dropped in the paste; [15] matches the "15 blocks" mentioned above
    mlp_units=[[16, 16] for _ in range(stacks)],
    scaler_type="standard",
    learning_rate=1e-3,
    random_seed=200,
    alias=model_name,
    batch_size=5,
)
nf = NeuralForecast(models=[NBEATSx(**params)], freq="D")
nf.fit(df=Y_train_df.reset_index())

Epoch 999: 100% 2/2 [00:00<00:00, 6.45it/s, v_num=788, train_loss_step=0.0128, train_loss_epoch=0.0122]
However, the biggest change comes from changing the […]:
stacks = 1
params = dict(
    h=MODEL_HORIZON,
    input_size=CONTEXT_LENGTH,
    loss=HuberLoss(),
    max_steps=2000,
    dropout_prob_theta=0.5,
    stack_types=stacks * ["identity"],
    n_blocks=stacks * [15],  # list dropped in the paste; assuming the same [15] as the overfit config
    mlp_units=[[16, 16] for _ in range(stacks)],
    scaler_type="standard",
    learning_rate=1e-3,
    random_seed=200,
    alias=model_name,
    batch_size=5,
    # Validation params
    val_check_steps=1,
    valid_loss=MAPE(),
    early_stop_patience_steps=3000,
)
If I change it to […], at least the MAPEs are in a decent range and I see some decrease and typical curve behavior. What is going on?
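(Aside: as I understand it, early_stop_patience_steps counts validation checks without improvement, so with max_steps=2000 and val_check_steps=1 there are at most ~2000 checks and a patience of 3000 can never trigger. A minimal sketch of that patience logic, my own simplification, not neuralforecast/Lightning code:)

```python
def should_stop(val_losses, patience):
    """True once the validation loss hasn't improved for `patience` consecutive checks."""
    if not val_losses:
        return False
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return (len(val_losses) - 1 - best_idx) >= patience

history = [1.00, 0.90, 0.95, 0.96, 0.97]    # best at check 1, then 3 checks without improvement
print(should_stop(history, patience=3))     # True
print(should_stop(history, patience=3000))  # False: patience exceeding the run never fires
```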
Cristian (Nixtla) 08/15/2023, 9:43 PM
[…]? 16 units is an extremely small network.
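(To put "extremely small" in numbers, here is a rough dense-layer parameter count for a 360-input block with [16, 16] hidden units; the 540-wide output head assumes the identity basis emits backcast + forecast coefficients of size input_size + h, which is my reading rather than something confirmed in this thread:)

```python
def dense_param_count(sizes):
    # weights (in * out) plus biases (out) for each fully connected layer
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))

# 360 inputs -> 16 -> 16 -> 540 outputs (hypothetical identity-basis head, input_size + h)
print(dense_param_count([360, 16, 16, 540]))  # 15228
```

The two 16-wide hidden layers form a severe bottleneck: almost all of those parameters sit in the input and output projections, leaving only 16*16 + 16 = 272 parameters of actual depth.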
Phil 08/15/2023, 9:50 PM
Cristian (Nixtla) 08/15/2023, 9:53 PM
Phil 08/15/2023, 9:54 PM