Phil
08/15/2023, 9:33 PM
scaler_type is behaving weirdly with the validation set loss functions. If it's not identity, the validation loss is way off. What is going on?
I'm observing some really weird behavior out of the NBEATS model and I wanted to see if I'm out to lunch or what. To get a sense of the model landscape, I started small: out of my dataset of 100 time series, I wanted to find a basic set of hyperparameters with which I could overfit 10 time series really well. Once I felt comfortable with the complexity and the relative size of the model parameter space, I would reduce the complexity, add regularization, and try to get a model that performs well on a validation set.
After some trial and error, I found I could overfit my data very well with this model configuration (training loss in the first plot).
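(Concretely, I carve out the 10 series first; assuming the usual long-format frame with a unique_id column, it's roughly:)

# Hypothetical subsetting: keep the first 10 of the 100 series.
# neuralforecast expects long format with unique_id / ds / y columns.
df = Y_train_df.reset_index()
keep_ids = df["unique_id"].unique()[:10]
Y_small_df = df[df["unique_id"].isin(keep_ids)]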
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATSx
from neuralforecast.losses.pytorch import HuberLoss, MAPE

model_name = "NBEATSx"
stacks = 2
params = dict(
    h=180,                   # forecast horizon
    input_size=360,          # context length fed to the model
    loss=HuberLoss(),
    max_steps=2000,
    dropout_prob_theta=0.0,  # no regularization: the goal here is to overfit
    stack_types=stacks * ["identity"],
    n_blocks=stacks * [15],
    mlp_units=[[16, 16] for _ in range(stacks)],
    scaler_type="standard",
    learning_rate=1e-3,
    random_seed=200,
    alias=model_name,
    batch_size=5,
)
nf = NeuralForecast(models=[NBEATSx(**params)], freq="D")
nf.fit(df=Y_train_df.reset_index())
Epoch 999: 100%
2/2 [00:00<00:00, 6.45it/s, v_num=788, train_loss_step=0.0128, train_loss_epoch=0.0122]
Next, I wanted to check the performance on a validation set. The main issue is that no matter how I try to reduce the complexity (for example, changing the number of stacks from 2 -> 1, the number of blocks from 15 -> 1, even the mlp_units from 16 -> 2, or adding dropout), the validation loss (MAPE, or any other loss for that matter) does not decrease significantly at all!!! See the second plot. The scale makes it look like it's decreasing, but it only gets down to 1 - 1e-6.
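(For scale: MAPE = mean(|y - y_hat| / |y|), so a MAPE pinned at 1 - 1e-6 essentially means the forecasts are near zero relative to the targets. A quick sanity check with made-up numbers:)

import numpy as np

# Made-up numbers, just to show the arithmetic: if y_hat is ~0 while the
# targets are far from 0, every ratio |y - y_hat| / |y| is ~1, so MAPE
# gets pinned just below 1.
y = np.array([100.0, 250.0, 80.0])
y_hat = np.full_like(y, 1e-4)
print(np.mean(np.abs(y - y_hat) / np.abs(y)))  # ~0.999999, i.e. 1 - 1e-6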
stacks = 1
params = dict(
    h=MODEL_HORIZON,            # user-defined constant (presumably 180, as above)
    input_size=CONTEXT_LENGTH,  # user-defined constant (presumably 360, as above)
    loss=HuberLoss(),
    max_steps=2000,
    dropout_prob_theta=0.5,
    stack_types=stacks * ["identity"],
    n_blocks=stacks * [1],
    mlp_units=[[16, 16] for _ in range(stacks)],
    scaler_type="standard",
    learning_rate=1e-3,
    random_seed=200,
    alias=model_name,
    batch_size=5,
    # Validation params
    val_check_steps=1,               # check validation loss every training step
    valid_loss=MAPE(),
    early_stop_patience_steps=3000,  # with val_check_steps=1, a patience of 3000 checks
                                     # exceeds max_steps=2000, so early stopping never fires
)
However, the biggest change comes from changing the scaler_type: if I change it to identity, the MAPEs are at least in a decent range, and I see some decrease and typical curve behavior. What is going on?
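(For reference, the identity run is the exact same config with one line changed; my understanding is that "standard" normalizes each input window by its mean/std before it reaches the network, while "identity" passes the raw values through:)

# Same params dict as above; the only change is the scaler.
params["scaler_type"] = "identity"
nf = NeuralForecast(models=[NBEATSx(**params)], freq="D")
nf.fit(df=Y_train_df.reset_index(), val_size=180)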
Cristian (Nixtla)
08/15/2023, 9:43 PM
mlp_units? 16 units is an extremely small network.
Phil
08/15/2023, 9:50 PM
Cristian (Nixtla)
08/15/2023, 9:53 PM
Phil
08/15/2023, 9:54 PM
nf.fit(df=Y_train_df.reset_index(), val_size=180)  # holds out the last 180 observations of each series for validation