# neural-forecast
**Phil:**
TLDR: `scaler_type` is behaving weirdly for the validation set loss functions. If it's not `identity`, the validation loss is way off. What is going on?

I'm observing some really weird behavior out of the NBEATSx model and wanted to see whether I'm out to lunch or not. To get a sense of the model landscape, I started small: out of my dataset of 100 time series, I wanted to find a basic set of hyper-parameters with which I could overfit 10 time series really well. Once I felt comfortable with the complexity and the relevant region of the parameter space, I would reduce the complexity, add regularization, and try to get a model that performs well on a validation set. After some trial and error I found I could overfit my data very well with this configuration (training loss shown in the first plot):
```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATSx
from neuralforecast.losses.pytorch import HuberLoss

model_name = "NBEATSx"
stacks = 2
params = dict(
    h=180,
    input_size=360,
    loss=HuberLoss(),
    max_steps=2000,
    dropout_prob_theta=0.0,
    stack_types=stacks * ["identity"],
    n_blocks=stacks * [15],
    mlp_units=[[16, 16] for _ in range(stacks)],
    scaler_type="standard",
    learning_rate=1e-3,
    random_seed=200,
    alias=model_name,
    batch_size=5,
)
nf = NeuralForecast(models=[NBEATSx(**params)], freq="D")
nf.fit(df=Y_train_df.reset_index())
```

```
Epoch 999: 100%
2/2 [00:00<00:00, 6.45it/s, v_num=788, train_loss_step=0.0128, train_loss_epoch=0.0122]
```
Next, I wanted to check the performance on a validation set. The main issue is that no matter how I try to reduce the complexity (for example, changing the number of stacks from 2 to 1, the number of blocks from 15 to 1, or even the mlp_units from 16 to 2, or adding dropout), the validation loss (MAPE, or any other loss for that matter) does not decrease significantly at all! See the second plot: the scale makes it look like it's decreasing, but it only gets down to about 1 - 1e-6.
```python
from neuralforecast.losses.pytorch import MAPE

stacks = 1
params = dict(
    h=MODEL_HORIZON,                # 180
    input_size=CONTEXT_LENGTH,      # 360
    loss=HuberLoss(),
    max_steps=2000,
    dropout_prob_theta=0.5,
    stack_types=stacks * ["identity"],
    n_blocks=stacks * [1],
    mlp_units=[[16, 16] for _ in range(stacks)],
    scaler_type="standard",
    learning_rate=1e-3,
    random_seed=200,
    alias=model_name,
    batch_size=5,
    # Validation params
    val_check_steps=1,
    valid_loss=MAPE(),
    early_stop_patience_steps=3000,
)
```
However, the biggest change comes from changing the `scaler_type`: if I change it to `identity`, at least the MAPEs are in a decent range and I see some decrease and the typical curve behavior. What is going on?
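For concreteness, the change amounts to something like the following sketch, reusing the `params` dict above (the hypothetical name `params_identity` and the `val_size=180` call, quoted later in this thread, are the only additions):

```python
# Sketch: identical configuration, but with the scaler disabled.
params_identity = dict(params, scaler_type="identity")

nf = NeuralForecast(models=[NBEATSx(**params_identity)], freq="D")
nf.fit(df=Y_train_df.reset_index(), val_size=180)
```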
**Cristian:**
Hi @Phil. A couple of questions:
• Are you using the latest code from the main branch (with today's PR), or the pip version?
• Do you have "short" time series, i.e. series whose total length is < `input_size + h`? (A quick way to check is sketched below.)
• Have you tried larger `mlp_units`? 16 units is an extremely small network.
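One way to run that check, assuming a long-format `Y_train_df` with `unique_id`, `ds`, and `y` columns and the sizes used in this thread (`input_size=360`, `h=180`):

```python
# Count series whose total length is shorter than input_size + h.
input_size, h = 360, 180
lengths = Y_train_df.reset_index().groupby("unique_id").size()
short = lengths[lengths < input_size + h]
print(f"{len(short)} of {len(lengths)} series are shorter than input_size + h")
```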
**Phil:**
Hi Cristian,
• In these experiments I am not using the latest version. Unfortunately I'm limited to the pip version for now, specifically 1.6.1.
• Each of the ten time series in the training set is 676 days long. In this case the input size is 360 and the horizon is 180, so 676 > 360 + 180 = 540 and none of the series are "short" by that definition. I also have a test portion of 270 days.
• I did try a larger number of mlp_units, but the performance decreases. For reference, I hashed out the names of the time series and messed around with the scale of the numbers so I don't get in trouble, but the general shape of the series looks like this.

From my experiments, the training loss decreases most from adding blocks: the higher the number of blocks, the lower the training loss.
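One way to make that block-count comparison side by side is to register several NBEATSx configurations under different aliases in a single `NeuralForecast` object; a sketch under the same assumptions as the configs above (the per-setting aliases are hypothetical):

```python
# Sketch: fit one NBEATSx per block count and compare the reported losses
# and forecasts across aliases.
models = []
for b in [1, 5, 15]:
    p = dict(params)                  # copy the base config from above
    p["n_blocks"] = stacks * [b]
    p["alias"] = f"NBEATSx_b{b}"      # hypothetical alias per setting
    models.append(NBEATSx(**p))

nf = NeuralForecast(models=models, freq="D")
nf.fit(df=Y_train_df.reset_index(), val_size=180)
forecasts = nf.predict()              # one forecast column per alias
```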
**Cristian:**
The validation set is just a classic split, right? Taken right after the training portion of these series?
**Phil:**
That's right. I simply call `nf.fit(df=Y_train_df.reset_index(), val_size=180)`.
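To cross-check whether the reported validation MAPE depends on the scaler, one option is to reproduce that split by hand and score the forecasts on the original scale. A minimal sketch, assuming daily `ds` timestamps, series that share the same end date, the first `params` dict above (no early-stopping options), and the alias `NBEATSx`:

```python
import numpy as np
import pandas as pd

# Hold out the last 180 days of each series as a manual validation window.
df = Y_train_df.reset_index()
cutoff = df["ds"].max() - pd.Timedelta(days=180)
train_df = df[df["ds"] <= cutoff]
valid_df = df[df["ds"] > cutoff]

nf = NeuralForecast(models=[NBEATSx(**params)], freq="D")
nf.fit(df=train_df)
fcst = nf.predict().reset_index()     # columns: unique_id, ds, NBEATSx

# MAPE computed on the original (unscaled) values.
merged = valid_df.merge(fcst, on=["unique_id", "ds"])
mape = np.mean(np.abs(merged["y"] - merged["NBEATSx"]) / np.abs(merged["y"]))
print(f"manual validation MAPE: {mape:.4f}")
```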