Any recommendations for a robust approach to tuning the NHITS hyperparameters in time series cross-validation?
Currently, I've been tuning a separate optimal model for each monthly train-test split over a 12-month period. This gives me 12 different optimal models (in my plot, the training data is shown in blue and the test data in red).
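For reference, the per-fold setup described above can be sketched as an expanding-window (rolling-origin) backtest. This is a minimal sketch in plain Python, assuming monthly observations indexed 0..n_obs-1 and a one-step test window per fold; the helper name `expanding_window_splits` and the 36-month example are hypothetical. In NeuralForecast, the `cross_validation` method's `n_windows` and `step_size` arguments cover the same idea.

```python
def expanding_window_splits(n_obs, n_folds, horizon=1, step=1):
    """Return (train_idx, test_idx) pairs for an expanding-window backtest.

    Hypothetical helper: the last fold's test window ends at the final
    observation, and each earlier fold's cutoff moves back by `step`.
    """
    splits = []
    for k in range(n_folds):
        # Index of the first test observation for fold k.
        cutoff = n_obs - horizon - (n_folds - 1 - k) * step
        if cutoff <= 0:
            raise ValueError("not enough observations for the requested folds")
        train_idx = list(range(cutoff))                   # everything before the cutoff
        test_idx = list(range(cutoff, cutoff + horizon))  # the next `horizon` points
        splits.append((train_idx, test_idx))
    return splits


# Assumed example: 3 years of monthly data, 12 folds, each testing the next month.
splits = expanding_window_splits(n_obs=36, n_folds=12)
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
print(len(first_train), first_test)  # 24 [24]
print(len(last_train), last_test)    # 35 [35]
```

Each fold trains on all data up to its cutoff, so later folds see more history than earlier ones.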
When reviewing the optimal parameters for each split, I've noticed that it's challenging to identify a set of parameters that perform optimally across all these diverse splits.
Ideally, I'd like to find a configuration of the NHITS parameters that consistently yields low error (measured by mean absolute error, MAE) across all of my train-test splits.
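One way to express this goal is to score each candidate configuration by its mean MAE over all folds, instead of picking a winner per fold. A minimal sketch, assuming you already have the per-fold MAE of every candidate; `pick_robust_config`, the config names, and the numbers are illustrative placeholders, not results. Ties on the mean are broken by the worst single fold, which is one possible choice among several.

```python
from statistics import mean


def pick_robust_config(fold_mae):
    """Pick the config with the lowest mean MAE across folds.

    `fold_mae` maps a config name to its list of per-fold MAEs (one entry per
    train-test split). Ties on the mean are broken by the worst single fold.
    """
    def score(name):
        errors = fold_mae[name]
        return (mean(errors), max(errors))

    return min(fold_mae, key=score)


# Placeholder numbers: "b" wins no individual fold, yet has the lowest mean MAE.
fold_mae = {
    "a": [1.0, 3.0, 1.0],
    "b": [1.2, 1.3, 1.2],
    "c": [2.5, 1.1, 2.0],
}
print(pick_robust_config(fold_mae))  # b
```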
I've had some success with transfer learning using a pretrained NHITS model, but I find it challenging to tune the NHITS parameters when training on my own dataset.
09/27/2023, 2:58 PM
Hi @Afiq Johari! The performance of NHITS is very stable across different hyperparameter values. Models with significantly different configurations can produce very similar, good results. With a small test set, it is reasonable for the optimal model to change between folds.
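To see why the per-fold winner can change even when no configuration is truly better, here is a small synthetic simulation (not NHITS results): three hypothetical configurations share the same underlying MAE, and each fold's observed MAE adds Gaussian noise, standing in for the sampling error of a small test window. The Gaussian noise model and the `simulate_fold_winners` helper are assumptions for illustration.

```python
import random
from collections import Counter


def simulate_fold_winners(n_configs=3, n_folds=12, true_mae=1.0, noise_sd=0.1, seed=0):
    """Return the winning config index for each fold of a synthetic backtest.

    Every config has the same true MAE; the observed per-fold MAE adds Gaussian
    noise to mimic the sampling error of a small test window.
    """
    rng = random.Random(seed)
    winners = []
    for _ in range(n_folds):
        observed = [true_mae + rng.gauss(0.0, noise_sd) for _ in range(n_configs)]
        winners.append(min(range(n_configs), key=observed.__getitem__))
    return winners


winners = simulate_fold_winners()
print(Counter(winners))  # typically more than one config "wins" some fold
```

Since every configuration is equally good by construction, any fold-to-fold change in the winner here is pure noise.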
Any particular reason why you want to have separate models for each fold? Are you backtesting what would happen if you re-train the model each month?
09/27/2023, 3:55 PM
"The performance of the nhits is very stable for different values of hyperparameters."
This is interesting. Is there a reason for this, or any articles that explain it in more detail?
I was expecting a "specific" set of parameters to emerge after repeating the N-fold cross-validation.
For instance, in the initial folds, I might discover that using the 'Sigmoid' activation function consistently results in the lowest error.
I extended this approach to other parameters, searching for combinations that minimize error.
Once I had narrowed down the list of parameter values to a smaller subset, I expected to find better models.
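Rather than counting which value wins each fold, it can be more stable to average the MAE over all trials and folds for each value of one hyperparameter (a simple marginal analysis). A minimal sketch, assuming trial results are stored as plain dicts; the field names, the `mean_mae_by_value` helper, and the numbers are hypothetical, and `activation` is just the example parameter from above.

```python
from collections import defaultdict
from statistics import mean


def mean_mae_by_value(trials, param):
    """Average MAE per value of `param`, pooled over all trials and folds.

    Each trial is a dict like {"config": {...}, "fold": k, "mae": x}.
    Returns (value, mean_mae) pairs sorted from best to worst.
    """
    pooled = defaultdict(list)
    for trial in trials:
        pooled[trial["config"][param]].append(trial["mae"])
    return sorted(((value, mean(maes)) for value, maes in pooled.items()),
                  key=lambda pair: pair[1])


# Placeholder trials, not real results.
trials = [
    {"config": {"activation": "ReLU", "input_size": 24}, "fold": 0, "mae": 1.10},
    {"config": {"activation": "ReLU", "input_size": 36}, "fold": 1, "mae": 1.30},
    {"config": {"activation": "Sigmoid", "input_size": 24}, "fold": 0, "mae": 1.00},
    {"config": {"activation": "Sigmoid", "input_size": 36}, "fold": 1, "mae": 1.60},
]
# ReLU ranks first on average here, even though Sigmoid wins fold 0.
for value, avg in mean_mae_by_value(trials, "activation"):
    print(value, round(avg, 3))
```

Marginal averages ignore interactions between parameters (here, `activation` is partly confounded with `input_size`), so they are a screening aid rather than a final answer.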
However, as you pointed out, it's entirely possible that "Models with significantly different configuration can produce very similar good results".
If there are any resources or articles that explain why this occurs, I would love to learn more about this topic.
"Any particular reason why you want to have separate models for each fold?"
I'm hoping to find a robust model: one set of parameter values that performs consistently well across all n folds.
"Are you backtesting what would happen if you re-train the model each month?"
Yes, correct. To be more precise: the n folds give me n different sets of optimal parameters. I use them to narrow down the search space, then build another model that should be optimal across all the previous months, and check whether it consistently delivers a low error rate.
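The narrowing step described here could look like the sketch below: for each hyperparameter, keep only the values that were optimal in at least one fold, then re-evaluate every combination of those values on every fold. This is one possible reading of the procedure, in plain Python; `narrowed_grid` and the example configs are hypothetical.

```python
from itertools import product


def narrowed_grid(fold_best_configs):
    """Build a reduced grid from the best config found in each fold.

    For every hyperparameter, keep only the values that won at least one fold
    (in first-seen order), then take the Cartesian product of those values.
    """
    params = sorted(fold_best_configs[0])
    kept = {p: list(dict.fromkeys(cfg[p] for cfg in fold_best_configs)) for p in params}
    return [dict(zip(params, combo)) for combo in product(*(kept[p] for p in params))]


# Hypothetical per-fold winners (3 folds shown instead of 12).
fold_best = [
    {"activation": "ReLU", "input_size": 24},
    {"activation": "Sigmoid", "input_size": 24},
    {"activation": "ReLU", "input_size": 36},
]
grid = narrowed_grid(fold_best)
print(len(grid))  # 4 candidate configs to re-evaluate on every fold
```

The reduced grid grows multiplicatively with the number of surviving values per parameter, so with 12 folds and several parameters it can still be large.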
The objective is to determine whether the model consistently yields a low error rate. Ideally, the per-fold MAE should remain relatively consistent across these backtests.
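To check whether a single configuration's error stays consistent across the backtest, one option is to compute the MAE per fold (per cutoff) and summarize its spread. A minimal sketch, assuming the backtest output is a list of rows with `cutoff`, `y`, and a prediction column; this loosely mirrors the long-format output of NeuralForecast's `cross_validation` (where the prediction column is named after the model, e.g. `NHITS`), but the rows here are plain dicts with placeholder values, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mae_per_cutoff(rows, pred_col="NHITS"):
    """Return {cutoff: MAE} computed from backtest rows.

    Each row is a dict with at least "cutoff", "y" and `pred_col`.
    """
    abs_errors = defaultdict(list)
    for row in rows:
        abs_errors[row["cutoff"]].append(abs(row["y"] - row[pred_col]))
    return {cutoff: mean(errs) for cutoff, errs in sorted(abs_errors.items())}


def summarize(fold_mae):
    """Mean MAE across folds, its standard deviation, and their ratio."""
    values = list(fold_mae.values())
    avg = mean(values)
    spread = pstdev(values)
    return {
        "mean_mae": avg,
        "std_mae": spread,
        "coef_of_variation": spread / avg if avg else float("nan"),
    }


# Placeholder backtest rows for two monthly cutoffs.
rows = [
    {"cutoff": "2023-01", "y": 10.0, "NHITS": 9.0},
    {"cutoff": "2023-01", "y": 12.0, "NHITS": 13.0},
    {"cutoff": "2023-02", "y": 11.0, "NHITS": 14.0},
    {"cutoff": "2023-02", "y": 9.0, "NHITS": 10.0},
]
fold_mae = mae_per_cutoff(rows)
print(fold_mae)             # {'2023-01': 1.0, '2023-02': 2.0}
print(summarize(fold_mae))  # mean_mae 1.5, std_mae 0.5, coef_of_variation about 0.33
```

A low mean MAE together with a small coefficient of variation is one way to operationalize "consistently low error" across the backtest.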
If there are better ways to perform this, I'm happy to learn more too.