# neural-forecast
m
Thread for practical insights on NHITS and TCN for power markets. (Courtesy of @Chris Gervais)
Background provided by @Chris Gervais:
• we have a feature store of ~5M features from power markets
• some of those are forecast features, others are actuals
• we can target any actuals series, and combine it with future regressors (forecast features) and lagged regressors (actuals features)
Findings:
• when we generate datasets with mostly lagged features, the accuracy difference between TCN and NHITS isn't that noticeable
• when we have lots of future regressors in the dataset, TCN gets a noticeable accuracy pickup; still need to quantify how much, but early estimates are ~15-20% MSE improvement, will calculate RMSSE shortly for target-wide evaluation
• TCN trains considerably faster for the same dataset and target
"biggest takeaway for us was NHITS is probably best in class if you have no future regressors, but if you do, you may want to try TCN 🙂" "we think TFT may offer the same benefits but tbh it's so slow to train we haven't been able to test it"
@Kin Gtz. Olivares, this is something you might be interested in commenting on.
k
Good to confirm that future alignment works, @Chris Gervais.
@Cristian (Nixtla), we need to include the future alignment into NHITS. Additionally, the original TCN had different types of contexts that resemble the interpolation principles of NHITS, which we could also explore.
And regarding TFT speed: it is due to it not including the forking-sequences optimization technique.
👍 1
c
As an aside, while we were going through all of this we stumbled upon https://github.com/ogrnz/feval - super helpful for sorting out which set of forecasters performs best
k
Yep, we already had GW (Giacomini-White), DM (Diebold-Mariano), and MCS (Model Confidence Set) tests in NeuralForecast 0.x
It got lost in the transition to NeuralForecast 1.x
I will check the package, thanks for the pointer
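For reference, a minimal sketch of a Diebold-Mariano (DM) test along the lines of what those packages provide, assuming squared-error loss and one-step-ahead forecasts (i.e., no HAC correction of the loss-differential variance):

```python
# Minimal sketch of a Diebold-Mariano test with squared-error loss.
# Assumes one-step-ahead forecasts; multi-step horizons need a HAC variance correction.
import numpy as np
from scipy import stats

def dm_test(y, yhat_a, yhat_b):
    """DM statistic and two-sided p-value comparing forecasts A and B.
    A negative statistic means model A has lower squared-error loss on average."""
    y, yhat_a, yhat_b = map(np.asarray, (y, yhat_a, yhat_b))
    d = (y - yhat_a) ** 2 - (y - yhat_b) ** 2   # loss differential per time step
    dm_stat = d.mean() / np.sqrt(d.var(ddof=1) / d.size)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))
    return dm_stat, p_value

# Usage: dm_test(y_test, nhits_preds, tcn_preds)  # arrays of equal length
```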
👍 1
c
Thanks for sharing the results @Chris Gervais! Indeed, the NHITS currently only concatenates the exogenous variables as inputs to its MLPs. If you have many features, the MLPs might struggle to learn all the effects (you may need to increase their size considerably). Adding a decoder like the TCN's, which receives future values, can definitely help.
I am still unsure what the best general approach is. Pure "future alignment" might fail to model effects between observations within the horizon window (e.g., a large spike in the future might decrease values now). In our NBEATSx paper, adding the exogenous variables alone as inputs to the encoder achieved amazing results. Probably a combination of both approaches will work best in general, together with a global context vector or attention mechanisms as in the TFT (but that adds a lot of computation time).
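To illustrate the "increase its size" point: a hedged sketch, assuming the neuralforecast NHITS constructor, of widening the per-block MLPs when many exogenous columns are concatenated into the inputs (feature names are placeholders):

```python
# Hedged sketch: widening NHITS' per-block MLPs when many exogenous features
# are concatenated into its inputs. Feature names below are placeholders.
from neuralforecast.models import NHITS

many_futr_features = [f'fcst_feature_{i}' for i in range(64)]    # future regressors
many_hist_features = [f'actual_feature_{i}' for i in range(32)]  # lagged regressors

model = NHITS(
    h=24,
    input_size=5 * 24,
    futr_exog_list=many_futr_features,
    hist_exog_list=many_hist_features,
    # mlp_units defaults to 512-unit layers; with many concatenated exogenous
    # columns, wider (or deeper) MLPs per block may be needed to capture their effects.
    mlp_units=3 * [[1024, 1024]],
    max_steps=1000,
)
```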
m
Interesting. But in practical terms, it's rare to have 5M features in a time series problem.
c
For the sake of clarity - we have access to ~5M features, a mix of lagged and future regressors, which was what allowed us to stumble onto this finding. We definitely don’t use all features in a single model 😊
m
I see. That makes more sense 😅.