# random
Random, more theoretical question: has anyone ever tried training a sequence model that takes the time series as input but whose output sequence is the best-performing stat model's forecast rather than the actuals? Seems to me the actuals can contain signals that aren't in the input sequence (major jumps or changepoints), whereas an output sequence based on a stat forecast would only contain what can be derived from the input. Let me know if I'm crazy, but either way a fast 'auto-stat emulator' would be neat.
Hey @Tyler Blume, this is an exciting idea. You can obtain the outputs of the models as the parameters of distributions using the distribution loss options: https://nixtla.github.io/neuralforecast/losses.pytorch.html#distributionloss https://nixtla.github.io/neuralforecast/losses.pytorch.html#gaussian-mixture-mesh-gmm With the parameters as outputs, you can simulate the model output instead of only having a single forecast. This is almost a generative model.
The Gaussian Mixture distribution would let the simulated forecasts exhibit some jumps and "regime switches". If you go a bit into that rabbit hole, the new DeepAR model does a Monte Carlo simulation to generate its quantiles too.
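To make the "simulate from the parameters" idea concrete, here is a minimal sketch in plain Python. It does not call neuralforecast. The mixture parameters in `horizon_params` are made up for illustration, and `sample_gmm` and `empirical_quantile` are hypothetical helper names. It only shows how Monte Carlo draws from per-step Gaussian mixture parameters can be turned into quantile forecasts.

```python
import random

def sample_gmm(weights, means, stds, n_samples, rng):
    """Draw n_samples values from a 1-D Gaussian mixture."""
    samples = []
    for _ in range(n_samples):
        # Pick a component according to the mixture weights, then sample from it.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[k], stds[k]))
    return samples

def empirical_quantile(values, q):
    """Simple rank-based quantile of a list of numbers, q in [0, 1]."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, round(q * (len(ordered) - 1))))
    return ordered[idx]

# Hypothetical mixture parameters for a 3-step horizon (assumed, not model output).
# Each step is (weights, means, stds); the second component is a rare "jump" regime.
horizon_params = [
    ([0.9, 0.1], [10.0, 30.0], [1.0, 2.0]),
    ([0.9, 0.1], [11.0, 32.0], [1.0, 2.0]),
    ([0.8, 0.2], [12.0, 35.0], [1.5, 2.5]),
]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
quantile_forecast = []
for weights, means, stds in horizon_params:
    draws = sample_gmm(weights, means, stds, n_samples=2000, rng=rng)
    quantile_forecast.append({q: empirical_quantile(draws, q) for q in (0.1, 0.5, 0.9)})
```

Each entry of `quantile_forecast` holds the simulated 10%, 50% and 90% quantiles for one horizon step. The low-weight second component is what lets the upper quantiles pick up occasional jumps.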
Basically you would train a sequence model to simulate a stat model (e.g. ARIMA). It's something I've thought about in the past to handle a mix of long and short time series. ARIMA, for example, requires at least 1 year of data to make a forecast with yearly seasonality (and 2 years to fit the model). So you could fit ARIMA and make predictions with it for the sufficiently long time series, train a neural model targeting those ARIMA predictions, and then use the neural model to forecast time series shorter than 1 year (perhaps with some exogenous variables that help the model, e.g. calendar features).
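A rough sketch of how the training pairs for such an emulator could be built, in plain Python. To stay dependency-free it swaps ARIMA for a seasonal naive forecast as the "teacher", which is an assumption for illustration only. `seasonal_naive` and `make_distillation_pairs` are hypothetical names, and the series is synthetic.

```python
import math

def seasonal_naive(history, season_length, horizon):
    """Teacher stand-in for ARIMA: repeat the last full season of the history."""
    if len(history) < season_length:
        raise ValueError("history shorter than one season")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

def make_distillation_pairs(series, season_length, input_size, horizon):
    """Build (student input, teacher forecast) pairs from one long series.

    The teacher sees all history up to the cutoff. The student input is only
    the last `input_size` points, mimicking a series shorter than one season.
    """
    pairs = []
    # The target is the teacher forecast, not the actuals, so no holdout is
    # needed and the cutoff can run all the way to the end of the series.
    for cutoff in range(season_length, len(series) + 1):
        teacher_forecast = seasonal_naive(series[:cutoff], season_length, horizon)
        student_input = series[cutoff - input_size:cutoff]
        pairs.append((student_input, teacher_forecast))
    return pairs

# Synthetic monthly series with yearly seasonality (3 years), made up for illustration.
season_length = 12
series = [10 + 5 * math.sin(2 * math.pi * t / season_length) for t in range(36)]

pairs = make_distillation_pairs(series, season_length, input_size=6, horizon=3)
```

The resulting `pairs` could then be fed to any sequence model as (input, target) examples. Calendar features, as suggested above, would be extra inputs alongside `student_input`.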
@Kin Gtz. Olivares that sounds cool and way more complicated than what I was saying haha, although if I unlocked a good idea in you I am happy! I was just suggesting replacing the output target sequence with the best forecast from an ensemble of stat models. The way I see it, if we have a crystal ball and choose the best stat model each time then we achieve SOTA results on basically any benchmark. There were some other similar ideas like FFORMA (for combining stat outputs) but I haven't seen anything trained up for seq2seq models on a large scale. Input is our time series, target sequence is a stat forecast rather than actuals.
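The "crystal ball" target described here could be sketched as follows, in plain Python. The three candidate forecasters (mean, naive, drift) are simple stand-ins for a real ensemble of stat models, and all function names and data are assumptions for illustration. For each training window, the candidate with the lowest error on the held-out actuals is picked and its forecast becomes the target sequence.

```python
def mean_forecast(history, horizon):
    """Forecast the historical mean at every step."""
    return [sum(history) / len(history)] * horizon

def naive_forecast(history, horizon):
    """Forecast the last observed value at every step."""
    return [history[-1]] * horizon

def drift_forecast(history, horizon):
    """Extend the straight line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

def mae(forecast, actuals):
    """Mean absolute error between a forecast and the actuals."""
    return sum(abs(f - a) for f, a in zip(forecast, actuals)) / len(actuals)

def best_stat_target(history, actuals, candidates):
    """Return (name, forecast) of the candidate with the lowest MAE on the actuals."""
    horizon = len(actuals)
    forecasts = {name: fn(history, horizon) for name, fn in candidates.items()}
    best_name = min(forecasts, key=lambda name: mae(forecasts[name], actuals))
    return best_name, forecasts[best_name]

candidates = {"mean": mean_forecast, "naive": naive_forecast, "drift": drift_forecast}

# Toy window: the history is the model input, the actuals are used only to pick the winner.
history = [1.0, 2.0, 3.0, 4.0, 5.0]
actuals = [6.0, 7.0, 8.0]
best_name, target = best_stat_target(history, actuals, candidates)
```

The sequence model would then be trained on `(history, target)` rather than `(history, actuals)`. The actuals are only needed at training time to choose the winner.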
@Manuel Basically yeah. If our time series is super intermittent, you probably don't want an intermittent forecast (unless it's seasonal or something); you probably want something smoother or near-constant. So we'd be going for a 'good' forecast (the best stat forecast) rather than an 'accurate' one (the actuals).