# general
m
Hello team, first of all thanks for all the amazing (and publicly available) work. Quick question about statsforecast (and potentially the other libraries) that I can’t quite figure out from the documentation: once I have trained my models, what is the best way to make predictions on the test set for 3-day horizons, as opposed to forecasting the whole test set starting from the last training date?
• Would one have to apply the .forecast method iteratively, one day at a time, setting df (or y) to the data up to that day and h to 3?
• Or should one reuse the cross_validation method, setting df and input_size/test_size/n_windows smartly, to do this more efficiently?
👀 1
m
If I understand the question correctly, the second option is more efficient.
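A minimal sketch of the cross_validation approach being recommended here, assuming daily data in the usual unique_id/ds/y long format and an AutoARIMA model (the file name, frequency, and model choice are placeholders; exact argument behaviour may vary between statsforecast versions):
```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Long-format input with columns unique_id, ds, y (file name is illustrative).
df = pd.read_csv('series.csv', parse_dates=['ds'])

sf = StatsForecast(models=[AutoARIMA(season_length=7)], freq='D', n_jobs=-1)

# Rolling evaluation: 100 cutoffs spaced one day apart, each forecasting 3 days ahead.
cv_df = sf.cross_validation(
    df=df,
    h=3,            # 3-day horizon per window
    n_windows=100,  # number of cutoffs (one model refit per cutoff)
    step_size=1,    # advance the cutoff one day at a time
)
# cv_df has one row per unique_id/cutoff/ds, with y and one column per model.
```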
m
I'll give it a try and come back if something seems off. Thanks for the quick reply!
Hello again. So I have a bit of an issue with using both the .forecast and .cross_validation methods in testing. In both cases, they seem to refit the model to the new data, which during testing might not be desirable or efficient. Is there a way to force them to use the already trained parameters instead of retraining? Am I missing something in my approach?
m
Is this what you are doing?
And do you want to do this?
(Credit to joaquinAmatRodrigo for the second image)
Currently, the cross-validation class does the first thing. If I understand correctly, you want to train a model on a subset of the data and then forecast the future without retraining on newly available data, perhaps to save computing resources, and because the model’s accuracy would not degrade much in the meantime. Right?

If that’s the case, that functionality is not currently available, but we are working on it. However, in our experience it makes sense to retrain “every time” you have new relevant data. Since statsforecast is highly efficient, even thousands of series take a couple of minutes.

If you are using some of the Auto models and want to improve the training speed further, you can restrict the number of candidate models. For example,
ETS(season_length=1, model='ANN')
will explore only a simple exponential smoothing model.
Please also check this issue and like it if it is relevant to your use case: https://github.com/Nixtla/statsforecast/issues/287.
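A short, hedged sketch of that tip, assuming the statsforecast 1.x ETS model ('ANN' means additive error, no trend, no seasonality, i.e. simple exponential smoothing; the freq and n_jobs values are illustrative):
```python
from statsforecast import StatsForecast
from statsforecast.models import ETS

models = [
    # 'ANN' pins the model form, so no search over ETS variants is performed
    # (the default 'ZZZ' would auto-select each component instead).
    ETS(season_length=1, model='ANN'),
]

sf = StatsForecast(models=models, freq='D', n_jobs=-1)
# Subsequent forecast/cross_validation calls fit only this one specification per window.
```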
m
That's exactly it. So cv is doing what the first image shows, but with overlapping windows (unless one specifies step_size=horizon, in which case we get exactly the first image), right? With the overlapping windows, then, if I had a test set of 100 points, it would effectively train 100 new models. This test would give me an estimate of what my model will be like in production if I retrain it every single day. Is that accurate?
I'm glad to hear you are working on the backtesting without refit. It would allow one to also measure how often the models need to be retrained or if the trained model is relatively stable through time. I'll be keeping an eye out for that functionality.
Thank you also for the tip on making the training more efficient. Is there anything else I should keep in mind for speeding things up if I start doing a large number of refits? Currently I only use n_jobs=-1. Should I think of installing any external libraries, and do I need to set up GPU usage with neuralforecast models, or does all of that work out of the box already?
m
Yes!
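A tiny sketch of the arithmetic being confirmed here (approx_refits is a hypothetical helper, and the exact off-by-one count depends on how the library defines windows):
```python
def approx_refits(test_size: int, h: int, step_size: int) -> int:
    """Approximate number of rolling windows (= model refits) in a backtest."""
    return (test_size - h) // step_size + 1

# 100-point test set, 3-day horizon
print(approx_refits(test_size=100, h=3, step_size=1))  # 98 -> roughly one refit per test day
print(approx_refits(test_size=100, h=3, step_size=3))  # 33 -> non-overlapping windows (step_size = horizon)
```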
m
Great. Thank you so much for the help!
🙌 1