# general

J.

09/06/2023, 8:39 AM
Hello! Thanks for the awesome project! I have a question about the cross_validation function. Is it possible to define an initial "training window" size and let the algorithm perform a rolling window until the end of the time series is reached, without explicitly defining "training_size" or "n_windows"? I have multiple time series of different lengths, and it seems a bit odd to specify a fixed number of windows, because I want as many cross-validation windows on each series as possible. My current approach is:
```python
# Find the time series with the least data
min_ts_count = big_df_for_stats.groupby(by="unique_id").count()["y"].min()
# Minimum amount of training data wanted for the rolling window
initial_training_size = 365
step_size = 4
h = 1

test_size_parameter = (min_ts_count - initial_training_size) // step_size
# Workaround to prevent the exception raised when (test_size - h) % step_size != 0
test_size_parameter = test_size_parameter - (test_size_parameter % step_size) + h

res_df = sf.cross_validation(h=h, step_size=step_size, fitted=True, test_size=test_size_parameter, n_windows=None)
```
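A quick sanity check (illustrative, not part of the original message) that the workaround above really does satisfy the `(test_size - h) % step_size == 0` constraint for any series length:

```python
# Reproduces the adjustment from the snippet above as a function so the
# constraint can be checked for several hypothetical series lengths.
def adjust_test_size(min_ts_count, initial_training_size=365, step_size=4, h=1):
    test_size = (min_ts_count - initial_training_size) // step_size
    # Round down to a multiple of step_size, then add h back in
    return test_size - (test_size % step_size) + h

# For any length, (test_size - h) is a multiple of step_size by construction
for n in (400, 500, 731, 1000):
    t = adjust_test_size(n)
    assert (t - 1) % 4 == 0
```

For example, with `min_ts_count = 731` this yields `(731 - 365) // 4 = 91`, which is rounded down to `88` and bumped to `89`, and `(89 - 1) % 4 == 0` as required.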
j

José Morales

09/07/2023, 4:50 PM
Hey. The library is built around the premise that you want to forecast, say, 10 periods ahead and you want to estimate how good your models are at that, so you run that procedure a couple of times (maybe 4 windows of 10 periods ahead each). I don't think there's a way to do what you're asking with the built-in cross_validation. However, that's just a convenience function; you could achieve this by iterating over the series, determining what you want your train size and number of windows to be, and just using `StatsForecast.forecast` on that subset.
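The suggestion above can be sketched as a per-series rolling loop. This is a minimal, hedged sketch: a naive last-value forecaster stands in for `StatsForecast.forecast`, and series are plain lists of floats keyed by id rather than a long-format DataFrame.

```python
# Per-series rolling-origin evaluation: each series gets as many windows
# as its own length allows, instead of one global n_windows.
def rolling_forecasts(series, initial_training_size, step_size, h):
    """Roll a cutoff over one series, forecasting h steps at each cutoff."""
    results = []  # (cutoff, forecast, actuals)
    cutoff = initial_training_size
    while cutoff + h <= len(series):
        train = series[:cutoff]
        forecast = [train[-1]] * h  # stand-in for StatsForecast.forecast on `train`
        results.append((cutoff, forecast, series[cutoff:cutoff + h]))
        cutoff += step_size
    return results

# A short series yields fewer windows than a long one, automatically:
data = {"short": list(range(10)), "long": list(range(25))}
windows = {uid: rolling_forecasts(ys, initial_training_size=5, step_size=4, h=1)
           for uid, ys in data.items()}
```

With real statsforecast, the loop body would slice the long-format DataFrame up to the cutoff and call `sf.forecast` on that subset instead of the naive stand-in.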

J.

09/08/2023, 9:55 AM
Makes sense, since the whole hyperparameter tuning step (especially as it is done once per window) is so computationally expensive. I will think about doing only one Auto* tuning on the entire time series to get the best model's parameters, and then doing the cross-validation with that best model. Maybe that way I will be able to do the "extensive" cross-validation with reasonable runtime. Thank you.
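The "tune once, then cross-validate with fixed parameters" idea can be sketched as follows. Everything here is a hypothetical stand-in, not statsforecast API: `tune` represents one expensive Auto* search, and `fit` represents a cheap refit with the parameters held fixed.

```python
# Tune hyperparameters once on the full series, then reuse them for every
# rolling window, paying the expensive search cost only once.
def tune_once_then_cv(series, tune, fit, initial_training_size, step_size, h=1):
    params = tune(series)  # single expensive search on the full series
    errors = []
    cutoff = initial_training_size
    while cutoff + h <= len(series):
        forecast = fit(series[:cutoff], params)(h)  # cheap refit, fixed params
        errors.append(abs(forecast[0] - series[cutoff]))
        cutoff += step_size
    return params, errors

# Toy stand-ins: "tuning" picks a moving-average window, "fitting" applies it.
toy_tune = lambda ys: {"window": 2}
toy_fit = lambda ys, p: lambda h: [sum(ys[-p["window"]:]) / p["window"]] * h
params, errors = tune_once_then_cv(list(range(12)), toy_tune, toy_fit,
                                   initial_training_size=6, step_size=3)
```

One caveat worth noting: tuning on the full series leaks future data into the windows' "training" periods, so the resulting error estimates are slightly optimistic; this is the trade-off being accepted for runtime.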
👍 1