# mlforecast
m
Could you please help me understand the n_windows parameter of cross_validation with exogenous variables? I went through the exogenous variables tutorial: https://github.com/Nixtla/mlforecast/blob/main/nbs/docs/how-to-guides/exogenous_features.ipynb. My df_test looks exactly like series_with_prices. When I use n_windows=2 or some other low number, everything works fine. I'd like to simply go through the dataframe, predicting 12 steps ahead at each timestep:
```python
horizon = 12
n_windows = len(df_test) - horizon

# Perform cross-validation without refitting
cv_results = fcst.cross_validation(
    df_test,
    # static_features=[],
    n_windows=n_windows,
    h=horizon,
    step_size=1,
    refit=False,
    fitted=True
)
```
With the above setup, however, I get:
```
--> 462 self.fit_models(X, y)
    463 if fitted:
    464     fitted_values = self._compute_fitted_values(
    465         X_with_info=X_with_info,
...
--> 677         raise ValueError('Input data must be 2 dimensional and non empty.')
    679     # determine feature names
    680     if feature_name == 'auto':

ValueError: Input data must be 2 dimensional and non empty.
```
Is there something I am not getting? :) Thanks for your patience.
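For context on the error: mlforecast-style cross-validation places the last cutoff at `len(df) - h` and each earlier window moves the cutoff back by `step_size`, so the first (earliest) window gets only whatever rows are left before its cutoff. The sketch below walks through that arithmetic for the setup in the question; `first_train_size` is a hypothetical helper for illustration, not an mlforecast function.

```python
def first_train_size(n_rows: int, h: int, n_windows: int, step_size: int = 1) -> int:
    """Rows available to train on in the FIRST (earliest) window.

    The last window's cutoff is at n_rows - h; each of the n_windows - 1
    earlier windows moves the cutoff back by step_size.
    """
    return n_rows - (h + (n_windows - 1) * step_size)

n_rows, horizon = 1000, 12
n_windows = n_rows - horizon  # the setup from the question: 988 windows
print(first_train_size(n_rows, horizon, n_windows))  # 1 row of training data
# With lag features (say lag1..lag5) and dropna=True, that single row is
# dropped too, leaving an empty matrix, which plausibly triggers
# "Input data must be 2 dimensional and non empty."
```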
j
Are all of your features static? If your dataframe looks like the example, then price is dynamic.
m
No, all of my features are dynamic, in fact (at least if I understand "dynamic" correctly).
Each of the features is its own process and hence changes at every step.
This works:
```python
cv_results = fcst.cross_validation(
    df_test,
    static_features=[],
    n_windows=2,
    h=horizon,
    step_size=1,
    refit=False,
    fitted=True
)
```
but I'd like to simply go over the entire dataset; somehow I am unable to set up n_windows correctly.
j
You should specify which columns are static (don't change over time for a single series) through the `static_features` argument. By providing an empty list you're saying none are static (they all change over time), which is probably causing a bad join somewhere.
m
They do in fact all change over time, so static_features=[] makes sense.
I have only one time series and its exogenous inputs, and they all change over time.
static_features, I assume, are some kind of identifiers that stay the same over time.
j
Oh I just saw it's commented out. What happens if you use it?
m
• I am trying with an empty list and n_windows = 1000 and it seems to be working.
• I just need to set up n_windows so it goes through the entire test set, and I am unsure how to do it.
j
1,000 windows seems excessive. Are you only forecasting one step ahead?
m
No, 12 steps ahead, but I need to move through the entire test set in a walk-forward manner. I can't figure out how to set this up.
i.e.:
• observe a single data point (y, X)
• predict y+1, y+2, ..., y+12 for X+1, X+2, ...
• observe the next data point (y+1, X+1)
• predict y+2, y+3, ..., y+13 for X+2, ..., X+13
Observe a data point, predict 12 steps forward, in a loop, basically.
Or am I doing something completely wrong? (Most tutorials seem to use cross-validation over 20 points at most, whereas in practice I have to validate on a much longer window to have some confidence.)
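One way to cover the whole test set one step at a time is to decide how many rows the earliest window should still train on and derive n_windows from that. A sketch of the arithmetic; `min_train_size` is a choice you make (it must exceed the largest lag so dropna leaves rows), and `full_walk_forward_windows` is a hypothetical helper, not part of mlforecast:

```python
def full_walk_forward_windows(n_rows: int, h: int, min_train_size: int,
                              step_size: int = 1) -> int:
    """Largest n_windows such that the earliest window still trains on at
    least min_train_size rows, while the last cutoff stays at n_rows - h."""
    return (n_rows - h - min_train_size) // step_size + 1

n_rows, horizon = 1000, 12
n_windows = full_walk_forward_windows(n_rows, horizon, min_train_size=500)
print(n_windows)  # 489 windows: cutoffs at rows 500, 501, ..., 988
```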
j
I think you can use the `input_size` argument, but you have to be careful with the number of samples there, because if you're using lag 5, for example, and `dropna=True`, then the feature computation drops 5 rows, so you'll need `input_size=6` to get a single sample at the end.
Yeah, 1,000 windows seems like a lot, but since you're not refitting and it's a single series it may not take that long.
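If I read the advice above correctly, `input_size` caps how many trailing rows each window trains on, and lag features plus `dropna=True` then remove the first `max_lag` of them. A sketch of that accounting (`usable_samples` is a hypothetical helper for illustration):

```python
def usable_samples(input_size: int, max_lag: int) -> int:
    """Training samples left after lag features drop the first max_lag rows
    of the input_size-row training slice (dropna=True)."""
    return max(input_size - max_lag, 0)

print(usable_samples(6, 5))  # 1: the single-sample case described above
print(usable_samples(5, 5))  # 0: too small, nothing left to train on
```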
m
I see. Perhaps I misunderstood: because the lags are present in the transformed dataframe, the "observe a data point, then predict 12 steps" loop might not be what is really going on.
• Maybe I need to set up the forecast horizon in the MLForecast.fit method:
```
max_horizon: int, optional (default=None)
    Train this many models, where each model will predict a specific horizon.
```
• It seems this is for something else, though: for when I assume different dynamics at each forecast horizon.
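For what `max_horizon` does conceptually: it trains one model per horizon step (direct multi-step forecasting) instead of recursively feeding a single model its own predictions. A self-contained sketch of that idea using scikit-learn on a toy series, not mlforecast's actual implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

y = np.arange(50, dtype=float)  # toy series: y_t = t
max_lag, horizon = 3, 12

# Direct multi-step: one model per step k, each trained to map the last
# max_lag observed values to the value k steps ahead.
models = []
for k in range(1, horizon + 1):
    X_train = np.array([y[t - max_lag:t] for t in range(max_lag, len(y) - k + 1)])
    y_train = np.array([y[t + k - 1] for t in range(max_lag, len(y) - k + 1)])
    models.append(LinearRegression().fit(X_train, y_train))

last_window = y[-max_lag:].reshape(1, -1)  # [47, 48, 49]
preds = np.array([m.predict(last_window)[0] for m in models])
print(preds)  # ~[50, 51, ..., 61] since the series is exactly linear
```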
Also, this has probably been asked a million times 😄
• Is there an elegant way to set up a periodic refit?
• E.g. I don't want to refit after every step; I want to refit once a day during the cross_validation. Is this doable? Maybe I can use:
```
before_predict_callback
after_predict_callback
```
j
Hmm, not sure what you mean. If you set refit=True then the model is retrained on every step; if you set it to False it is only trained the first time. You can also set an integer: for example, refit=7 retrains the model every 7 steps.
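So with hourly data and step_size=1, "refit once a day" would just be refit=24. A small sketch of which windows retrain under an integer refit, mirroring the behavior described above (an illustration of the idea, not mlforecast internals):

```python
def retrain_windows(n_windows: int, refit: int) -> list:
    """Window indices where the model is (re)trained when refit is an
    integer: window 0 always trains, then every `refit` windows after."""
    return [w for w in range(n_windows) if w % refit == 0]

print(retrain_windows(20, 7))  # [0, 7, 14]
# With hourly windows and step_size=1, refit=24 would retrain once per day.
```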
m
Aaa I see, amazing, amazing. That is exactly what I needed.