#mlforecast

Matej

10/05/2023, 6:49 PM
Could you please help me understand the n_windows param from cross_validation with exogenous variables? I went through the exogenous variables tutorial: https://github.com/Nixtla/mlforecast/blob/main/nbs/docs/how-to-guides/exogenous_features.ipynb My df_test looks exactly like series_with_prices. When I use n_windows=2 or some other low number, everything works fine. I'd like to simply go through the dataframe, predicting 12 steps ahead at each timestep:
horizon = 12
n_windows = len(df_test) - horizon

# Perform cross-validation without refitting
cv_results = fcst.cross_validation(
    df_test,
    # static_features=[],
    n_windows=n_windows,
    h=horizon,
    step_size=1,
    refit=False,
    fitted=True
)
With the above setup, however, I get:
--> 462 self.fit_models(X, y)
    463 if fitted:
    464     fitted_values = self._compute_fitted_values(
    465         X_with_info=X_with_info,
...
--> 677         raise ValueError('Input data must be 2 dimensional and non empty.')
    679     # determine feature names
    680     if feature_name == 'auto':

ValueError: Input data must be 2 dimensional and non empty.
Is there something I am not getting? :) Thanks for your patience.
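One possible reading of the error, sketched as plain arithmetic rather than as the library's actual code: if the validation windows are laid out so that the last one ends on the final row and each earlier one starts step_size rows before the next (an assumption), then n_windows = len(df_test) - horizon leaves a single row for the first training set. The function name and the 5,000-row length below are hypothetical.

```python
def first_window_train_rows(n_rows: int, h: int, n_windows: int, step_size: int) -> int:
    """Rows available for training before the first validation window.

    Assumption: the last window ends on the final row and each earlier
    window starts `step_size` rows before the one that follows it.
    """
    held_out = h + step_size * (n_windows - 1)  # rows covered by all windows
    return n_rows - held_out


n_rows = 5_000  # hypothetical length of df_test
horizon = 12

# A small number of windows leaves almost all rows for training.
print(first_window_train_rows(n_rows, horizon, n_windows=2, step_size=1))

# n_windows = len(df_test) - horizon leaves one row, which lag features
# with dropna=True would then remove.
print(first_window_train_rows(n_rows, horizon, n_windows=n_rows - horizon, step_size=1))
```

Under that assumption the model would be fitted on an empty feature matrix, which would be consistent with the "non empty" ValueError above.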

José Morales

10/05/2023, 6:55 PM
Are all of your features static? If your dataframe looks like the example then price is dynamic

Matej

10/05/2023, 6:55 PM
No, all of my features are dynamic, in fact (at least if I understand dynamic correctly).
Each of the features is its own process and hence changes with every step.
This works:
cv_results = fcst.cross_validation(
    df_test,
    static_features=[],
    n_windows=2,
    h=horizon,
    step_size=1,
    refit=False,
    fitted=True
)
But I'd like to simply go over the entire dataset; somehow I am unable to set up n_windows correctly.

José Morales

10/05/2023, 6:58 PM
You should specify which columns are static (don't change over time for a single series) through the static_features argument. By providing an empty list you're saying none are static (they all change over time), which is probably causing a bad join somewhere.

Matej

10/05/2023, 6:59 PM
They in fact do all change over time, so static_features = [] makes sense.
I have only one timeseries and its exogenous inputs, they all change over time.
static_features I assume are some kind of identifiers that stay the same over time.

José Morales

10/05/2023, 7:01 PM
Oh I just saw it's commented out. What happens if you use it?

Matej

10/05/2023, 7:02 PM
• I am trying with an empty list and n_windows = 1000 and it seems to be working.
• I just somehow need to set up n_windows to go through the entire test set and I am unsure how to do it.

José Morales

10/05/2023, 7:03 PM
1,000 windows seems excessive. Are you only forecasting one step ahead?

Matej

10/05/2023, 7:04 PM
No, 12 steps ahead, but I need to move through the entire test set in a walk-forward manner. This is what I can't figure out how to set up.
i.e.:
• observe a single data point (y, X)
• predict y+1, y+2, ..., y+12, for X+1, X+2, ...
• observe the next data point (y+1, X+1)
• predict y+2, y+3, ..., y+13, for X+2, ..., X+13
Basically: observe a data point, predict 12 steps forward, in a loop.
Or am I doing something completely wrong? (Most tutorials seem to use cross-validation over 20 points at most, whereas in practice I have to verify on a much longer window to have some confidence.)
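The walk-forward loop described in this message can be sketched in plain Python as index ranges. This is only an illustration of the evaluation scheme, not mlforecast code; walk_forward_splits and its min_train parameter are hypothetical names introduced here.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_rows: int, h: int, step_size: int = 1, min_train: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train_rows, test_rows) index ranges for walk-forward evaluation.

    Each split trains on everything observed so far and tests on the next
    `h` rows; the cutoff then moves forward by `step_size` rows.
    """
    train_end = min_train
    while train_end + h <= n_rows:
        yield range(0, train_end), range(train_end, train_end + h)
        train_end += step_size


# Toy example: 20 rows, 12-step horizon, at least 6 rows of history.
splits = list(walk_forward_splits(n_rows=20, h=12, step_size=1, min_train=6))
for train_rows, test_rows in splits:
    print(f"train rows 0..{train_rows.stop - 1}, predict rows {test_rows.start}..{test_rows.stop - 1}")
```

The number of splits this produces, (n_rows - min_train - h) // step_size + 1, is the quantity that n_windows would have to match in order to cover the whole set.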

José Morales

10/05/2023, 7:09 PM
I think you can use the input_size argument, but you have to be careful with the number of samples there: if you're using lag5, for example, and dropna=True, then the features drop 5 rows, so you'll need input_size=6 to get a single sample at the end.
Yeah, 1,000 seems like a lot, but since you're not refitting and it's a single series it may not take that long.
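The lag arithmetic above can be checked with a small pure-Python sketch. It simulates a single lag feature on a list and drops the rows where the lag is undefined, which is what dropna=True is described as doing here. The helper names are hypothetical, and largest_n_windows assumes the last window ends on the final row.

```python
from typing import List, Optional, Tuple


def usable_rows(values: List[float], lag: int) -> List[Tuple[float, float]]:
    """Pair each target with its lagged value, dropping rows without one (lag > 0)."""
    lagged: List[Optional[float]] = [None] * lag + values[:-lag]
    return [(feat, y) for feat, y in zip(lagged, values) if feat is not None]


def largest_n_windows(n_rows: int, h: int, step_size: int, max_lag: int) -> int:
    """Largest window count that still leaves max_lag + 1 training rows.

    Assumption: the last window ends on the final row of the data.
    """
    spare = n_rows - h - (max_lag + 1)
    return spare // step_size + 1 if spare >= 0 else 0


print(usable_rows([10, 11, 12, 13, 14, 15], lag=5))  # 6 rows in, 1 sample out
print(usable_rows([10, 11, 12, 13, 14], lag=5))      # 5 rows in, nothing left
print(largest_n_windows(n_rows=5_000, h=12, step_size=1, max_lag=5))
```

With a hypothetical 5,000-row frame, a 12-step horizon, and lag5 as the longest lag, this gives 4,983 windows, six fewer than len(df_test) - horizon.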

Matej

10/05/2023, 7:29 PM
I see, perhaps I misunderstood. Because the lags are present in the transformed dataframe, "observing a data point" and then predicting the next 12 steps might not be what is really going on.
• Maybe I need to set up the forecast horizon in the MLForecast.fit method:
max_horizon: int, optional (default=None)
Train this many models, where each model will predict a specific horizon.
• It seems this is for something else: for when I assume different dynamics for each time horizon.
Also, this has been asked a million times, I assume 😄
• Is there an elegant way to set up a periodic refit?
• E.g. I don't want to refit after every step; I want to refit every day in the cross_validation. Is this doable? Maybe I can use:
before_predict_callback
after_predict_callback

José Morales

10/05/2023, 7:46 PM
Hmm, not sure what you mean. If you set refit=True the model is retrained on every step; if you set it to False it is only trained the first time. You can also set an integer: for example, refit=7 retrains the model every 7 steps.
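The three refit settings described here can be illustrated with a short schedule sketch. It follows the wording of this message (train on the first window, then retrain every refit windows), not the library's source, and refit_windows is a hypothetical helper.

```python
from typing import List, Union


def refit_windows(n_windows: int, refit: Union[bool, int]) -> List[int]:
    """Indices of the windows in which the model would be (re)trained.

    True behaves like 1 (every window), False like 0 (first window only),
    and a positive integer k retrains on every k-th window.
    """
    return [i for i in range(n_windows) if i == 0 or (refit > 0 and i % refit == 0)]


print(refit_windows(10, True))   # every window
print(refit_windows(10, False))  # only the first window
print(refit_windows(10, 3))      # every third window
```

For the "refit every day" case, assuming hourly data and step_size=1 (both assumptions), refit=24 would correspond to one retrain per day of windows.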

Matej

10/05/2023, 7:50 PM
Aaa, I see, amazing, amazing. That is exactly what I needed.