Andrew Doherty (02/27/2023, 10:13 AM)

fede (nixtla) (they/them) (02/27/2023, 10:16 PM)
The `input_size` parameter controls the length of the in-sample time series: https://nixtla.github.io/statsforecast/core.html#statsforecast.cross_validation. You can use it to perform cross-validation with sliding windows.
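(For reference: a minimal sketch of sliding-window cross-validation in statsforecast. The AutoETS model, the sample dataset, and all parameter values here are illustrative assumptions, not from this thread.)

```python
# Illustrative sketch: sliding-window cross-validation in statsforecast.
# Model, data, and parameter values are assumptions for demonstration.
from statsforecast import StatsForecast
from statsforecast.models import AutoETS
from statsforecast.utils import AirPassengersDF  # sample df: unique_id, ds, y

sf = StatsForecast(models=[AutoETS(season_length=12)], freq='M')
cv_df = sf.cross_validation(
    df=AirPassengersDF,
    h=12,            # horizon of each validation window
    n_windows=3,     # number of validation windows
    step_size=12,    # spacing between window cutoffs
    input_size=48,   # train each window on only the last 48 points -> sliding window
)
```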
Andrew Doherty (02/27/2023, 10:19 PM)
Could the `input_size` argument be implemented in `MLForecast.cross_validation` to enable a sliding window? Happy to raise an issue and contribute if possible.
fede (nixtla) (they/them) (03/06/2023, 7:33 PM)
`MLForecast` has the `keep_last_n` argument instead of `input_size` to perform sliding windows. We are working on standardizing argument names across the nixtlaverse. 🙂 Here’s the reference to the cross-validation method: https://nixtla.github.io/mlforecast/forecast.html#mlforecast.cross_validation
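(A minimal sketch of the equivalent call in MLForecast as of the version discussed in this thread; the model, lags, and window parameters are assumptions. Note the horizon argument was called `window_size` here and `h` in later releases.)

```python
# Illustrative sketch: MLForecast cross-validation with keep_last_n.
# Model, lags, and window sizes are assumptions for demonstration.
import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series  # sample unique_id/ds/y data

series = generate_daily_series(n_series=2, min_length=200, max_length=300)
fcst = MLForecast(models=[lgb.LGBMRegressor()], freq='D', lags=[1, 7])
cv_df = fcst.cross_validation(
    series,
    n_windows=3,
    window_size=7,    # horizon of each validation window (h in later versions)
    id_col='unique_id',
    time_col='ds',
    target_col='y',
    keep_last_n=60,   # trailing history kept per series for the updates
)
```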
Andrew Doherty (03/06/2023, 7:51 PM)
Thanks for the pointers to `keep_last_n` and `cross_validation`. In the `electricity_peak_forecasting` notebook, using the `keep_last_n` argument in `cross_validation` fails if `keep_last_n < (Y_df.shape[0] - window_size)`. In that example the minimum window that works is `keep_last_n = 6528`.
I have just tried the Training and Forecasting sections of the `end_to_end_walkthrough.ipynb` notebook and the same error occurs when `differences=[24]` and `keep_last_n` is < 1008. No error is raised if the `differences` argument is not used. This therefore looks like a problem with slicing the data when exogenous features/differences are present, and it shows up both when using `fit`/`predict` and when using `cross_validation`.
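(A hypothetical minimal reproduction of the failure described above; the data, model, and exact parameter values are assumptions, not taken from the notebooks, which use hourly data with `differences=[24]` and error whenever `keep_last_n` < 1008.)

```python
# Hypothetical reproduction sketch of the reported failure; data, model,
# and parameters are assumptions chosen to mirror the described setup.
import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series

series = generate_daily_series(n_series=1, min_length=400, max_length=500)
fcst = MLForecast(
    models=[lgb.LGBMRegressor()],
    freq='D',
    lags=[7],
    differences=[7],  # removing differences reportedly made the error go away
)
# a small keep_last_n reportedly triggers the slicing error at predict time
fcst.fit(series, id_col='unique_id', time_col='ds', target_col='y', keep_last_n=20)
preds = fcst.predict(7)
```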
I have also been digging into the code and found two problems with `keep_last_n` in `MLForecast`.
First, here `self.last_dates` is not correct when using `keep_last_n`. This results in null values for the exogenous features when there is a merge in `_get_features_for_next_step` here. I corrected this using a bit of a hack:

    self.last_dates = pd.DatetimeIndex([sorted_df.index.get_level_values(self.time_col)[-1]])

This appears to work for my use case, but I don't know the design of MLForecast well, so this might not be correct for other cases such as multiple `unique_id`s.
Secondly, once this was fixed I noticed that the `X` and `y` used in `fit_models` here had all the data and not just the last n samples. I implemented the following hack before `return self.fit_models(X, y)`:

    if keep_last_n is not None:
        X, y = X[-keep_last_n:], y[-keep_last_n:]

This is not the right place to fix it, as I think it should be done in core.py, but I just did this quickly to get some results. Do you have any thoughts? I'm happy to keep digging into the code if that helps, to raise an issue on GitHub, or to share this with someone else if you don't have time.
Thanks again.

fede (nixtla) (they/them) (03/22/2023, 6:36 PM)

Andrew Doherty (03/22/2023, 10:43 PM)

José Morales (03/23/2023, 2:24 AM)
The `keep_last_n` argument is used only for predicting; it is meant to be an efficiency parameter for cases where you have very long series and your updates don't require all the history. For example, if your series are of length 10,000 and your features only require the last 50 days, then setting `keep_last_n=50` means that only the last 50 values of each series are kept and used to compute the updates. This is because in the updates the whole transformation is computed but only the last value is kept.
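(A short sketch of that usage; the data, model, and feature choices are illustrative assumptions.)

```python
# Illustrative sketch of keep_last_n as an efficiency knob: the lag features
# here need at most the last 14 values, so keeping 50 per series is plenty
# for the forecast-time updates. Data and model are assumptions.
import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series

series = generate_daily_series(n_series=2, min_length=300, max_length=400)
fcst = MLForecast(models=[lgb.LGBMRegressor()], freq='D', lags=[7, 14])
fcst.fit(series, id_col='unique_id', time_col='ds', target_col='y', keep_last_n=50)
preds = fcst.predict(7)  # updates are computed from only the retained tail
```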
I think it'd be better to add the `input_size` argument to do exactly the same as in statsforecast. I'll work on that and let you know when it's done.
Andrew Doherty (03/23/2023, 10:45 AM)
That sounds great regarding the `input_size` argument; thanks a lot for working on this, as it is really important for my current use case. Once the code is ready (even in a separate branch/fork), if you could let me know, I will start using it to test on my data. I'm still writing up exactly what `keep_last_n` is doing, so it might be later this evening. I think it might only be occurring in the predict step. I'll tag you in the issue later.
fede (nixtla) (they/them) (03/23/2023, 5:41 PM)
Thanks for the clarification about `keep_last_n` and for including the new feature 🙂
Sorry for the misunderstanding @Andrew Doherty 🙌

Andrew Doherty (03/23/2023, 5:57 PM)

José Morales (03/24/2023, 3:24 AM)
dropna=False
Andrew Doherty (03/24/2023, 8:51 AM)