Andrew Doherty 02/27/2023, 10:13 AM
fede (nixtla) (they/them) 02/27/2023, 10:16 PM
parameter controls the length of the in-sample time series: https://nixtla.github.io/statsforecast/core.html#statsforecast.cross_validation. You can use it to perform cross-validation with sliding windows.
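To make the difference between expanding and sliding windows concrete, here is a minimal sketch in plain Python. It is not the statsforecast implementation: the function `cv_windows` and its `train_size` parameter are hypothetical names used only for illustration. With no `train_size` the training set grows with each window; with one, it has a fixed length and slides forward.

```python
def cv_windows(n_obs, h, n_windows, train_size=None):
    """Return (train_start, train_end, test_end) index triples.

    Training data covers [train_start, train_end) and the test
    horizon covers [train_end, test_end). Hypothetical helper.
    """
    splits = []
    for i in range(n_windows):
        # The last window ends at the end of the series;
        # earlier windows step back by one horizon each.
        test_end = n_obs - (n_windows - 1 - i) * h
        train_end = test_end - h
        # Without train_size the window expands from the start;
        # with it, only the most recent train_size points are used.
        train_start = 0 if train_size is None else max(0, train_end - train_size)
        splits.append((train_start, train_end, test_end))
    return splits


expanding = cv_windows(n_obs=100, h=10, n_windows=3)
sliding = cv_windows(n_obs=100, h=10, n_windows=3, train_size=50)
```

In the sliding case every training window has exactly 50 observations, which is the behaviour being asked about in this thread.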
Andrew Doherty 02/27/2023, 10:19 PM
argument to be implemented in
to enable a sliding window? Happy to raise an issue and contribute if possible.
fede (nixtla) (they/them) 03/06/2023, 7:33 PM
argument instead of
to perform sliding windows. We are working on standardizing argument names across the nixtlaverse. 🙂 Here’s the reference to the cross-validation method: https://nixtla.github.io/mlforecast/forecast.html#mlforecast.cross_validation
Andrew Doherty 03/06/2023, 7:51 PM
. In the electricity_peak_forecasting notebook when using the
argument, the code fails if
keep_last_n < (Y_df.shape[0] - window_size)
. In the example the minimum window that works is
keep_last_n = 6528
I have just tried the
notebook Training and Forecasting sections and the same error occurs when
is < 1008. No error is raised if the
argument is not used. This therefore looks like a problem when slicing the data when there are exogenous/differences present, and it is an issue both when using
First, here the
is not correct when using
. This results in null values for the exogenous features when there is a merge in
here. I corrected this using a bit of a hack:
self.last_dates = pd.DatetimeIndex([sorted_df.index.get_level_values(self.time_col)[-1]])
This appears to work for my use case, but I don't know the design of MLForecast well, so this might not be correct for other cases such as multiple `unique_id`s,
Secondly, once this was fixed I noticed that the
here had all the data and not just the last n samples. I implemented the following hack before `return self.fit_models(X, y)`:
    if keep_last_n is not None:
        X, y = X[-keep_last_n:], y[-keep_last_n:]
This is not the right place to fix this as it should be done in
I think, but I just did this quickly to fix and get some results.
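One caveat with a positional slice such as `X[-keep_last_n:]` is that, when several series are stacked in one long frame, it takes the last rows of the whole frame rather than the last rows of each series. The sketch below shows the difference on toy data with plain pandas. The column names `unique_id`, `ds` and `y` follow the usual long format, but the frame itself is made up and this is not a description of MLForecast internals.

```python
import pandas as pd

# Toy long-format frame with two series of five observations each.
df = pd.DataFrame({
    "unique_id": ["a"] * 5 + ["b"] * 5,
    "ds": list(range(5)) * 2,
    "y": range(10),
})
keep_last_n = 2

# Positional slice: the last 2 rows of the whole frame, all from series "b".
positional = df[-keep_last_n:]

# Per-series tail: the last 2 rows of each series, in the original row order.
per_series = df.groupby("unique_id").tail(keep_last_n)
```

Here `positional` contains only rows from series `b`, while `per_series` keeps two rows from each of `a` and `b`.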
Do you have any thoughts? Happy to keep digging into the code if that helps, raise an issue on GitHub, or share this with someone else if you don't have time.
fede (nixtla) (they/them) 03/22/2023, 6:36 PM
Andrew Doherty 03/22/2023, 10:43 PM
José Morales 03/23/2023, 2:24 AM
argument is used only for predicting; it is meant to be an efficiency parameter for cases where you have very long series and your updates don't require all the history.
For example, if your series are of length 10,000 and your features only require the last 50 days, then setting
makes it so that only the last 50 values of each series are kept and used to compute the updates. This works because the updates compute the whole transformation but keep only the last value.
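The reasoning above can be checked with a small pandas sketch. It is an illustration only, not MLForecast code: the series is random toy data and a 50-step rolling mean stands in for any feature that depends only on the last 50 observations. Computing that feature on the full history and on just the last 50 values gives the same latest value, so the older history is not needed for the update.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=10_000))  # toy series of length 10,000

window = 50  # the feature only looks at the last 50 observations

# Feature computed over the full history, keeping only the latest value.
full_last = series.rolling(window).mean().iloc[-1]

# Same feature computed after trimming the history to the last 50 values.
trimmed_last = series.tail(window).rolling(window).mean().iloc[-1]

# The two agree up to floating-point error.
same = bool(np.isclose(full_last, trimmed_last))
```

The trimmed computation touches 50 values instead of 10,000, which is the efficiency gain described in the message.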
I think it'd be better to add the
argument to do exactly the same as in statsforecast, I'll work on that and let you know when it's done.
Andrew Doherty 03/23/2023, 10:45 AM
argument, thanks a lot for working on this as it is really important for my current use case. Once the code is ready (even in a separate branch/fork), please let me know and I will start using it to test on my data.
is doing so it might be later this evening. I think it might only be occurring in the predict step. I'll tag you in the issue later.
fede (nixtla) (they/them) 03/23/2023, 5:41 PM
and for including the new feature 🙂
Sorry for the misunderstanding @Andrew Doherty 🙌
Andrew Doherty 03/23/2023, 5:57 PM
José Morales 03/24/2023, 3:24 AM
Andrew Doherty 03/24/2023, 8:51 AM