# general
Andrew Doherty:
Hi everyone, I was wondering if there are plans to implement constant in-sample time series cross-validation (a sliding window rather than an expanding window)?
fede (nixtla):
hey @Andrew Doherty! The `input_size` parameter controls the length of the in-sample time series: https://nixtla.github.io/statsforecast/core.html#statsforecast.cross_validation. You can use it to perform cross-validation with sliding windows.
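A minimal sketch of what that call looks like; the data and argument values here are illustrative, not from the thread:

```python
import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import Naive

# Synthetic daily series in the long format statsforecast expects.
df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2023-01-01", periods=200, freq="D"),
    "y": np.random.rand(200),
})

sf = StatsForecast(models=[Naive()], freq="D")

# With input_size set, each validation window trains on a sliding block of
# the last `input_size` observations before the cutoff, instead of all
# history up to the cutoff (the expanding-window default).
cv_df = sf.cross_validation(
    df=df,
    h=7,            # forecast horizon per window
    n_windows=3,    # number of validation windows
    step_size=7,    # spacing between cutoffs
    input_size=56,  # fixed-length in-sample window
)
```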
Andrew Doherty:
Hey Fede, thanks a lot for getting back to me so quickly. Sorry I missed that!
Hi @fede (nixtla) (they/them), I am now using MLForecast and I was wondering whether there is a plan for the `input_size` argument to be implemented in `MLForecast.cross_validation` to enable a sliding window? Happy to raise an issue and contribute if possible.
fede (nixtla):
Hey @Andrew Doherty! `MLForecast` has the `keep_last_n` argument instead of `input_size` to perform sliding windows. We are working on standardizing argument names across the nixtlaverse. 🙂 Here’s the reference to the cross-validation method: https://nixtla.github.io/mlforecast/forecast.html#mlforecast.cross_validation
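A hedged sketch of the suggested call, on synthetic data. Note that José clarifies further down the thread that `keep_last_n` actually only affects the predict step, and `window_size` as the horizon argument follows the mlforecast version discussed here (it was later renamed):

```python
import numpy as np
import pandas as pd
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

# Synthetic hourly series in mlforecast's long format.
df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2023-01-01", periods=500, freq="H"),
    "y": np.random.rand(500),
})

mlf = MLForecast(models=[LinearRegression()], freq="H", lags=[1, 24])

cv_df = mlf.cross_validation(
    df,
    window_size=24,  # forecast horizon per validation window
    n_windows=3,
    keep_last_n=48,  # keep only the last 48 observations of each series
)
```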
Andrew Doherty:
Ah, sorry Fede. I thought that was dropping part of the forecast horizon as is done in some trading markets. Thanks for the clarification.
Hi again @fede (nixtla) (they/them). Hope you are well. I have been making good progress evaluating MLForecast for our production solution; however, I am having an issue using `keep_last_n` with `cross_validation`. In the electricity_peak_forecasting notebook, the code fails when using the `keep_last_n` argument if `keep_last_n < (Y_df.shape[0] - window_size)`. In the example, the minimum window that works is `keep_last_n = 6528`.
Am I doing something wrong?
I have carried out different tests and the problem remains with and without differencing/exogenous features.
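A hedged sketch of the failing pattern being described, with synthetic data standing in for the notebook's `Y_df` (sized so that `Y_df.shape[0] - window_size` equals the 6528 from the thread; the real notebook's shapes may differ):

```python
import numpy as np
import pandas as pd
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

# Stand-in for the notebook's Y_df (hourly load data in the original).
Y_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2020-01-01", periods=6_552, freq="H"),
    "y": np.random.rand(6_552),
})
window_size = 24

mlf = MLForecast(models=[LinearRegression()], freq="H", lags=[24])

# Works when keep_last_n >= Y_df.shape[0] - window_size ...
mlf.cross_validation(Y_df, window_size=window_size, n_windows=1,
                     keep_last_n=Y_df.shape[0] - window_size)

# ... but reportedly fails for anything smaller, e.g.:
# mlf.cross_validation(Y_df, window_size=window_size, n_windows=1,
#                      keep_last_n=100)
```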
I have been continuing my investigation and have looked at whether this problem is only present when using `cross_validation`. I have just tried the `end_to_end_walkthrough.ipynb` notebook's Training and Forecasting sections, and the same error occurs when `differences=[24]` and `keep_last_n < 1008`. No error is raised if the `differences` argument is not used. This therefore looks like a problem with slicing the data when exogenous features/differences are present, and it is an issue both when using `fit`/`predict` and when using `cross_validation`.
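A similarly hedged sketch of the plain `fit`/`predict` variant just described (synthetic data; the `differences` argument follows the thread-era API, which later releases replaced with target transforms):

```python
import numpy as np
import pandas as pd
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2021-01-01", periods=2_000, freq="H"),
    "y": np.random.rand(2_000),
})

mlf = MLForecast(
    models=[LinearRegression()],
    freq="H",
    lags=[24],
    differences=[24],  # seasonal differencing, as in the walkthrough
)

mlf.fit(df, keep_last_n=1008)  # reportedly fails once keep_last_n < 1008
preds = mlf.predict(24)
```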
Good afternoon @fede (nixtla) (they/them), I have done some digging and I think there might be two bugs when using `keep_last_n` in MLForecast. First, here `self.last_dates` is not correct when using `keep_last_n`. This results in null values for the exogenous features when there is a merge in `_get_features_for_next_step` here. I corrected this using a bit of a hack:

```python
# Rebuild last_dates from the final timestamp of the sorted dataframe.
self.last_dates = pd.DatetimeIndex([sorted_df.index.get_level_values(self.time_col)[-1]])
```
This appears to work for my use case, but I don't know the design of MLForecast well, so this might not be correct for other cases such as multiple `unique_id`s. Secondly, once this was fixed I noticed that the `X` and `y` used in `fit_models` here had all the data and not just the last n samples. I implemented the following hack before `return self.fit_models(X, y)`:

```python
# Train on only the last keep_last_n samples.
if keep_last_n is not None:
    X, y = X[-keep_last_n:], y[-keep_last_n:]
```
This is not the right place to fix it, as I think it should be done in `core.py`, but I just did this quickly to get some results. Do you have any thoughts? Happy to keep digging into the code if that helps, to raise an issue on GitHub, or to share this with someone else if you don't have time. Thanks again!
fede (nixtla):
hey @Andrew Doherty! Thank you for taking the time to dig into the problem. Please help us by raising an issue on GitHub to track the problem. :) @José Morales, do you have any thoughts about the issue?
Andrew Doherty:
No problem at all. I’ll raise an issue tomorrow morning. Let me know if I can help.
José Morales:
Hi. The `keep_last_n` argument is used only for predicting; it is meant to be an efficiency parameter for cases where you have very long series and your updates don't require all the history. For example, if your series are of length 10,000 and your features only require the last 50 days, then setting `keep_last_n=50` means that only the last 50 values of each series are kept and used to compute the updates. This is because in the updates the whole transformation is computed but only the last value is kept. I think it'd be better to add the `input_size` argument to do exactly the same as in statsforecast; I'll work on that and let you know when it's done.
Although those errors you're getting seem a bit odd: the `keep_last_n` argument should only impact the predict step, so I'm not sure why you get errors in the transform. I'd really appreciate it if you could open an issue with a minimal reproducible example.
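To illustrate José's point, a hedged sketch: when the features only look back a handful of steps (lag 1 plus a 7-step rolling mean here), `keep_last_n` can trim the history stored for predict-time updates without changing the forecasts. Data and values are illustrative:

```python
import numpy as np
import pandas as pd
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression
from window_ops.rolling import rolling_mean

# A long series whose features only need the most recent values.
df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2000-01-01", periods=10_000, freq="D"),
    "y": np.random.rand(10_000),
})

mlf = MLForecast(
    models=[LinearRegression()],
    freq="D",
    lags=[1],
    lag_transforms={1: [(rolling_mean, 7)]},  # needs at most the last 8 values
)

# Only the last 50 values of each series are retained for computing the
# recursive feature updates during predict.
mlf.fit(df, keep_last_n=50)
preds = mlf.predict(14)
```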
Andrew Doherty:
Thanks a lot for this @José Morales, that makes sense. Regarding the `input_size` argument, thanks a lot for working on this, as it is really important for my current use case. Once the code is ready (even in a separate branch/fork), if you could let me know, I will start using it to test on my data.
I will create a minimal reproducible example and raise an issue later today. I'll need to have a think about this error based on my new understanding of what `keep_last_n` is doing, so it might be later this evening. I think it might only be occurring in the predict step. I'll tag you in the issue later.
fede (nixtla):
Thank you @José Morales for clarifying the actual behavior of `keep_last_n` and for including the new feature 🙂 Sorry for the misunderstanding @Andrew Doherty 🙌
Andrew Doherty:
Absolutely no problem @fede (nixtla) (they/them). Looking forward to Jose's work going live. 😄
José Morales:
hey @Andrew Doherty, we just merged the PR adding the `input_size` argument, so if you install from the main branch you should be able to use it. Please let us know how it goes. Keep in mind that the number you set there won't necessarily be the number of training samples per series, because series can be shorter than the window, and some rows will be dropped unless you set `dropna=False`.
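A hedged sketch of how the new argument might be used after installing from the main branch (e.g. `pip install git+https://github.com/Nixtla/mlforecast`; argument names follow the thread-era API):

```python
import numpy as np
import pandas as pd
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2021-01-01", periods=5_000, freq="H"),
    "y": np.random.rand(5_000),
})

mlf = MLForecast(models=[LinearRegression()], freq="H", lags=[24])

cv_df = mlf.cross_validation(
    df,
    window_size=24,      # horizon per validation window
    n_windows=3,
    input_size=24 * 28,  # sliding in-sample window per cutoff
    dropna=False,        # keep rows that the lag features would otherwise drop
)
```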
Andrew Doherty:
Amazing José, thanks a lot for the very quick turnaround. I’ll install today and get back to you early next week.