# neural-forecast
j
I have a question about how I should be thinking about `step_size` when using the LSTM. Say I need to predict the next 24 hours every hour, I want to use the last 48 hours to do so, and I have future exogenous features that change every hour (for example weather forecasts) and turn into actuals once the time passes. My data frame right now consists of non-overlapping windows 72 steps long, where the first 48 steps are mostly duplicates, as the actual values of the exogenous features change only one step at a time. So I'm basically using `input_size=48`, `horizon=24` and `step_size=72` when training an LSTM. However, I'm not sure I'm doing this right: the model trains very poorly even though there's a lot of data (for example, the forecasted values rarely start from the last known values), and the predictions on a future hold-out set are very poor. Am I doing the windowing correctly? Or should I be feeding only 25-hour windows to the model (so `input_size=1`, `horizon=24` and `step_size=25`) where the first row holds the latest actuals, and have the LSTM do the tracking of the past? And is this different for other architectures such as NHITS?
m
Hello! In my opinion, `step_size` should almost always be set to 1. What `step_size` does is control the distance between consecutive temporal windows during training. When set to 1, you get the maximum number of training windows. When you increase it, the number of windows decreases, which probably explains why the model performs poorly. In your case, I would set `input_size=48`, `horizon=24`, `step_size=1`. Also, just to be sure, your dataframe should be in the long format (see here). And I also think you can use a better model than LSTM, something like NHITS actually. Definitely worth trying.
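Roughly, that configuration would look like this (a minimal sketch; the CSV file and its columns are placeholders, and exact argument availability can differ between neuralforecast versions):

```python
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM, NHITS

# Long format: one row per (unique_id, ds) with the target in `y`.
Y_df = pd.read_csv('data.csv', parse_dates=['ds'])  # placeholder file

models = [
    LSTM(h=24, input_size=48, step_size=1),   # horizon 24, last 48 hours in
    NHITS(h=24, input_size=48, step_size=1),  # step_size=1 keeps all windows
]
nf = NeuralForecast(models=models, freq='H')
nf.fit(df=Y_df)
```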
j
Thanks for your reply Marco - I'm not sure I understand how to incorporate the changing forecasts with `step_size=1` though. If I understand it correctly, the windowing code steps through the data one step at a time, but the values of the future exogenous features for the next 24 steps change every step. To account for this, I thought it could work to copy the full window each time (so if I have X time steps in my data, my training DF has `X * step_size` rows, and thus the same number of windows as my original DF with non-updating exogenous features and `step_size=1`) and set `step_size = input_size + horizon`, which is very inefficient but at least makes sure the model sees the latest data (if I'm doing it correctly). Could you clarify this? Or is there maybe an example in the documentation that shows how to work with future exogenous variables that change over time? As for your comment on the long format - that's only with respect to multiple outcomes, right? The examples I've seen in the documentation (example) are wide with respect to the features (i.e., the DFs have a separate column for each exogenous feature).
m
Suppose your data is `[1,2,3,4,5,6,7,8,9,10]` and your model has a horizon of 1 and an input size of 3. With `step_size=1`, we internally create training windows as (`[input sequence] - [target sequence]`):

```
[1,2,3] - [4]
[2,3,4] - [5]
[3,4,5] - [6]
[4,5,6] - [7]
[5,6,7] - [8]
[6,7,8] - [9]
[7,8,9] - [10]
```

Now, if `step_size=3`, then it becomes:

```
[1,2,3] - [4]
[4,5,6] - [7]
[7,8,9] - [10]
```

You see how increasing `step_size` reduces the number of windows for training.
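To make this concrete, here is a tiny standalone sketch (plain Python, not the library's internal code) of how `step_size` strides the window generator:

```python
def make_windows(series, input_size, horizon, step_size):
    """Slide over `series`, emitting (input, target) training windows."""
    windows = []
    start = 0
    while start + input_size + horizon <= len(series):
        inp = series[start:start + input_size]
        tgt = series[start + input_size:start + input_size + horizon]
        windows.append((inp, tgt))
        start += step_size  # distance between consecutive windows
    return windows

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(make_windows(data, 3, 1, 1))  # 7 windows, as listed above
print(make_windows(data, 3, 1, 3))  # only 3 windows
```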
As for the exogenous features, simply include a column for each one and specify whether they are `hist_exog` or `futr_exog`. If `futr_exog`, then they must be provided in the `futr_df` argument when predicting. You can follow our tutorial on forecasting with exogenous features.
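A hedged sketch of that workflow, using the `temp` column from this thread as the future exogenous feature (the synthetic data is a placeholder):

```python
import numpy as np
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# Tiny synthetic long-format frame: unique_id, ds, y, plus the exog column.
ds = pd.date_range('2024-01-01', periods=500, freq='H')
train_df = pd.DataFrame({'unique_id': 'series_1', 'ds': ds,
                         'y': np.random.rand(500),
                         'temp': np.random.rand(500)})

model = NHITS(h=24, input_size=48, step_size=1,
              futr_exog_list=['temp'])  # columns known into the future
nf = NeuralForecast(models=[model], freq='H')
nf.fit(df=train_df)

# futr_df supplies the known future values of `temp` for the next h steps.
futr_ds = pd.date_range(ds[-1] + pd.Timedelta('1H'), periods=24, freq='H')
futr_df = pd.DataFrame({'unique_id': 'series_1', 'ds': futr_ds,
                        'temp': np.random.rand(24)})
preds = nf.predict(futr_df=futr_df)
```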
j
Thanks again for your reply! To address your example with the sequence, I basically transform the sequence in your example to `[1,2,3,4,2,3,4,5,3,4,5,6, ...]`. This keeps the number of windows the same as in the original sequence, but obviously explodes the data by multiplying it by the `step_size`. I'm just not sure how else to incorporate the changing future exogenous variables. I can see how it works for testing - as you suggest, you just provide the latest values in `futr_df` - but how do I do it for training?
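In code, the transformation I mean looks roughly like this (plain Python, just to illustrate):

```python
def explode(series, window_len):
    """Duplicate each sliding window of length `window_len` back-to-back."""
    out = []
    for start in range(len(series) - window_len + 1):
        out.extend(series[start:start + window_len])
    return out

# input_size=3, horizon=1 -> window_len = input_size + horizon = 4
print(explode([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4))
# [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6, ...]
```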
Here is a slightly more elaborate example to illustrate my problem and proposed solution. Say the Y to predict is:

```
time   Y
0:00   1
1:00   2
2:00   3
3:00   4
4:00   5
...
```
Additionally, I have an exogenous feature like forecasted temperature, which at every hour forecasts the next two hours. Here is some example data:

```
forecasted_at  time    temp
0:00           0:00    50
0:00           1:00    59
1:00           1:00    62
1:00           2:00    61
2:00           2:00    60
2:00           3:00    65
3:00           3:00    66
3:00           4:00    71
4:00           4:00    75
4:00           5:00    81
...
```
(time represents the start of the interval.) Let's say I take `input_size=2` and `horizon=2`. To make a forecast at 2:00 for 2:00 and 3:00, I want to use the historical prices for 0:00 and 1:00 and the temperature forecasts made at 2:00 for 2:00 and 3:00. The way I construct the data (assuming I'm taking the latest forecast as the actual), the window for this step (with `step_size=4`) looks like:

```
forecasted_at   time    Y    temp
2:00            0:00    1    50
2:00            1:00    2    62
2:00            2:00    3*   60
2:00            3:00    4*   65
```
(`*` is available during training but not at inference, and is to be forecasted.) Does this make sense? How else would I do this with `step_size=1`? If my horizon is very short, I can imagine making many exogenous features like "forecasted temperature at x time ahead" and setting `step_size=1`, but that becomes unwieldy when the horizon gets larger.
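In case it helps, here is roughly how I build that training frame from the vintage table above (a sketch; `y_df`, `fc_df` and the helper are my own names, with hours as plain integers for brevity):

```python
import pandas as pd

INPUT_SIZE, HORIZON = 2, 2  # as in the example above

# The two example tables above, as dataframes.
y_df = pd.DataFrame({'time': [0, 1, 2, 3, 4], 'Y': [1, 2, 3, 4, 5]})
fc_df = pd.DataFrame({
    'forecasted_at': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'time':          [0, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    'temp':          [50, 59, 62, 61, 60, 65, 66, 71, 75, 81],
})

def window_for_vintage(t):
    """One INPUT_SIZE+HORIZON window holding what is known at vintage t."""
    pos = y_df.index[y_df['time'] == t][0]  # assumes a 0..N RangeIndex
    win = y_df.iloc[pos - INPUT_SIZE: pos + HORIZON].copy()
    # Latest available temp per timestamp: actuals for the past,
    # the vintage-t forecast for the future.
    known = fc_df[fc_df['forecasted_at'] <= t]
    latest = known.sort_values('forecasted_at').groupby('time')['temp'].last()
    win['temp'] = win['time'].map(latest)
    return win

print(window_for_vintage(2))  # matches the 2:00 window shown above

# Vintages with a full history behind them and a full horizon ahead.
vintages = y_df['time'].iloc[INPUT_SIZE: len(y_df) - HORIZON + 1]
train_df = pd.concat([window_for_vintage(t) for t in vintages],
                     ignore_index=True)
# Train on train_df with step_size = INPUT_SIZE + HORIZON so the model's
# sliding windows line up exactly with these duplicated blocks.
```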
t
@Marco If I were to frame it another way: how should one handle exogenous features (like weather forecasts) that change their values not just as time progresses, but also as the forecast vintage changes? For example:

At time T-48, you have a forecast X0 for times T1...TN
At time T-47, you have an updated forecast X1 for times T1...TN
...
At time T-24, you have an updated forecast for times T1...TN
...
At time T, you have the actual value

It seems like the typical long-form style DF with `step_size` of 1 assumes that at time T the value of the future variable is static for any point in time prior to that.
m
This is not something that we handle during training. For training, exogenous values cannot be updated. When forecasting, what you're describing is possible by simply updating the `futr_df` dataframe.
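For example (a sketch; `fc_df` mirrors the vintage table above, and `history_up_to` is a hypothetical helper returning the actuals observed up to `t`):

```python
# Refresh futr_df with the newest forecast vintage before each predict call.
for t in run_times:  # e.g. every hour in production
    vintage = fc_df[fc_df['forecasted_at'] == t]
    futr_df = (vintage.rename(columns={'time': 'ds'})[['ds', 'temp']]
                      .assign(unique_id='series_1'))
    preds = nf.predict(df=history_up_to(t),  # hypothetical helper
                       futr_df=futr_df)
```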
t
Got it. We see pretty poor performance without the training update, which does resolve in some architectures such as NHITS when using the large step size with the expanded data frame to show the model the updating features at train time. But as Jan pointed out here, other architectures seem to struggle with this workaround 😕
j
Can you clarify the statement that exogenous values cannot be updated? If we go with the set-up that I laid out, doesn't the model get updated exogenous values for each window? Or do the exogenous features get processed differently, or wiped somehow? I tried an experiment comparing a model fitted with `step_size=1` to one fitted with `step_size=input_size+horizon` (with stale features), and they looked like they produced the same result. And more generally, do you have any suggestions for how to handle this set-up? Should we consider different model classes? It seems like this should be a pretty common set-up.