Jan
04/11/2025, 11:34 PM
I have a question about `step_size` when using the LSTM. Say I need to predict the next 24 hours every hour, I want to use the last 48 hours to do so, and I have future exogenous features that change every hour (for example weather forecasts) and turn into actuals once the time passes beyond the present.

My data frame currently consists of non-overlapping windows of 72 steps, where the first 48 steps are mostly duplicates, since the actual values of the exogenous features change only one step at a time. So I'm basically using `input_size=48`, `horizon=24`, and `step_size=72` when training an LSTM. However, I'm not sure I'm doing this right: the model trains very poorly even though there's a lot of data (for example, the forecasted values rarely start from the last known values), and the predictions on a future hold-out set are very poor.

Am I doing the windowing correctly? Or should I be feeding only 25-hour windows to the model (so `input_size=1`, `horizon=24`, and `step_size=25`), where the first row contains the latest actuals, and have the LSTM do the tracking of the past? And is this different for other architectures such as NHITS?

Marco
04/14/2025, 1:08 PM
`step_size` should almost always be set to 1. What `step_size` does is control the distance between consecutive temporal windows during training. When set to 1, you ensure the maximum number of training windows. When you increase it, the number of windows decreases, which probably explains why the model performs poorly.

In your case, I would set `input_size=48`, `horizon=24`, `step_size=1`.

Also, just to be sure, your dataframe should be in the long format (see here). And I also think that you can use a better model than LSTM, something like NHITS actually. Definitely worth trying.

Jan
04/14/2025, 5:00 PM
I'm not sure about `step_size=1` though. If I understand it well, the windowing code would step through the data one step at a time, but the values of the next 24 future exogenous features change every step. To account for this, I thought it could work to copy the full window each time (so if I have X time steps in my data, my training DF has X * `step_size` rows, and therefore the same number of windows as my original DF with non-updating exogenous features and `step_size=1`) and set `step_size = input_size + horizon`. This is very inefficient, but at least it makes sure the model sees the latest data (if I'm doing it correctly). Could you clarify this? Or is there maybe an example in the documentation that shows how to work with future exogenous variables that change over time?

As for your comment on the long format - that's only with respect to multiple outcomes, right? The examples I've seen in the documentation (example) are wide with respect to the features (i.e., the DFs have one column per exogenous feature).

Marco
04/14/2025, 7:02 PM
With `step_size=1`, we internally create training windows as `[input sequence] - [target sequence]`:

[1,2,3] - [4]
[2,3,4] - [5]
[3,4,5] - [6]
[4,5,6] - [7]
[5,6,7] - [8]
[6,7,8] - [9]
[7,8,9] - [10]

Now, if `step_size=3`, then it becomes:

[1,2,3] - [4]
[4,5,6] - [7]
[7,8,9] - [10]
You see how increasing `step_size` reduces the number of windows for training.

Marco
04/14/2025, 7:04 PM
Your exogenous features must be passed as `hist_exog` or `futr_exog`. If `futr_exog`, then they must be provided in the `futr_df` argument when predicting. You can follow our tutorial on forecasting with exogenous features.

Jan
04/14/2025, 7:49 PM
That's why I duplicate the windows, so the training sequence becomes [1,2,3,4, 2,3,4,5, 3,4,5,6, ...]. This keeps the number of windows the same as in the original sequence, but obviously explodes the data by multiplying it by the `step_size`. I'm just not sure how else to incorporate the changing future exogenous variables. I can see how it works for testing - as you suggest, you just provide the latest values in `futr_df` - but how do I do it for training?
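For illustration, the duplication scheme described above can be sketched in plain Python (`duplicate_windows` is a hypothetical helper, not part of any library):

```python
def duplicate_windows(series, window_len):
    """Copy every full window (input_size + horizon long) back-to-back,
    so that training with step_size = window_len still visits each
    original window exactly once."""
    out = []
    for start in range(len(series) - window_len + 1):
        out.extend(series[start:start + window_len])
    return out

print(duplicate_windows([1, 2, 3, 4, 5, 6], window_len=4))
# [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6]
```

As noted, the result is roughly `window_len` times larger than the original data, which is where the inefficiency comes from.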
Jan
04/14/2025, 7:50 PM
To make this concrete, say my target series looks like:

time  Y
0:00  1
1:00  2
2:00  3
3:00  4
4:00  5
...
Additionally, I have an exogenous feature like forecasted temperature, which for every hour forecasts the next two hours. Here is some example data:
forecasted_at  time  temp
0:00           0:00  50
0:00           1:00  59
1:00           1:00  62
1:00           2:00  61
2:00           2:00  60
2:00           3:00  65
3:00           3:00  66
3:00           4:00  71
4:00           4:00  75
4:00           5:00  81
...
(time represents the start of the interval)
Let's say I take `input_size=2` and `horizon=2`. To make a forecast at 2:00 for 2:00 and 3:00, I want to use the historical prices for 0:00 and 1:00 and the temperature forecasts made at 2:00 for 2:00 and 3:00. The way I construct the data (taking the latest forecast as the actual), the window for this step (with `step_size=4`) looks like:

forecasted_at  time  Y   temp
2:00           0:00  1   50
2:00           1:00  2   62
2:00           2:00  3*  60
2:00           3:00  4*  65

(* is available during training but not at inference, and is to be forecasted.)
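A sketch of how such a window could be assembled from the raw forecast vintages, using the toy data from the tables above (plain Python; `window_at` is a hypothetical helper, not a library function):

```python
# Toy data from the tables above: (forecasted_at, time) -> forecasted temp,
# with hours encoded as integers (0 = 0:00, 1 = 1:00, ...).
forecasts = {
    (0, 0): 50, (0, 1): 59,
    (1, 1): 62, (1, 2): 61,
    (2, 2): 60, (2, 3): 65,
    (3, 3): 66, (3, 4): 71,
    (4, 4): 75, (4, 5): 81,
}
actuals_y = {t: t + 1 for t in range(6)}  # Y = 1 at 0:00, 2 at 1:00, ...

def window_at(issue_time, input_size=2, horizon=2):
    """Build one (time, Y, temp) window issued at issue_time: every
    timestamp gets the most recent forecast available at issue time,
    so past rows carry the 'latest forecast as actual' value and
    future rows carry the forecasts issued at issue_time."""
    rows = []
    for t in range(issue_time - input_size, issue_time + horizon):
        latest = max(fa for (fa, tt) in forecasts if tt == t and fa <= issue_time)
        rows.append((t, actuals_y[t], forecasts[(latest, t)]))
    return rows

print(window_at(2))
# [(0, 1, 50), (1, 2, 62), (2, 3, 60), (3, 4, 65)] -- matches the window above
```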
Does this make sense? How else would I do this with `step_size=1`? If my horizon is very short, I can imagine making many exogenous features like "forecasted temperature at x time ahead" and setting `step_size=1`, but that becomes unwieldy when the horizon gets larger.

Tyler Nisonoff
04/15/2025, 12:02 AM
At time T-48, you have a forecast X0 for times T1...TN.
At time T-47, you have an updated forecast X1 for times T1...TN.
...
At time T-24, you have an updated forecast for times T...N.
...
At time T, you have the actual value.

It seems like the typical long-format DF with a `step_size` of 1 assumes that, at time T, the value of the future variable is static for any point in time prior to that.

Marco
04/15/2025, 1:09 PM
... the `futr_df` dataframe.

Tyler Nisonoff
04/15/2025, 1:23 PM

Jan
04/15/2025, 5:18 PM
I tried both `step_size=1` and `step_size=input_size+horizon` (with stale features), and it looked like they produced the same result.

And more generally, do you have any suggestions for how to handle this set-up? Should we consider different model classes? It seems like this should be a pretty common set-up.