# neural-forecast
j
I have a question about how I should be thinking about `step_size` when using the LSTM. Say I need to predict the next 24 hours every hour, I want to use the last 48 hours to do so, and I have future exogenous features that change every hour (for example weather forecasts) and turn into actuals once the time passes. My data frame right now consists of non-overlapping windows 72 steps long, where the first 48 steps are mostly duplicates, as the actual values of the exogenous features change only one step at a time. So I'm basically using `input_size=48`, `horizon=24` and `step_size=72` when training an LSTM. However, I'm not sure I'm doing this right: the model trains very poorly even though there's a lot of data (for example, the forecasted values rarely start from the last known values), and the predictions on a future hold-out set are very poor. Am I doing the windowing correctly? Or should I be feeding only 25-hour windows to the model (so `input_size=1`, `horizon=24` and `step_size=25`) where the first row holds the latest actuals, and have the LSTM do the tracking of the past? And is this different for other architectures such as NHITS?
m
Hello! In my opinion, `step_size` should almost always be set to 1. What `step_size` does is control the distance between consecutive temporal windows during training. When set to 1, you get the maximum number of training windows. When you increase it, the number of windows decreases, which probably explains why the model performs poorly. In your case, I would set `input_size=48`, `horizon=24`, `step_size=1`. Also, just to be sure, your dataframe should be in the long format (see here). And I also think you can use a better model than LSTM, something like NHITS actually. Definitely worth trying.
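Roughly, that configuration would look like this (a minimal sketch; the CSV file and its columns are placeholders, and exact argument availability can differ between neuralforecast versions):

```python
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM, NHITS

# Long format: one row per (unique_id, ds) with the target in `y`.
Y_df = pd.read_csv('data.csv', parse_dates=['ds'])  # placeholder file

models = [
    LSTM(h=24, input_size=48, step_size=1),   # horizon 24, last 48 hours in
    NHITS(h=24, input_size=48, step_size=1),  # step_size=1 keeps all windows
]
nf = NeuralForecast(models=models, freq='H')
nf.fit(df=Y_df)
```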
j
Thanks for your reply Marco - I'm not sure I understand how to incorporate the changing forecasts with `step_size=1` though. If I understand it correctly, the windowing code steps through the data one step at a time, but the values of the future exogenous features for the next 24 steps change every step. To account for this, I thought it could work to copy the full window each time (so if I have X time steps in my data, my training DF has `X * step_size` rows, and thus the same number of windows as my original DF with non-updating exogenous features and `step_size=1`) and set `step_size = input_size + horizon`, which is very inefficient but at least makes sure the model sees the latest data (if I'm doing it correctly). Could you clarify this? Or is there maybe an example in the documentation that shows how to work with future exogenous variables that change over time? As for your comment on the long format - that's only with respect to multiple outcomes, right? The examples I've seen in the documentation (example) are wide with respect to the features (i.e., the DFs have a separate column for each exogenous feature).
m
Suppose your data is `[1,2,3,4,5,6,7,8,9,10]` and your model has a horizon of 1 and an input size of 3. With `step_size=1`, we internally create training windows as (`[input sequence] - [target sequence]`):

```
[1,2,3] - [4]
[2,3,4] - [5]
[3,4,5] - [6]
[4,5,6] - [7]
[5,6,7] - [8]
[6,7,8] - [9]
[7,8,9] - [10]
```

Now, if `step_size=3`, then it becomes:

```
[1,2,3] - [4]
[4,5,6] - [7]
[7,8,9] - [10]
```

You see how increasing `step_size` reduces the number of windows for training.
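To make this concrete, here is a tiny standalone sketch (plain Python, not the library's internal code) of how `step_size` strides the window generator:

```python
def make_windows(series, input_size, horizon, step_size):
    """Slide over `series`, emitting (input, target) training windows."""
    windows = []
    start = 0
    while start + input_size + horizon <= len(series):
        inp = series[start:start + input_size]
        tgt = series[start + input_size:start + input_size + horizon]
        windows.append((inp, tgt))
        start += step_size  # distance between consecutive windows
    return windows

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(make_windows(data, 3, 1, 1))  # 7 windows, as listed above
print(make_windows(data, 3, 1, 3))  # only 3 windows
```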
As for the exogenous features, simply include a column for each one and specify whether they are `hist_exog` or `futr_exog`. If `futr_exog`, then they must be provided in the `futr_df` argument when predicting. You can follow our tutorial on forecasting with exogenous features.
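A hedged sketch of that workflow, using the `temp` column from this thread as the future exogenous feature (the synthetic data is a placeholder):

```python
import numpy as np
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# Tiny synthetic long-format frame: unique_id, ds, y, plus the exog column.
ds = pd.date_range('2024-01-01', periods=500, freq='H')
train_df = pd.DataFrame({'unique_id': 'series_1', 'ds': ds,
                         'y': np.random.rand(500),
                         'temp': np.random.rand(500)})

model = NHITS(h=24, input_size=48, step_size=1,
              futr_exog_list=['temp'])  # columns known into the future
nf = NeuralForecast(models=[model], freq='H')
nf.fit(df=train_df)

# futr_df supplies the known future values of `temp` for the next h steps.
futr_ds = pd.date_range(ds[-1] + pd.Timedelta('1H'), periods=24, freq='H')
futr_df = pd.DataFrame({'unique_id': 'series_1', 'ds': futr_ds,
                        'temp': np.random.rand(24)})
preds = nf.predict(futr_df=futr_df)
```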
j
Thanks again for your reply! To address your example with the sequence, I basically transform the sequence in your example to `[1,2,3,4,2,3,4,5,3,4,5,6, ...]`. This keeps the number of windows the same as in the original sequence, but obviously explodes the data by multiplying it by the `step_size`. I'm just not sure how else to incorporate the changing future exogenous variables. I can see how it works for testing - as you suggest, you just provide the latest values in `futr_df` - but how do I do it for training?
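In code, the transformation I mean looks roughly like this (plain Python, just to illustrate):

```python
def explode(series, window_len):
    """Duplicate each sliding window of length `window_len` back-to-back."""
    out = []
    for start in range(len(series) - window_len + 1):
        out.extend(series[start:start + window_len])
    return out

# input_size=3, horizon=1 -> window_len = input_size + horizon = 4
print(explode([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4))
# [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6, ...]
```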
Here is a slightly more elaborate example to illustrate my problem and proposed solution. Say the Y to predict is:

```
time   Y
0:00   1
1:00   2
2:00   3
3:00   4
4:00   5
...
```
Additionally, I have an exogenous feature like forecasted temperature, which at every hour forecasts the next two hours. Here is some example data:

```
forecasted_at  time    temp
0:00           0:00    50
0:00           1:00    59
1:00           1:00    62
1:00           2:00    61
2:00           2:00    60
2:00           3:00    65
3:00           3:00    66
3:00           4:00    71
4:00           4:00    75
4:00           5:00    81
...
```
(time represents the start of the interval.) Let's say I take `input_size=2` and `horizon=2`. To make a forecast at 2:00 for 2:00 and 3:00, I want to use the historical prices for 0:00 and 1:00 and the temperature forecasts made at 2:00 for 2:00 and 3:00. The way I construct the data (assuming I'm taking the latest forecast as the actual), the window for this step (with `step_size=4`) looks like:

```
forecasted_at   time    Y    temp
2:00            0:00    1    50
2:00            1:00    2    62
2:00            2:00    3*   60
2:00            3:00    4*   65
```
(`*` is available during training but not at inference, and is to be forecasted.) Does this make sense? How else would I do this with `step_size=1`? If my horizon is very short, I can imagine making many exogenous features like "forecasted temperature at x time ahead" and setting `step_size=1`, but that becomes unwieldy when the horizon gets larger.
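In case it helps, here is roughly how I build that training frame from the vintage table above (a sketch; `y_df`, `fc_df` and the helper are my own names, with hours as plain integers for brevity):

```python
import pandas as pd

INPUT_SIZE, HORIZON = 2, 2  # as in the example above

# The two example tables above, as dataframes.
y_df = pd.DataFrame({'time': [0, 1, 2, 3, 4], 'Y': [1, 2, 3, 4, 5]})
fc_df = pd.DataFrame({
    'forecasted_at': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    'time':          [0, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    'temp':          [50, 59, 62, 61, 60, 65, 66, 71, 75, 81],
})

def window_for_vintage(t):
    """One INPUT_SIZE+HORIZON window holding what is known at vintage t."""
    pos = y_df.index[y_df['time'] == t][0]  # assumes a 0..N RangeIndex
    win = y_df.iloc[pos - INPUT_SIZE: pos + HORIZON].copy()
    # Latest available temp per timestamp: actuals for the past,
    # the vintage-t forecast for the future.
    known = fc_df[fc_df['forecasted_at'] <= t]
    latest = known.sort_values('forecasted_at').groupby('time')['temp'].last()
    win['temp'] = win['time'].map(latest)
    return win

print(window_for_vintage(2))  # matches the 2:00 window shown above

# Vintages with a full history behind them and a full horizon ahead.
vintages = y_df['time'].iloc[INPUT_SIZE: len(y_df) - HORIZON + 1]
train_df = pd.concat([window_for_vintage(t) for t in vintages],
                     ignore_index=True)
# Train on train_df with step_size = INPUT_SIZE + HORIZON so the model's
# sliding windows line up exactly with these duplicated blocks.
```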
t
@Marco If I were to frame it another way: how should one handle exogenous features (like weather forecasts) that change their values not just as time progresses, but also as the forecast vintage changes? For example:

At time T-48, you have a forecast X0 for times T1...TN
At time T-47, you have an updated forecast X1 for times T1...TN
...
At time T-24, you have an updated forecast for times T1...TN
...
At time T, you have the actual value

It seems like the typical long-form style DF with `step_size` of 1 assumes that at time T the value of the future variable is static for any point in time prior to that.
m
This is not something that we handle during training. For training, exogenous values cannot be updated. When forecasting, what you're describing is possible by simply updating the `futr_df` dataframe.
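For example (a sketch; `fc_df` mirrors the vintage table above, and `history_up_to` is a hypothetical helper returning the actuals observed up to `t`):

```python
# Refresh futr_df with the newest forecast vintage before each predict call.
for t in run_times:  # e.g. every hour in production
    vintage = fc_df[fc_df['forecasted_at'] == t]
    futr_df = (vintage.rename(columns={'time': 'ds'})[['ds', 'temp']]
                      .assign(unique_id='series_1'))
    preds = nf.predict(df=history_up_to(t),  # hypothetical helper
                       futr_df=futr_df)
```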
t
Got it. We see pretty poor performance without the training update, which does resolve in some architectures such as NHITS when using the large step size with the expanded data frame to show the model the updating features at train time. But as Jan pointed out here, other architectures seem to struggle with this workaround 😕
j
Can you clarify the statement that exogenous values cannot be updated? If we go with the set-up that I laid out, doesn't the model get updated exogenous values for each window? Or do the exogenous features get processed differently, or wiped somehow? I tried an experiment comparing a model fitted with `step_size=1` to one fitted with `step_size=input_size+horizon` (with stale features), and they looked like they produced the same result. And more generally, do you have any suggestions for how to handle this set-up? Should we consider different model classes? It seems like this should be a pretty common set-up.