# neural-forecast
Hello! I'm just getting started with the library and having a lot of fun with it! My goal is to set up a model that predicts 24 hours' worth of prices every day. I was originally going to do this by training up to some date with `horizon=24` and then, for each of the next N days, calling `nf.predict(futr_df=<day-to-pred>)`.
However, this always seems to return a dataframe whose `ds` column holds just the 24 hours immediately after where I stopped training. Is there a way to apply the model to the next N days without retraining it every time? Or would I have to retrain/fine-tune on the data since then? Perhaps the latter is the only way to support historical features?
Hey @Tyler Nisonoff, thanks for using our library. It seems to me that you are looking for the `cross_validation` method:
• https://nixtla.github.io/neuralforecast/core.html#neuralforecast.cross_validation
• https://nixtla.github.io/statsforecast/examples/crossvalidation.html
Thanks for the response @Kin Gtz. Olivares! I'm not sure that's what I'm looking for, though. I want to use the model over many days in the future, given that I've trained it at some point in the past. Imagine that today I train a model on historical data, and then over the next month I want to make daily predictions. Is `cross_validation` still the recommended tool for that?
Hi @Tyler Nisonoff. Can you give us more details about the use case? Here are three different cases:
• Today you want to make the forecast for the next 30 days. Without waiting for newer data, you would need to produce forecasts recursively, using one day's forecast as input for the next (not recommended, and not supported).
• Make one prediction each day during the next month, using the latest data as each day finishes. You can save the model and load it every new day, as Kin mentioned.
• Simulate historical forecasts for the last 30 days (you already have the real data). In this case you can use `cross_validation`.
Does this help? Which use case are you considering?
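Use case (3) is what `cross_validation` targets: forecasts are produced from several cutoffs inside data you already have. As a rough illustration only, the hypothetical helper below (not neuralforecast code) computes the cutoffs a 30-window, one-day-step backtest of a 24-hour horizon would use, assuming hourly data, windows laid out backwards from the last observation, and an arbitrary year of 2023. The method itself exposes parameters such as `n_windows` and `step_size`; the linked docs describe its exact behavior.

```python
from datetime import datetime, timedelta


def backtest_cutoffs(last_ds, h, n_windows, step_hours):
    """Hypothetical helper: cutoff timestamps for a rolling-origin backtest.

    Each window forecasts the h hours after its cutoff. Windows are laid out
    backwards from the end of the data, so the last window ends at last_ds.
    """
    step = timedelta(hours=step_hours)
    # The last window's forecast ends at last_ds, so its cutoff is h hours earlier.
    last_cutoff = last_ds - timedelta(hours=h)
    # Earlier cutoffs are spaced step_hours apart, returned oldest first.
    return [last_cutoff - step * k for k in reversed(range(n_windows))]


# Assumed example: hourly data observed through 2023-05-31 23:00, 30 daily windows.
cutoffs = backtest_cutoffs(datetime(2023, 5, 31, 23), h=24, n_windows=30, step_hours=24)
print(cutoffs[0], cutoffs[-1])
```

Each cutoff here is the last "known" hour, and the window after it covers the next full day, so the 30 windows forecast May 2 through May 31 using data you already have.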
Thanks Kin / Chistian! The use case I'm going for is (2). My problem isn't the save/load, but rather this: if I train up to, say, May 10th, predicting May 11th works fine. But if I call
`nf.predict(futr_df=to_pred)`
where `to_pred` is a dataframe whose `ds` column is May 12th and which holds May 12th's exogenous variables, the returned dataframe has a `ds` column for May 11th. The predictions seem reasonable, so I'm hacking around this by changing the returned datetimes, but I took it as a sign that I might be doing something wrong. I can try to come up with a simple repro tomorrow if that would help.
Thanks for the additional details. The problem is that the `predict` function does not "update" the stored dataset, so by default it can only predict the values immediately after the training set. However, you can pass the new data each time through the `df` parameter of `predict`, and the dates will match!
You need to pass the latest data plus all the historical information you want to use. The model will use only the information in the new `df` to predict the values that follow the end of `df` (and will use `futr_df` for the future exogenous variables).
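For use case (2), this means the input to `predict` grows by one day each day, and the forecast always starts right after the end of whatever `df` you pass. The sketch below only tracks timestamps; it assumes a single hourly series with a hypothetical id `series_1`, placeholder `y` values, an arbitrary year of 2023, and rows kept as plain dicts in the long `unique_id`/`ds`/`y` layout. The actual `predict` call appears only as a comment and is not executed.

```python
from datetime import datetime, timedelta

H = 24  # forecast horizon in hours, as in the thread


def hourly_rows(start, n_hours, uid="series_1"):
    """Hypothetical rows in long format (unique_id, ds, y).

    y is a placeholder; in practice it would be the observed price.
    """
    return [{"unique_id": uid, "ds": start + timedelta(hours=i), "y": 0.0}
            for i in range(n_hours)]


def daily_schedule(history, n_days):
    """Yield (last ds in df, first forecast hour) for each day of a daily loop."""
    history = list(history)
    for _ in range(n_days):
        last_ds = history[-1]["ds"]
        first_forecast = last_ds + timedelta(hours=1)
        # Here one would pass the up-to-date history, e.g.:
        #   nf.predict(df=pd.DataFrame(history), futr_df=<that day's exogenous rows>)
        yield last_ds, first_forecast
        # Simulate the day finishing: its observed values are appended to history.
        history.extend(hourly_rows(first_forecast, H))


# Assumed example: training data ends at 2023-05-10 23:00.
train = hourly_rows(datetime(2023, 5, 1), 10 * H)
schedule = list(daily_schedule(train, n_days=3))
for last_ds, first_forecast in schedule:
    print(last_ds, "->", first_forecast)
```

Because the first forecast hour is derived from the end of the `df` being passed, rather than from the training cutoff, the returned dates move forward one day per iteration without retraining.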
Ahh, okay, thank you! I'll test that out tomorrow. Maybe I'm using future exogenous variables incorrectly, then. When predicting May 12, I was putting the forecast variables for May 12 (say, a generation forecast) in a dataframe with `ds == May 11` and passing that as `futr_df`. But I think what you're saying is that they should be stamped `ds == May 12` and passed as `futr_df`, with the rest of the historical data passed as `df`.
Yes, all the information must match its actual dates. So if you have a forecast for May 12 at 6pm, use that timestamp. Also note that `futr_df` must have exactly the same exogenous variables used in training, can cover only one forecast horizon (24 hours, for example), and must start at the timestamp immediately after the last one in `df`.
But don't forget to pass the data in `df` as well! If you don't, it will use the stored information from the training data.
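The rules above (each value stamped at the hour it refers to, the same exogenous columns as in training, exactly one horizon, starting right after `df`) can be checked before calling `predict`. This is a minimal sketch, assuming a single hourly series with a hypothetical `series_1` id, a hypothetical `gen_forecast` column, an arbitrary year of 2023, and rows held as plain dicts (they could be wrapped in a pandas DataFrame). Neither helper is part of neuralforecast.

```python
from datetime import datetime, timedelta


def stamp_exog(target_day, hourly_values, col="gen_forecast", uid="series_1"):
    """Hypothetical: attach a day-ahead forecast to the hours it is *for*.

    A generation forecast for May 12 gets ds values on May 12,
    not on the day it was published.
    """
    return [{"unique_id": uid, "ds": target_day + timedelta(hours=i), col: v}
            for i, v in enumerate(hourly_values)]


def check_futr_rows(df_rows, futr_rows, exog_cols, h=24):
    """Hypothetical pre-flight check mirroring the rules in this thread."""
    if len(futr_rows) != h:
        raise ValueError(f"futr_df should have {h} rows, got {len(futr_rows)}")
    for row in futr_rows:
        # Same exogenous variables as in training, no more and no fewer.
        cols = set(row) - {"unique_id", "ds"}
        if cols != set(exog_cols):
            raise ValueError(f"expected columns {sorted(exog_cols)}, got {sorted(cols)}")
    # Timestamps must be consecutive hours starting right after df ends.
    expected = df_rows[-1]["ds"] + timedelta(hours=1)
    for row in futr_rows:
        if row["ds"] != expected:
            raise ValueError(f"expected ds {expected}, got {row['ds']}")
        expected += timedelta(hours=1)


# Tail of df only; in practice df also carries the historical gen_forecast values.
df_rows = [{"unique_id": "series_1", "ds": datetime(2023, 5, 11, 23), "y": 0.0}]
futr = stamp_exog(datetime(2023, 5, 12), [0.0] * 24)
check_futr_rows(df_rows, futr, exog_cols=["gen_forecast"])  # passes silently

wrong = stamp_exog(datetime(2023, 5, 11), [0.0] * 24)  # stamped on the publish day
message = None
try:
    check_futr_rows(df_rows, wrong, exog_cols=["gen_forecast"])
except ValueError as err:
    message = str(err)
print(message)
```

The `wrong` case reproduces the mistake described in the question: the May 12 generation forecast stamped with May 11 timestamps fails the check, because `futr_df` must start at the hour right after `df` ends.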
Great, that all makes sense, thank you!
So if I have a horizon of 24 (hourly data), an input size of 5*24, and only future exogenous variables (to keep it simple), what should the shapes of `df` and `futr_df` be (in terms of rows)? It seems like `futr_df` would have 24 rows, and if I'm predicting May 12, their `ds` values would cover all the May 12 hours. If I've trained up to May 10th, I'm a bit confused about what `df` should look like.
Think of the `predict` function as the forward step of the model. It receives `df` + `futr_df` as inputs.
• `df` should end at 11pm on May 11 (or, in general, the last timestamp before the one you want to predict) and have at least 5*24 rows if you have only one time series. If you pass more data, the function will use only the last 5*24 timestamps.
• `futr_df` should have the future exogenous values for the date you are forecasting, May 12 in this case, with 24 rows.
This applies to any date you are forecasting: once you pass a new `df`, it will not use the training data ending on May 10th.
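The row counts follow from simple hourly arithmetic. This hypothetical helper (not a library function) assumes one series, naive timestamps with no daylight-saving gaps, `input_size=5*24`, `h=24`, and an arbitrary year of 2023.

```python
from datetime import datetime, timedelta


def required_windows(forecast_day, input_size=5 * 24, h=24):
    """Hypothetical helper: minimal df range and futr_df range for one forecast.

    df must end on the hour right before forecast_day and cover at least
    input_size hours; futr_df covers the h hours being forecast.
    """
    one_hour = timedelta(hours=1)
    df_end = forecast_day - one_hour
    df_start = df_end - (input_size - 1) * one_hour
    futr_start = forecast_day
    futr_end = forecast_day + (h - 1) * one_hour
    return (df_start, df_end), (futr_start, futr_end)


(df_start, df_end), (futr_start, futr_end) = required_windows(datetime(2023, 5, 12))
# Inclusive hourly ranges: number of rows = hours between endpoints + 1.
df_rows = int((df_end - df_start) / timedelta(hours=1)) + 1
futr_rows = int((futr_end - futr_start) / timedelta(hours=1)) + 1
print(df_start, "to", df_end, f"({df_rows} rows)")
print(futr_start, "to", futr_end, f"({futr_rows} rows)")
```

So for May 12, the minimal `df` runs from May 7 00:00 to May 11 23:00 (120 rows) and `futr_df` from May 12 00:00 to 23:00 (24 rows). With several series, each `unique_id` needs its own history and its own 24 future rows; passing a longer history is fine, since only the last `input_size` timestamps are used.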
Awesome, thanks again for the help!