# mlforecast
c
Problem set up

Say we are predicting sales per store per month.
• We want to predict sales 1 and 2 months into the future.
• We’re using a one-model-per-step approach.

We’ll train two models: one will predict sales for February, the other will predict sales for March. With just “regular” static features (and dynamic features that are known for future `ds`), we could do…
```python
fcst = MLForecast(models=models, freq=1, lags=[1, 2])

fcst.fit(X, id_col='unique_id', time_col='ds', target_col='y', static_features=static_features, max_horizon=2, dropna=False)
```
Adding a lagged exogenous feature

Now let’s say we want to add a feature that counts inbound `inquiries` received by sales in a month. We won’t know this variable for future months and couldn’t calculate it without its own forecast, unlike (I think) the examples of price catalog or Fourier. But we will know it for all historical months (and imagine we’re predicting sales of a product that can take several months to close). So the lagged `inquiries` in prior months could be informative of future sales. Let’s add it.

Prediction

I think I know how to add `inquiries` to `X_df`: use `transform_exog()` to generate `inquiries_lag1` and `inquiries_lag2`. Model1 and model2 will each get “sent” the right rows of `X_df` to make their predictions.

Training?

I’m unsure how to set up training without leakage using built-ins. Take the training row that has the features and target for one store in December 2023. I believe this row will be used twice in training:
• model1 will be trained to fit December’s target from one month prior (~November anchor date). It can “see” the real values of `inquiries_lag1` and `inquiries_lag2` (data from October and November).
• model2 will be trained to fit December’s target from two months prior (~October anchor date). Now it shouldn’t be able to see the real value of `inquiries_lag1` on the December row in training data. That data refers to inquiries in November, which is after the October anchor date.

How do I set up the model training so that when model2 is fitting the December data, it can’t “see” the December row’s value of `inquiries_lag1`? Or am I misunderstanding a piece? Thanks again!
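For concreteness, the lag columns described above can be sketched without any library. This is a toy illustration with made-up numbers, not mlforecast's implementation: `add_lags` is a hypothetical helper, and it assumes the months are consecutive with no gaps.

```python
# Hypothetical toy data: monthly inquiry counts for a single store.
inquiries = {
    "2023-09": 12,
    "2023-10": 15,
    "2023-11": 9,
    "2023-12": 20,
}


def add_lags(values_by_month, lags):
    """Return one row per month with `inquiries_lag{k}` taken k months earlier."""
    months = sorted(values_by_month)  # "YYYY-MM" strings sort chronologically
    rows = []
    for i, month in enumerate(months):
        row = {"ds": month}
        for k in lags:
            # Before the start of the history there is no value, so use None.
            row[f"inquiries_lag{k}"] = (
                values_by_month[months[i - k]] if i >= k else None
            )
        rows.append(row)
    return rows


lagged = add_lags(inquiries, lags=[1, 2])
december = lagged[-1]
print(december)
```

On the December row, `inquiries_lag1` holds November's count and `inquiries_lag2` holds October's, which is the situation the question is about.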
j
Hey. Thanks for using mlforecast. When we build the target we take each row and add the future targets, not the past ones. So I believe the statement is:
• model1 uses December's row to predict December
• model2 uses December's row to predict January of the next year

You can see what the training set looks like by running preprocess:
```python
from mlforecast import MLForecast
from mlforecast.feature_engineering import transform_exog
from mlforecast.utils import generate_series, generate_prices_for_series

series = generate_series(2, freq='M', equal_ends=True)
prices = generate_prices_for_series(series)
prices_lags = transform_exog(prices, lags=[1, 2])
series_wp = series.merge(prices_lags, on=['unique_id', 'ds'])
fcst = MLForecast(models=[], freq='M')
series_wp.head()
fcst.preprocess(series_wp, max_horizon=2).head()
```
In this example the second model uses April's row to predict May. Please let us know if this helps.
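The alignment described above can also be sketched in plain Python. This is a toy illustration with made-up numbers, not mlforecast's code, and the `y0`/`y1` column names are just labels chosen for this sketch (one target column per step). Each row keeps its own lagged feature and gains the target for its own month plus the target for the following month.

```python
# Hypothetical monthly history for one store (made-up numbers).
months = ["2023-09", "2023-10", "2023-11", "2023-12", "2024-01"]
sales = [100, 110, 105, 130, 90]
inquiries = [12, 15, 9, 20, 7]
max_horizon = 2

rows = []
for t in range(1, len(months)):  # start at 1 so that lag 1 exists
    row = {
        "ds": months[t],
        # The feature only uses information from the previous month.
        "inquiries_lag1": inquiries[t - 1],
    }
    for h in range(max_horizon):
        future = t + h
        # y0 is this row's own target, y1 is the next month's target.
        # Targets past the end of the history are unknown (None).
        row[f"y{h}"] = sales[future] if future < len(sales) else None
    rows.append(row)

december = next(r for r in rows if r["ds"] == "2023-12")
print(december)
```

Here the December row carries November's inquiries together with December's sales (`y0`, first model) and January's sales (`y1`, second model), so neither model is paired with a feature dated after the month it predicts from.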
c
That's awesome! Thanks a bunch @José Morales, this is really helpful! And thanks for correcting my description of how dates map to targets, that was key.

I ran this and it makes sense (with a slight edit for my example to get prices on monthly vs daily frequency). I can see I don't need to mask any of the lagged features during training of a one-model-per-step forecast, because each model's target col gets shifted to line up the model's target with the row of features it should be able to see. And that of course masks all "future" lag features.

That's fantastic. Thank you again for talking me through it! And a great reminder I should use `preprocess()` more.
j
Glad to be of help. Let us know if you have any more questions or run into any issues