# general
b
Comment, not question. Recently I mentioned in a thread/post that I was getting substantially better results with R's fable (including the prophet model) than I could with SF and MF. With SF/MF/NF I followed the deseasonalizing approach, whereas with fable I used the built-in "common exogenous" variables trend and season. I decided to test using trend/season in the Nixtla approach, so I extracted those using the statsmodels package. The results of that test suggest it wasn't the prophet model driving the difference but that, at least with my current data, using trend/season worked better than deseasonalizing. For example, with approximately 1,200 series I compared the MASE for the best model from R fable and from SF/MF/NF: I went from fable having the lower MASE ~65% of the time to SF/MF/NF having it ~70% of the time. Similarly, I went from SF/MF/NF having a MASE below 0.5 (twice as good as a naive model) on ~45% of series to 82%. Interestingly, even though I included AutoARIMA with fable, prophet "won" with the lowest MASE the vast majority of the time when working in R. BUT, with the Nixtla packages the AutoARIMA model won a solid majority of the time, beating out all of the available SF models, most of the regressors available in MF, and four neural models, which supports other work from Nixtla showing that traditional and MF models frequently best deep learning. Really appreciate all of the feedback I've received from this community/the Nixtla team to help me transition from fable to the Nixtlaverse.
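For context on the metric: MASE scales a model's mean absolute error by the in-sample error of a naive forecast, so a MASE of 0.5 means roughly half the naive error. A generic sketch of the computation (not the exact evaluation code I used for these numbers):
Copy code
import numpy as np

def mase(y_true, y_pred, y_train, season_length=1):
    # scale by the in-sample MAE of a (seasonal) naive forecast on the training data;
    # season_length=1 gives the plain naive scale
    scale = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / scale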
đź‘€ 1
j
Having these as common exogenous variables here would be an interesting addition
b
@Jason Gofford I created a feature enhancement request on the Nixtla SF GitHub page back in September. https://github.com/Nixtla/statsforecast/issues/633
t
I completely agree!
đź‘Ť 1
j
Hey. Thanks for the detailed post. Would having something that you could give your df to and have it return a tuple like train_df, future_X help? I believe that could work for all libs
b
@José Morales. Yes, I think that'd be perfect.
j
I'll start working on something and report back in the issue or here
❤️ 1
Can you share the statsmodels code you used to extract the trend and season?
b
Copy code
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Create an empty list to store DataFrames for each unique ID
results = []

# Group the DataFrame by unique_id
grouped = HCPCS_Grouped_ts3.groupby('unique_id')

# Perform STL decomposition for each group and store the results
for unique_id, group_data in grouped:
    result = STL(group_data['y'], seasonal=13, robust=True).fit()
    # Create a DataFrame with the components and dates
    result_df = pd.DataFrame({
        'ds': group_data.index,  # Use the index as dates
        'unique_id': unique_id,
        'seasonal': result.seasonal,
        'trend': result.trend,
    })
    results.append(result_df)

# Combine the DataFrames into a final DataFrame
final_df = pd.concat(results, ignore_index=True)
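These components can then be merged back onto the long-format frame to be used as exogenous columns; a rough sketch (assuming the frame has a date index and a unique_id column, as in the code above):
Copy code
# Hypothetical merge of the STL components back onto the training data
train_df = (
    HCPCS_Grouped_ts3
    .rename_axis('ds')   # the date index becomes a 'ds' column after reset_index
    .reset_index()
    .merge(final_df, on=['unique_id', 'ds'], how='left')
)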
🙌 1
Please feel free to poke holes in this. I think what I've done is sort of between the common_xregs from fable and a decomposition model. I'm not an expert in time series forecasting, so I'd appreciate any expert pointing out any possible flaws.
j
The only potential issue I can see with this from a tree-based method perspective (lightgbm, xgboost) is that physical detrending of the target is often used to overcome the problem of extrapolation. It's not clear to me whether having the seasonal and trend as multivariate exogenous features would resolve this, or how you'd reliably extrapolate the trend in particular into X_df
t
The mlforecast perspective, I think (I don't want to speak for them), is to use differencing to overcome the extrapolation issue; it also tends to work better in my experience than flatly detrending. Trend features won't allow it to extrapolate beyond their bounds, but they do help with stochastic seasonality and things like that.
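For concreteness, a minimal sketch of the differencing approach in mlforecast (the model, lags, and toy data are just placeholders):
Copy code
import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from statsforecast.utils import generate_series

series = generate_series(10, freq='D')  # toy long-format data: unique_id, ds, y

# first-difference the target before fitting so the tree doesn't have to
# extrapolate a trend; the transform is inverted automatically at predict time
fcst = MLForecast(
    models=[lgb.LGBMRegressor()],
    freq='D',
    lags=[1, 7],
    target_transforms=[Differences([1])],
)
fcst.fit(series)
preds = fcst.predict(h=7)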
đź‘Ť 1
j
Hey @Brian Head, looking at your example I see how you generate the training features, but how do you create the future values of the trend and seasonality?
b
So, perhaps that's a problem in my approach. I was trying to replicate something along the lines of common_xregs or a decomposition model like in fable (see below), neither of which requires providing future frame data. When I applied the data from the STL decomposition and fed it to SF/MF in cross_validation, I was thinking it didn't provide those in the "test" windows. But I'm now thinking it must have. I noticed just a few minutes before you posted your question that my predictions looked odd when doing the actual .predict. Perhaps I put the cart before the horse. I would still like to see something along the lines of these two approaches in SF. Welcome any feedback. https://fable.tidyverts.org/reference/common_xregs.html https://otexts.com/fpp3/stl.html
Hey @José Morales, it looks like MSTL does what I'm trying to do. It decomposes the series. And, you don't have to add anything for the future frame. Could something be borrowed from it? https://nixtla.github.io/statsforecast/docs/tutorials/multipleseasonalities.html
j
Yeah I think we can implement something with that
🙌 1
b
@José Morales Thought I'd follow up to say that I used what I had to feed the season and trend variables I created to NF models as historical exogenous variables. Doing so improved the models quite a bit. Relating this to a question asked in the #general channel about historical exogenous variables for MF: I'm not sure those algos make that possible (from what I've seen I'm doubtful), but if so it'd be a nice addition.
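For anyone following along, passing them as historical exogenous variables to a neuralforecast model looks roughly like this (NHITS and the hyperparameters here are just an illustration, not exactly what I ran; the frame is assumed to already contain the trend/seasonal columns):
Copy code
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# 'trend' and 'seasonal' are used only as past covariates here,
# so no future values are required at predict time
models = [
    NHITS(h=12, input_size=24, hist_exog_list=['trend', 'seasonal'], max_steps=100)
]
nf = NeuralForecast(models=models, freq='MS')
nf.fit(df=train_df)   # train_df: unique_id, ds, y, trend, seasonal
preds = nf.predict()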
t
Sorry, do you just mean passing something like a counting variable as trend to mlforecast?
b
I mean the trend and season components from STL decomposition.
t
Gotcha, yeah, they can help for trees since they give the tree more things to split on, but it might not be worth it generally. By specifying historical exogenous, do you mean you wouldn't pass any future values?
b
Correct
j
@Brian Head do those improvements reflect on a holdout set? Since MSTL uses the complete series to estimate the trend and seasonality, I think it could have a bit of leakage if you use those features to then perform CV
Here's what I have so far. You can use it to generate your training and future dataframes and test if the improvements reflect on data that the model hasn't seen:
Copy code
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import MSTL, _predict_mstl_seas
from statsforecast.utils import generate_series

# fit+predict (right now I predict to get the future df structure, but you could use the MSTL prediction as well)
series = generate_series(10)
season_length = 7
horizon = 7
sf = StatsForecast(
    models=[MSTL(season_length=season_length)],
    freq='D',
)
preds = sf.fit_predict(df=series, h=horizon).reset_index().drop(columns='MSTL')

# extract features
train_features = []
future_features = []
for model in sf.fitted_[:, 0]:
    train_features.append(model.model_[['trend', 'seasonal']])
    future_df = pd.DataFrame({
        'trend': model.trend_forecaster.predict(horizon)['mean'],
        'seasonal': _predict_mstl_seas(model.model_, horizon, season_length)
    })
    future_features.append(future_df)
train_df = pd.concat([series, pd.concat(train_features).reset_index(drop=True)], axis=1)
X_df = pd.concat([preds, pd.concat(future_features).reset_index(drop=True)], axis=1).reset_index(drop=True)
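One way to consume those two frames downstream, e.g. with mlforecast (assuming a recent version whose predict accepts X_df; the model and lags are just placeholders):
Copy code
import lightgbm as lgb
from mlforecast import MLForecast

fcst = MLForecast(models=[lgb.LGBMRegressor()], freq='D', lags=[7])
# static_features=[] so trend/seasonal are treated as dynamic exogenous features
fcst.fit(train_df, static_features=[])
preds = fcst.predict(h=horizon, X_df=X_df)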
đź‘Ť 1
b
Thanks, @José Morales. I'll give it a try and will report back.
@José Morales I tested this on a smaller sub-sample (n=10). In 9/10 cases it reduced the SMAPE, and in several cases it dropped it by what I would consider a substantial amount. I basically withheld three months of data from a ~60-month set, fit and predicted with and without the trend and seasonal exogenous variables, and then calculated the SMAPE from the unseen actuals and the predicted values. I believe that is what you were suggesting. Let me know if you see any issues. Thanks so much for working on this. I think it'll add a lot of value.
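For reference, the holdout comparison I described can be sketched roughly like this (column names, the exact cutoff, and `df` standing in for the long-format frame are assumptions based on my description):
Copy code
import numpy as np
import pandas as pd

def smape(y_true, y_pred):
    # symmetric MAPE averaged over the holdout points
    return np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

# df: long-format frame (unique_id, ds, y); hold out the last 3 months
cutoff = df['ds'].max() - pd.DateOffset(months=3)
train, holdout = df[df['ds'] <= cutoff], df[df['ds'] > cutoff]
# fit with and without the trend/seasonal features on `train`, predict the
# holdout horizon, then compare smape(holdout['y'], preds) for each variant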
j
This is actually pretty effective now that I've tried it. I'm wondering though, given that the S and T components are in units of y, is it more appropriate to extract them before or after target transforms?
j
Hey @Brian Head. We're adding this to the new feature_engineering module and we're also adding this guide. Would you mind taking a look when you have time and let us know what you think of the example and the mstl_decomposition function?
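Usage is roughly the following (the exact signature shown here is a paraphrase and may differ slightly from the final guide):
Copy code
from statsforecast.feature_engineering import mstl_decomposition
from statsforecast.models import MSTL
from statsforecast.utils import generate_series

series = generate_series(10, freq='D')
# returns the training frame with trend/seasonal columns plus the future X_df
train_df, X_df = mstl_decomposition(series, model=MSTL(season_length=7), freq='D', h=7)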
❤️ 1
b
@José Morales Looks great. Only one minor thing: in the results, should MASE be displayed as a percentage? Since it isn't a percentage error, I'm thinking it shouldn't be. Not central to the code or example, but it might cause confusion for some. Other than that, it looks good, and I like that you show the improvement in MASE/SMAPE compared to without the feature engineering. Love that y'all are putting this forward.
j
Fixed, thanks! Yeah although the MASE is 1, so it's as good as the seasonal naive. If you specify a seasonal_order in the ARIMA model the one without exog is better haha but I hope people focus on the usage and evaluate if it works for them. Thanks for suggesting this! I think it'll be very useful for mlforecast for example
🙌 1
v
@José Morales is this guide still available somewhere? It currently returns page missing. https://github.com/Nixtla/statsforecast/blob/feat-eng/nbs/docs/how-to-guides/generating_features.ipynb