This message was deleted Nixtla Community #general

Join Slack

This message was deleted.

# general

Slackbot

10/12/2023, 1:11 PM

This message was deleted.

👀 1

Jason Gofford

10/12/2023, 1:16 PM

Having these as common exogenous variables here would be an interesting addition

Brian Head

10/12/2023, 1:53 PM

@Jason Gofford I created a feature enhancement on the Nixtla SF Github page back in September. https://github.com/Nixtla/statsforecast/issues/633

Tyler Blume

10/12/2023, 3:11 PM

I completely agree!

👍 1

José Morales

10/12/2023, 6:06 PM

Hey. Thanks for the detailed post. Would having something that you could give your df to and have it return a tuple like train_df, future_X help? I believe that could work for all libs

Brian Head

10/12/2023, 6:07 PM

@José Morales. Yes, I think that'd be perfect.

José Morales

10/12/2023, 6:08 PM

I'll start working on something and report back in the issue or here

❤️ 1

José Morales

10/12/2023, 6:44 PM

Can you share the statsmodels code you used to extract the trend and season?

Brian Head

10/12/2023, 6:48 PM

# Create an empty list to store DataFrames for each unique ID results = [] # Group the DataFrame by unique_id grouped = HCPCS_Grouped_ts3.groupby('unique_id') # Perform STL decomposition for each group and store the results for unique_id, group_data in grouped: result = STL(group_data['y'], seasonal=13, robust=True).fit() # Create a DataFrame with the components and dates result_df = pd.DataFrame({ 'ds': group_data.index, # Use the index as dates 'unique_id': unique_id, 'seasonal': result.seasonal, 'trend': result.trend, }) results.append(result_df) # Combine the DataFrames into a final DataFrame final_df = pd.concat(results, ignore_index=True)

🙌 1

Brian Head

10/12/2023, 7:10 PM

Please feel free to poke holes in this. I think what I've done is sort of between the common_xregs from fable and a decomposition model. I'm not an expert in time series forecasting, so would appreciate any expert pointing any possible flaws.

Jason Gofford

10/12/2023, 7:36 PM

The only potential I can see with this from a tree-based method perspective (lightgbm, xgboost) is that physical detrending of the target is often used to overcome the problem of extrapolation. It's not clear to me if having the seasonal and trend as multivariate exogenous features would resolve this, or how you'd reliably extrapolate the trend in particular to

X_df

Tyler Blume

10/12/2023, 7:40 PM

Mlforecast perspective I think (don’t want to talk for them) is to use differencing to overcome extrapolation issue, it also tends to work best in my experience over flatly de trending. Trend Features won’t allow it to extrapolate beyond it’s bounds but do help with stochastic seasonality and stuff like that.

👍 1

José Morales

10/12/2023, 8:13 PM

Hey @Brian Head, looking at your example I see how you generate the training features, but how do you create the future values of the trend and seasonality?

Brian Head

10/12/2023, 8:45 PM

So, perhaps that's a problem in my approach. I was trying to replicate the something along the lines of common_xregs or a decomposition model like in fable (see beow), neither of which require providing future frame data. When I applied the data from the STL decomposition and feed it to SF/MF in cross_validation I was thinking it didn't provide those in the "test" windows. But, I'm now thinking it must have. I noticed just a few minutes before you posted your question that my predictions looked odd when doing the actual .predict. Perhaps I put the cart before the horse. Would still like to see something along the lines of these two approaches in SF. Welcome any feedback. https://fable.tidyverts.org/reference/common_xregs.html https://otexts.com/fpp3/stl.html

Brian Head

10/12/2023, 9:44 PM

Hey @José Morales, it looks like MSTL does what I'm trying to do. It decomposes the series. And, you don't have to add anything for the future frame. Could something be borrowed from it? https://nixtla.github.io/statsforecast/docs/tutorials/multipleseasonalities.html

José Morales

10/12/2023, 10:12 PM

Yeah I think we can implement something with that

🙌 1

Brian Head

10/16/2023, 2:36 PM

@José Morales Thought I'd follow-up to say that I used what I had to feed the season and trend variables I created to NF models as historical exogenous variables. Doing so improved the models quite a bit. Relating this to a question asked int he #C031EJJMH46 channel about historical exogenous variables for MF. I'm not sure those algos make that possible (from what I've seen I'm doubtful), but if so it'd be a nice addition.

Tyler Blume

10/16/2023, 2:41 PM

Sorry you just mean passing like a counting variable as trend to mlforecast?

Brian Head

10/16/2023, 2:46 PM

I mean the trend and season components from STL decomposition.

Tyler Blume

10/16/2023, 2:58 PM

Gotcha yeah they can help for trees as it give more things for the tree to split on. But might not be worth it generally. By specifying historical exogenous do you mean you wouldn’t pass any future values?

Brian Head

10/16/2023, 3:42 PM

Correct

José Morales

10/16/2023, 4:47 PM

@Brian Head do those improvements reflect on a holdout set? Since the MSTL uses the complete series to estimate the trend and seasonality I think it could have a bit of leakage of you use those features to then perform CV

José Morales

10/16/2023, 5:04 PM

Here's what I have so far. You can use it to generate your training and future dataframes and test if the improvements reflect on data that the model hasn't seen:

Copy code

import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import MSTL, _predict_mstl_seas
from statsforecast.utils import generate_series

# fit+predict (right now I predict to get the future df structure, but you could use the MSTL prediction as well)
series = generate_series(10)
season_length = 7
horizon = 7
sf = StatsForecast(
    models=[MSTL(season_length=season_length)],
    freq='D',
)
preds = sf.fit_predict(df=series, h=horizon).reset_index().drop(columns='MSTL')

# extract features
train_features = []
future_features = []
for model in sf.fitted_[:, 0]:
    train_features.append(model.model_[['trend', 'seasonal']])
    future_df = pd.DataFrame({
        'trend': model.trend_forecaster.predict(horizon)['mean'],
        'seasonal': _predict_mstl_seas(model.model_, horizon, season_length)
    })
    future_features.append(future_df)
train_df = pd.concat([series, pd.concat(train_features).reset_index(drop=True)], axis=1)
X_df = pd.concat([preds, pd.concat(future_features).reset_index(drop=True)], axis=1).reset_index(drop=True)

👍 1

Brian Head

10/16/2023, 6:17 PM

Thanks, @José Morales. I'll give it a try and will report back.

Brian Head

10/16/2023, 8:17 PM

@José Morales I tested this on a smaller sub-sample (n=10). In 9/10 cases it reduced the SMAPE. In several cases it dropped it by what I would consider a substantial amount. I basically withheld three months of data from a ~60 month set, and fit and predicted with and without the trend and seasonal exogenous variables and then calculated the SMAPE from the unseen actuals and predicted values. I believe that is what you were suggesting. Let me know if you see any issues. Thanks so much for working on this. I think it'll add a lot of value.

Jason Gofford

10/28/2023, 5:31 AM

This is actually pretty effective now that I've tried it. I'm wondering though, given that the S and T components are in units of y, is it more appropriate to extract them before or after target transforms?

José Morales

11/15/2023, 9:39 PM

Hey @Brian Head. We're adding this to the new feature_engineering module and we're also adding this guide. Would you mind taking a look when you have time and let us know what you think of the example and the mstl_decomposition function?

❤️ 1

Brian Head

11/15/2023, 9:54 PM

@José Morales Looks great. Only one minor thing, in the results should MASE be displayed as a percentage? Since it isn't a percentage error I'm thinking it shouldn't. Not central to the code or example, but might cause confusion for some. Other than that, looks good and like that you show in there the improvement in mase/smape compared to without the feature engineering. Love that y'all are putting this forward.

José Morales

11/15/2023, 10:17 PM

Fixed, thanks! Yeah although the MASE is 1, so it's as good as the seasonal naive. If you specify a seasonal_order in the ARIMA model the one without exog is better haha but I hope people focus on the usage and evaluate if it works for them. Thanks for suggesting this! I think it'll be very useful for mlforecast for example

🙌 1

Valeriy

12/27/2023, 6:50 PM

@José Morales is this guide still available somewhere, currently returns page missing. https://github.com/Nixtla/statsforecast/blob/feat-eng/nbs/docs/how-to-guides/generating_features.ipynb

José Morales

12/27/2023, 7:30 PM

Yes, here: https://nixtlaverse.nixtla.io/statsforecast/docs/how-to-guides/generating_features.html

7 Views

Open in Slack

Previous Next