
# general

Brian Head

09/08/2023, 4:11 PM
Hello,
I have been working in R with Hyndman et al.'s fable family of packages. I'm trying to replicate my work using StatsForecast and MLforecast to see if I can improve performance (both accuracy and speed).
Right now, what I've produced with the Fable family of packages and with the Nixtla packages is getting somewhat similar results in terms of MASE on data trained using CV. However, in some cases Fable does better and in others Nixtla. I believe there are two reasons: 1) I'm able to use Prophet with Fable (I know Nixtla's research suggests ensembles using StatsForecast can produce on-par or better results, but I'm looking at ensembles and Prophet is still getting better results); and 2) in Fable I'm able to use the "common exogenous" regressors (https://fable.tidyverts.org/reference/common_xregs.html) trend and season.
I've searched for answers to these questions in previous posts, but didn't find any that directly addressed them.
My questions are:
• Is there a way to replicate the common exogenous regressors using Nixtla packages? I'm thinking I might need to decompose the series and then merge that data into the dataframe. If that's right, how do I deal with needing those external regressors in the future frame for forecasts? Or is there something else I might have missed?
• Is there a way to incorporate Prophet into MLforecast? I think only sklearn, xgboost, and lightgbm work, but wanted to confirm. I have also been trying to forecast using the Prophet API and then merging the data, but it'd be better to have it all in one set from the get-go.
Appreciate any guidance.
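For reference, fable's common exogenous regressors (trend and the Fourier/seasonal terms) are deterministic functions of the timestamp, so they can be recreated in plain pandas and generated for both the training frame and the future frame. A minimal sketch, assuming daily data; the `common_xregs` helper name is illustrative, not part of any Nixtla API:

```python
import numpy as np
import pandas as pd

def common_xregs(dates: pd.DatetimeIndex, origin: pd.Timestamp,
                 period: int = 7, K: int = 2) -> pd.DataFrame:
    """Deterministic regressors akin to fable's trend() and fourier():
    a linear trend plus K sine/cosine pairs for the given period."""
    # Measure time from a fixed origin so the trend continues into the future
    t = ((dates - origin) / pd.Timedelta(days=1)).to_numpy()
    feats = {'trend': t}
    for k in range(1, K + 1):
        feats[f'sin{period}_{k}'] = np.sin(2 * np.pi * k * t / period)
        feats[f'cos{period}_{k}'] = np.cos(2 * np.pi * k * t / period)
    return pd.DataFrame(feats, index=dates)

train_dates = pd.date_range('2023-01-01', periods=60, freq='D')
origin = train_dates[0]
X_train = common_xregs(train_dates, origin)

# Because the features depend only on the timestamp, the "future frame"
# is just the same function applied to the forecast dates:
future_dates = pd.date_range(train_dates[-1] + pd.Timedelta(days=1),
                             periods=14, freq='D')
X_future = common_xregs(future_dates, origin)
```

The key point is passing the same `origin` to both calls, so `trend` in the future frame picks up where the training frame left off rather than restarting at zero.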

❤️ 1


Mariana Menchero

09/08/2023, 4:27 PM
Hi **@Brian Head**! Thanks for using StatsForecast. Regarding your first question, we don't have that functionality in StatsForecast yet. If it's something that interests you, please help us by opening a new issue: https://github.com/Nixtla/statsforecast/issues/new/choose

❤️ 1


José Morales

09/08/2023, 5:14 PM
In mlforecast you can use target transformations to remove the trend and seasonality instead of using them as features (example guide).
About Prophet: mlforecast uses a single global model, whereas Prophet works on a per-series basis. Are you modeling a single series in mlforecast?

❤️ 1


Mairon Cesar Simoes Chaves

09/08/2023, 7:59 PM
Hello Brian! Just like you, I also came from fable and modeltime! I'm just a Nixtla user, but I recently experimented with adding common external regressors (Fourier terms) as dynamic features (future frame). Here's how I did it:


```
import numpy as np
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from window_ops.ewm import ewm_mean
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean, rolling_min, rolling_max, rolling_std

def diff(x, lag):
    # Lag difference (window_ops doesn't ship one; assumed user-defined in the original)
    out = np.full_like(x, np.nan)
    out[lag:] = x[lag:] - x[:-lag]
    return out

df_completo_encoded.set_index('ds', inplace=True)

# Annual harmonics (the original divides dayofyear by 360; 365.25 would match the calendar year exactly)
doy = df_completo_encoded.index.dayofyear
for k in range(1, 11):
    suffix = '' if k == 1 else f'_{k}'
    df_completo_encoded[f'sin365{suffix}'] = np.sin(2 * np.pi * k * doy / 360)
    df_completo_encoded[f'cos365{suffix}'] = np.cos(2 * np.pi * k * doy / 360)

# Monthly harmonics
dom = df_completo_encoded.index.day
for k in range(1, 4):
    suffix = '' if k == 1 else f'_{k}'
    df_completo_encoded[f'sin30{suffix}'] = np.sin(2 * np.pi * k * dom / 30)
    df_completo_encoded[f'cos30{suffix}'] = np.cos(2 * np.pi * k * dom / 30)

# Day-of-week harmonics
dow = df_completo_encoded.index.day_of_week
for k in range(1, 4):
    df_completo_encoded[f'sin7_{k}'] = np.sin(2 * np.pi * k * dow / 7)
    df_completo_encoded[f'cos7_{k}'] = np.cos(2 * np.pi * k * dow / 7)

df_completo_encoded.reset_index(drop=False, inplace=True)

train = df_completo_encoded.query("ds < '2023-07-07'")
test = df_completo_encoded.query("ds >= '2023-07-07'")

mlf = MLForecast(
    freq='D',
    models=[XGBRegressor(n_jobs=-1), LGBMRegressor(n_jobs=-1)],
    target_transforms=[Differences([1, 7])],
    lag_transforms={
        1: [(rolling_mean, 2), (rolling_mean, 3), (rolling_mean, 4), (rolling_mean, 5),
            (rolling_mean, 6), (rolling_mean, 7), (rolling_mean, 14), (rolling_mean, 28),
            (ewm_mean, 0.9), expanding_mean,
            (rolling_min, 7), (rolling_min, 14), (rolling_min, 28),
            (rolling_max, 7), (rolling_max, 14), (rolling_max, 28),
            (rolling_std, 2), (rolling_std, 3), (rolling_std, 4), (rolling_std, 5),
            (rolling_std, 6), (rolling_std, 7), (rolling_std, 14), (rolling_std, 28),
            (diff, 1), (diff, 2), (diff, 3), (diff, 4), (diff, 5), (diff, 6), (diff, 7),
            (diff, 14), (diff, 21), (diff, 28)],
    },
    lags=[1, 2, 3, 4, 6, 7, 14, 21, 28],
    date_features=['month', 'year', 'day_of_week', 'day_of_year',
                   'is_month_start', 'quarter', 'days_in_month'],
    num_threads=32,
)

aux = mlf.preprocess(
    train,
    id_col='unique_id',
    time_col='ds',
    target_col='y',
    static_features=['ADI', 'CV2', 'cluster_0', 'cluster_1', 'cluster_2', 'cluster_3', 'cluster_4'],
)

# %%time  (Jupyter magic in the original)
mlf.fit(
    train,
    id_col='unique_id',
    # max_horizon=47,
    time_col='ds',
    target_col='y',
    static_features=['gtin', 'ADI', 'CV2', 'cluster_0', 'cluster_1', 'cluster_2', 'cluster_3', 'cluster_4'],
)

# Filter the DataFrame columns that start with the desired prefixes
filtered_columns = df_completo_encoded.filter(regex='|'.join(['sin', 'cos']))
# Prepend the 'unique_id' and 'ds' columns to the list
selected_columns = ['unique_id', 'ds'] + filtered_columns.columns.tolist()

forecasts = mlf.predict(47, dynamic_dfs=[test[selected_columns]])
forecasts.head()
```

❤️ 2


Brian Head

09/13/2023, 8:43 PM

José Morales

09/13/2023, 9:05 PM
Can you share your statsforecast code?


Brian Head

09/14/2023, 6:31 PM
Thanks, **@José Morales**. I was able to figure out the problem. I had a bad join.
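For anyone hitting the same issue: pandas' `merge` has `validate` and `indicator` arguments that make a bad join fail fast (or at least surface itself) instead of silently duplicating or dropping rows. A small illustration with made-up frames:

```python
import pandas as pd

left = pd.DataFrame({'unique_id': ['a', 'b', 'c'], 'y': [1, 2, 3]})
right = pd.DataFrame({'unique_id': ['a', 'b', 'b'], 'feat': [10, 20, 21]})

# indicator=True adds a _merge column showing where each row came from;
# note 'b' fans out to two rows because it's duplicated on the right.
merged = left.merge(right, on='unique_id', how='left', indicator=True)
unmatched = (merged['_merge'] != 'both').sum()  # 1: the 'c' row has no match

# validate='one_to_one' raises MergeError on the duplicated key 'b'
try:
    left.merge(right, on='unique_id', validate='one_to_one')
    ok = True
except pd.errors.MergeError:
    ok = False
```

Checking row counts before and after a merge (or asserting `validate=`) is a cheap guard when building feature frames for forecasting.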

Brian Head

10/05/2023, 1:11 PM

José Morales

10/05/2023, 3:32 PM
Do you get better results with one model per series? We don't have it on the roadmap, but it's something you can easily do, e.g.


```
import pandas as pd
from mlforecast import MLForecast

id2model = {}
predictions = []
for uid in df['unique_id'].unique():
    uid_df = df[df['unique_id'] == uid]
    fcst = MLForecast(...)  # same configuration as the global model
    fcst.fit(uid_df)
    id2model[uid] = fcst
    predictions.append(fcst.predict(10))
predictions = pd.concat(predictions)
```


Brian Head

10/06/2023, 2:52 PM
Thanks, **@José Morales**. The global model and the local models provided similar results. Appreciate the suggested solution.
