Nixtla Community #general



# general

Slackbot

09/08/2023, 4:11 PM

This message was deleted.


Mariana Menchero

09/08/2023, 4:27 PM

Hi @Brian Head, thanks for using StatsForecast. Regarding your first question, we don't have that functionality in StatsForecast yet. If it's something that interests you, please help us by opening a new issue: https://github.com/Nixtla/statsforecast/issues/new/choose


José Morales

09/08/2023, 5:14 PM

In mlforecast you can use target transformations to remove the trend and seasonality instead of using them as features (example guide). About Prophet: mlforecast trains a single global model across all series, whereas Prophet fits each series separately. Are you modeling a single series in mlforecast?
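As a rough illustration of what a target transformation such as `Differences([1, 7])` does, here is a plain NumPy sketch assuming daily data with a weekly season. The model would be trained on the twice-differenced target, and its forecasts mapped back by undoing the differences in reverse order. This is a conceptual sketch, not mlforecast's internal implementation; `difference`, `undifference` and the toy series are hypothetical.

```python
import numpy as np


def difference(y, lag):
    """Return y[t] - y[t - lag]; the first `lag` values are undefined (NaN)."""
    out = np.full(len(y), np.nan)
    out[lag:] = y[lag:] - y[:-lag]
    return out


def undifference(history, diffs, lag):
    """Rebuild future levels from forecasted differences.

    `history` holds the last observed levels (at least `lag` of them);
    each future level is its differenced value plus the level `lag` steps back.
    """
    levels = list(history)
    for d in diffs:
        levels.append(d + levels[-lag])
    return np.array(levels[len(history):])


# Hypothetical series: linear trend plus a fixed weekly pattern
t = np.arange(70)
y = 0.5 * t + np.tile([0, 2, 4, 1, 3, 5, 6], 10).astype(float)

# Forward: first difference removes the trend, lag-7 difference removes the weekly season
d1 = difference(y, 1)
d17 = difference(d1, 7)

# For this toy series the doubly differenced target is zero after the warm-up period,
# so assume the model forecasts 0 for the next 7 days of the transformed target
fcst_d17 = np.zeros(7)

# Inverse: undo the lag-7 difference first, then the first difference
fcst_d1 = undifference(d1[-7:], fcst_d17, 7)
fcst_y = undifference(y[-1:], fcst_d1, 1)
```

Here `fcst_y` continues both the trend and the weekly pattern even though the "model" only predicted zeros; a real model would predict non-zero differences, which are restored to the original scale the same way.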


Mairon Cesar Simoes Chaves

09/08/2023, 7:59 PM

Hello Brian! Like you, I also came from fable and modeltime. I'm just a Nixtla user, but I recently experimented with adding common external regressors (Fourier terms) as dynamic features (a future frame). Here's how I did it:

```python
# Assumes df_completo_encoded has columns unique_id, ds (datetime64), y
# and the static columns listed further down
import numpy as np
from numba import njit
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from window_ops.ewm import ewm_mean
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean, rolling_min, rolling_max, rolling_std
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor


@njit
def diff(x, lag):
    # Helper used below but not shown in the original message (assumed definition):
    # difference between each value and the value `lag` steps earlier
    x2 = np.full_like(x, np.nan)
    for i in range(lag, len(x)):
        x2[i] = x[i] - x[i - lag]
    return x2


df_completo_encoded.set_index('ds', inplace=True)
idx = df_completo_encoded.index

# Annual harmonics (k = 1..10); the original divided by 360, 365.25 matches the yearly cycle
for k in range(1, 11):
    suffix = '' if k == 1 else f'_{k}'
    df_completo_encoded[f'sin365{suffix}'] = np.sin(2 * np.pi * k * idx.dayofyear / 365.25)
    df_completo_encoded[f'cos365{suffix}'] = np.cos(2 * np.pi * k * idx.dayofyear / 365.25)

# Monthly harmonics (k = 1..3)
for k in range(1, 4):
    suffix = '' if k == 1 else f'_{k}'
    df_completo_encoded[f'sin30{suffix}'] = np.sin(2 * np.pi * k * idx.day / 30)
    df_completo_encoded[f'cos30{suffix}'] = np.cos(2 * np.pi * k * idx.day / 30)

# Day-of-week harmonics (k = 1..3)
for k in range(1, 4):
    df_completo_encoded[f'sin7_{k}'] = np.sin(2 * np.pi * k * idx.day_of_week / 7)
    df_completo_encoded[f'cos7_{k}'] = np.cos(2 * np.pi * k * idx.day_of_week / 7)

df_completo_encoded.reset_index(drop=False, inplace=True)

train = df_completo_encoded.query("ds < '2023-07-07'")
test = df_completo_encoded.query("ds >= '2023-07-07'")

mlf = MLForecast(
    freq='D',
    models=[XGBRegressor(n_jobs=-1), LGBMRegressor(n_jobs=-1)],
    target_transforms=[Differences([1, 7])],
    lag_transforms={
        1: [
            (rolling_mean, 2), (rolling_mean, 3), (rolling_mean, 4), (rolling_mean, 5),
            (rolling_mean, 6), (rolling_mean, 7), (rolling_mean, 14), (rolling_mean, 28),
            (ewm_mean, 0.9), expanding_mean,
            (rolling_min, 7), (rolling_min, 14), (rolling_min, 28),
            (rolling_max, 7), (rolling_max, 14), (rolling_max, 28),
            (rolling_std, 2), (rolling_std, 3), (rolling_std, 4), (rolling_std, 5),
            (rolling_std, 6), (rolling_std, 7), (rolling_std, 14), (rolling_std, 28),
            (diff, 1), (diff, 2), (diff, 3), (diff, 4), (diff, 5), (diff, 6),
            (diff, 7), (diff, 14), (diff, 21), (diff, 28),
        ],
    },
    lags=[1, 2, 3, 4, 6, 7, 14, 21, 28],
    date_features=['month', 'year', 'day_of_week', 'day_of_year',
                   'is_month_start', 'quarter', 'days_in_month'],
    num_threads=32,
)

# Use the same static features in preprocess and fit (the original lists differed by 'gtin')
static_features = ['gtin', 'ADI', 'CV2', 'cluster_0', 'cluster_1',
                   'cluster_2', 'cluster_3', 'cluster_4']

# Optional: inspect the generated training features
aux = mlf.preprocess(
    train, id_col='unique_id', time_col='ds', target_col='y',
    static_features=static_features,
)

# In Jupyter, put %%time at the top of this cell to time the fit
mlf.fit(
    train, id_col='unique_id', time_col='ds', target_col='y',
    # max_horizon=47,
    static_features=static_features,
)

# Select the DataFrame columns whose names start with 'sin' or 'cos'
fourier_cols = df_completo_encoded.filter(regex='^(sin|cos)').columns.tolist()

# Prepend 'unique_id' and 'ds' so the future frame can be matched to each series and date
selected_columns = ['unique_id', 'ds'] + fourier_cols

forecasts = mlf.predict(47, dynamic_dfs=[test[selected_columns]])
forecasts.head()
```
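The key requirement in this approach is that every dynamic feature must also exist for the forecast dates. Below is a minimal pandas sketch of building such a future frame directly, assuming a daily frequency and annual terms with a 365.25-day period. `fourier_terms` and `make_future_frame` are hypothetical helpers, not mlforecast functions, and they use a uniform `sin{label}_{k}` naming that differs slightly from the columns above (e.g. `sin365_1` instead of `sin365`), so the training columns would need to be built with the same helper for the names to match.

```python
import numpy as np
import pandas as pd


def fourier_terms(position, period, n_harmonics, label):
    """sin/cos pairs for k = 1..n_harmonics of a calendar position with the given period."""
    pos = np.asarray(position, dtype=float)
    cols = {}
    for k in range(1, n_harmonics + 1):
        angle = 2 * np.pi * k * pos / period
        cols[f'sin{label}_{k}'] = np.sin(angle)
        cols[f'cos{label}_{k}'] = np.cos(angle)
    return pd.DataFrame(cols)


def make_future_frame(ids, last_date, horizon):
    """Every series id crossed with the next `horizon` daily dates, plus Fourier terms."""
    dates = pd.date_range(last_date + pd.Timedelta(days=1), periods=horizon, freq='D')
    future = pd.DataFrame(
        [(uid, ds) for uid in ids for ds in dates], columns=['unique_id', 'ds']
    )
    cal = future['ds'].dt
    feats = pd.concat(
        [
            fourier_terms(cal.dayofyear, 365.25, 10, '365'),  # annual
            fourier_terms(cal.day, 30, 3, '30'),              # monthly (approximate)
            fourier_terms(cal.dayofweek, 7, 3, '7'),          # weekly
        ],
        axis=1,
    )
    return pd.concat([future, feats], axis=1)


# Example: two hypothetical series, 47 days after the last training date (2023-07-06)
future = make_future_frame(['A', 'B'], pd.Timestamp('2023-07-06'), 47)
```

Because the Fourier terms depend only on the calendar, they can be computed for any future date without knowing the target, which is what makes them safe dynamic features.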


Brian Head

09/13/2023, 8:43 PM

@José Morales I'm modeling many series. I've tried following the guidance you provided and the documentation I could find, but I'm not getting results from mlforecast anywhere near as good as with statsforecast, even after trying multiple combinations of lags and transformations. To have something I can share, I'm working on publicly available data, and I see the same issue there: statsforecast models are significantly more accurate (MAPE, sMAPE, MASE) than mlforecast models. This is the open data I'm using (M3, monthly group): `df, *_ = M3.load('./data', group='Monthly')`
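For reference when comparing the two libraries, here are minimal NumPy versions of the three metrics mentioned. They follow common textbook definitions (sMAPE with the mean of |y| and |ŷ| in the denominator, MASE scaled by the in-sample MAE of a seasonal naive forecast with season length `m`). These definitions are assumptions here; the evaluation code you use may differ in details such as reporting fractions instead of percentages, so it is worth checking which one it applies.

```python
import numpy as np


def mape(y, y_hat):
    """Mean absolute percentage error, in percent (undefined if any y == 0)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100 * np.mean(np.abs(y - y_hat) / np.abs(y))


def smape(y, y_hat):
    """Symmetric MAPE, in percent, with denominator (|y| + |y_hat|) / 2."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100 * np.mean(2 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))


def mase(y, y_hat, y_train, m=1):
    """Forecast MAE divided by the in-sample MAE of a seasonal naive forecast (season m)."""
    y, y_hat, y_train = (np.asarray(a, float) for a in (y, y_hat, y_train))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y - y_hat)) / scale


# Hypothetical single series
y_train = [10, 12, 14, 16]
y_test = [18, 20]
y_pred = [17, 22]
```

For monthly M3 series a seasonal scale with `m=12` is the usual choice for MASE; metrics are computed per series and then averaged across series.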

José Morales

09/13/2023, 9:05 PM

Can you share your statsforecast code?

Brian Head

09/14/2023, 6:31 PM

Thanks, @José Morales. I was able to figure out the problem. I had a bad join.
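A bad join when lining up forecasts with actuals can silently duplicate or drop rows and distort every metric. One way to guard against it, sketched with pandas under the assumption that both frames are keyed by `unique_id` and `ds`, is to let `merge` validate the key relationship and flag unmatched rows. The frames and the `model` column below are hypothetical.

```python
import pandas as pd

actuals = pd.DataFrame({
    'unique_id': ['A', 'A', 'B', 'B'],
    'ds': pd.to_datetime(['2023-07-07', '2023-07-08'] * 2),
    'y': [10.0, 11.0, 20.0, 21.0],
})
forecasts = pd.DataFrame({
    'unique_id': ['A', 'A', 'B'],
    'ds': pd.to_datetime(['2023-07-07', '2023-07-08', '2023-07-07']),
    'model': [9.5, 11.5, 19.0],
})

# validate='one_to_one' raises pandas.errors.MergeError if a key repeats on either side;
# indicator=True adds a '_merge' column saying whether each row matched
merged = actuals.merge(
    forecasts, on=['unique_id', 'ds'], how='outer',
    validate='one_to_one', indicator=True,
)
unmatched = merged[merged['_merge'] != 'both']
```

Here the missing forecast for series `B` on 2023-07-08 shows up as a `left_only` row; checking that `unmatched` is empty before computing metrics catches this kind of problem early.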

Brian Head

10/05/2023, 1:11 PM

@José Morales, coming back to this thread because I was looking at another comment in it. I noticed you mentioned that mlforecast fits a global model. When you have multiple series, is there currently a way to fit a separate (local) model for each series (I've searched and couldn't find one), or is that being considered as a future enhancement? I'm wondering whether that's why Prophet seems to do better on many of my series.

José Morales

10/05/2023, 3:32 PM

Do you get better results with one model per series? We don't have it on the roadmap, but it's something you can easily do, e.g.:

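```python
import pandas as pd
from mlforecast import MLForecast

id2model = {}     # unique_id -> fitted MLForecast for that series
predictions = []
for uid in df['unique_id'].unique():
    # Train a separate (local) model on this series only
    uid_df = df[df['unique_id'] == uid]
    fcst = MLForecast(...)  # pass your usual MLForecast arguments here
    fcst.fit(uid_df)
    id2model[uid] = fcst
    predictions.append(fcst.predict(10))  # 10-step-ahead forecast for this series
predictions = pd.concat(predictions)
```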
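To make the global-versus-local distinction concrete without depending on a particular library version, here is a small NumPy sketch that uses a linear autoregression fitted by least squares as a stand-in for the boosted models. The global variant pools lagged windows from all series into one regression; the local variant fits one regression per series, like the loop above. The AR model, the helper names and the synthetic series are illustrative assumptions, not mlforecast internals.

```python
import numpy as np


def lag_matrix(y, n_lags):
    """Rows of [y[t-1], ..., y[t-n_lags], 1] with targets y[t]."""
    X = np.column_stack([y[n_lags - j - 1 : len(y) - j - 1] for j in range(n_lags)])
    X = np.column_stack([X, np.ones(len(X))])  # intercept column
    return X, y[n_lags:]


def fit(X, target):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def forecast(y, coef, n_lags, h):
    """Recursive multi-step forecast: each prediction is fed back as a lag."""
    history = list(y)
    preds = []
    for _ in range(h):
        x = np.array(history[-1 : -n_lags - 1 : -1] + [1.0])  # most recent lag first
        y_next = float(x @ coef)
        preds.append(y_next)
        history.append(y_next)
    return np.array(preds)


# Two hypothetical series with different AR(1) dynamics
rng = np.random.default_rng(0)
series = {}
for uid, phi in [('A', 0.9), ('B', -0.5)]:
    y = np.zeros(200)
    for t in range(1, 200):
        y[t] = phi * y[t - 1] + rng.normal()
    series[uid] = y

n_lags, h = 1, 5

# Global: one regression on the stacked lag matrices of all series
X_all, t_all = zip(*(lag_matrix(y, n_lags) for y in series.values()))
global_coef = fit(np.vstack(X_all), np.concatenate(t_all))

# Local: one regression per series
local_coef = {uid: fit(*lag_matrix(y, n_lags)) for uid, y in series.items()}

global_fcst = {uid: forecast(y, global_coef, n_lags, h) for uid, y in series.items()}
local_fcst = {uid: forecast(y, local_coef[uid], n_lags, h) for uid, y in series.items()}
```

When series share similar dynamics the two approaches tend to produce similar forecasts; when dynamics differ, as in this toy example, the local fits recover each series' own coefficient while the pooled coefficient is a compromise between them.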

Brian Head

10/06/2023, 2:52 PM

Thanks, @José Morales. The global model and the local models gave similar results. I appreciate the suggested solution.
