# mlforecast
v
Hi guys, I was wondering why the predictions are so 'flat' by default. For the following settings:
from sklearn.linear_model import LinearRegression, BayesianRidge
import lightgbm as lgb
from mlforecast import MLForecast

horizon = 12
models = [LinearRegression(), BayesianRidge(), lgb.LGBMRegressor(verbosity=-1)]  # xgb.XGBRegressor(verbosity=0)
forecast_ml = MLForecast(models=models,
                         lags=range(1, horizon+1),
                         freq='B')
See the attached output and cross-validation. I have also added the cross-validation for the same settings in Darts. If I use the differences transform it looks better, but quite noisy:
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.target_transforms import Differences

forecast_ml = MLForecast(models=models,
                         lags=range(1, horizon+1),
                         lag_transforms={
                             1: [ExpandingMean()],
                             horizon: [RollingMean(window_size=horizon)],
                         },
                         freq='B',
                         target_transforms=[Differences([1, horizon])])
Any tips?
j
How do they compare in terms of error?
v
These are with the default parameters (no differences transform):
LinearRegression - MLForecast RMSE for prediction: 1.068
LinearRegression - MLForecast MAPE for prediction: 6.62 %
BayesianRidge - MLForecast RMSE for prediction: 1.068
BayesianRidge - MLForecast MAPE for prediction: 6.62 %
LGBMRegressor - MLForecast RMSE for prediction: 0.9197
LGBMRegressor - MLForecast MAPE for prediction: 5.63 %
LinearRegression - MLForecast RMSE using cross-validation: 1.381
LinearRegression - MLForecast MAPE using cross-validation: 5.09 %
BayesianRidge - MLForecast RMSE using cross-validation: 1.381
BayesianRidge - MLForecast MAPE using cross-validation: 5.09 %
LGBMRegressor - MLForecast RMSE using cross-validation: 7.665
LGBMRegressor - MLForecast MAPE using cross-validation: 24.4 %
LinearRegression() - Darts RMSE for prediction: 1.264
LinearRegression() - Darts MAPE for prediction: 7.17 %
BayesianRidge() - Darts RMSE for prediction: 1.264
BayesianRidge() - Darts MAPE for prediction: 7.17 %
LGBMRegressor(verbose=-1) - Darts RMSE for prediction: 1.318
LGBMRegressor(verbose=-1) - Darts MAPE for prediction: 7.73 %
LinearRegression() - Darts RMSE using cross-validation: 2.176
LinearRegression() - Darts MAPE using cross-validation: 6.41 %
BayesianRidge() - Darts RMSE using cross-validation: 2.176
BayesianRidge() - Darts MAPE using cross-validation: 6.41 %
LGBMRegressor(verbose=-1) - Darts RMSE using cross-validation: 2.193
LGBMRegressor(verbose=-1) - Darts MAPE using cross-validation: 6.66 %
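For reference, a minimal sketch of how errors like the ones above could be computed on the mlforecast side, using MLForecast.cross_validation together with utilsforecast; the training frame df, the number of windows, and the default unique_id/ds/y column layout are assumptions:

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import rmse, mape

# cross-validation predictions: one column per model plus y and cutoff
cv_df = forecast_ml.cross_validation(df, n_windows=3, h=horizon)

# evaluate() expects the id/time/target columns plus one column per model
metrics = evaluate(cv_df.drop(columns='cutoff'), metrics=[rmse, mape])
print(metrics.groupby('metric').mean(numeric_only=True))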
j
So the flat forecast is better, isn't it? 0.91 RMSE vs 1.31
v
It is better in cross-validation; for the actual prediction the Diff one seems better:
LinearRegression - MLForecast RMSE for prediction: 0.8298
LinearRegression - MLForecast MAPE for prediction: 5.29 %
BayesianRidge - MLForecast RMSE for prediction: 0.8294
BayesianRidge - MLForecast MAPE for prediction: 5.29 %
LGBMRegressor - MLForecast RMSE for prediction: 0.621
LGBMRegressor - MLForecast MAPE for prediction: 3.73 %
LinearRegression - MLForecast RMSE using cross-validation: 3.184
LinearRegression - MLForecast MAPE using cross-validation: 12.3 %
BayesianRidge - MLForecast RMSE using cross-validation: 3.082
BayesianRidge - MLForecast MAPE using cross-validation: 12.0 %
LGBMRegressor - MLForecast RMSE using cross-validation: 1.732
LGBMRegressor - MLForecast MAPE using cross-validation: 6.53 %
j
Your data has a trend, so tree-based models can't extrapolate; using the first difference should help with that.
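A minimal sketch of that suggestion, assuming df is the training frame with the default unique_id/ds/y columns: apply a first difference as a target transform so the tree-based model trains on a roughly detrended series.

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences

forecast_ml = MLForecast(models=models,
                         lags=range(1, horizon + 1),
                         freq='B',
                         target_transforms=[Differences([1])])  # first difference only
forecast_ml.fit(df)
preds = forecast_ml.predict(h=horizon)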
v
It seems to help with the tree-based models but worsens things for the regression ones. Is there a way to still make it follow the data (i.e. non-flat) but with less noise?
j
@Vítor Barbosa, your model might get better if you include more information: either use exogenous features if available, or use some timestamp information. Nixtla has some cool stuff: https://nixtlaverse.nixtla.io/mlforecast/docs/how-to-guides/lag_transforms_guide.html. So don't rely only on the rolling mean and expanding mean. Also treat the window size as a hyperparameter for the model and maybe try to tune it too. But start by adding more features like min, max, or rolling means for short-term lags and for long-term lags. I hope your model can improve based on this; see the sketch below.
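A hedged sketch of that advice, combining calendar features with short- and long-window rolling statistics; the specific lags, window sizes and date features here are assumptions and should be treated as hyperparameters to tune:

from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean, RollingMin, RollingMax
from mlforecast.target_transforms import Differences

forecast_ml = MLForecast(models=models,
                         freq='B',
                         lags=[1, 2, 3, 5, horizon],
                         lag_transforms={
                             1: [ExpandingMean(), RollingMean(window_size=5)],   # short-term
                             horizon: [RollingMean(window_size=horizon),         # long-term
                                       RollingMin(window_size=horizon),
                                       RollingMax(window_size=horizon)],
                         },
                         date_features=['dayofweek', 'month'],  # timestamp information
                         target_transforms=[Differences([1])])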