# mlforecast
s
Hello, is there any particular thing I am missing due to which I am getting different metric scores between AutoML and manual cross-validation? For AutoML the code is the following:
```python
import lightgbm as lgb

from mlforecast.auto import AutoModel, AutoMLForecast

my_lgb = AutoModel(
    model=lgb.LGBMRegressor(),
    config=my_lgb_config,
)
auto_mlf = AutoMLForecast(
    models={'lgb': my_lgb},
    freq='7D',
    season_length=4,
    init_config=my_init_config,
    fit_config=my_fit_config,
)
auto_mlf.fit(
    processed_df,
    n_windows=12,
    h=horizon,
    num_samples=1000,  # number of trials to run
    loss=custom_loss,
)
```
I then save and reload it, and re-evaluate with added metrics that weren't being tracked in the AutoML step. But the results yield different metric values:
```python
import numpy as np

from mlforecast import MLForecast
from utilsforecast.losses import mae, mape, smape

auto_mlf.save('AutoLightGBM')
mlf = MLForecast.load('AutoLightGBM/lgb')
cv_res = mlf.cross_validation(
    processed_df,
    n_windows=12,
    h=4,
    step_size=4,
    refit=False,
    static_features=['country', 'department'],
)
mae_error = mae(cv_res, models=['lgb'])['lgb'].mean()
mape_error = mape(cv_res, models=['lgb'])['lgb'].mean()
smape_error = smape(cv_res, models=['lgb'])['lgb'].mean()
bias = np.mean(cv_res['lgb'] - cv_res['y'])
metrics = {'mae': mae_error, 'mape': mape_error, 'smape': smape_error, 'bias': bias}
```
o
The CV refits the model at least once. From the docs on the `refit` argument of `cross_validation` (it also exists in `auto_mlf.fit`):

> Retrain model for each cross validation window. If False, the models are trained at the beginning and then used to predict each window. If positive int, the models are retrained every `refit` windows.

So it seems you're not comparing the same fitted model. The first is optimized over 1000 trials; the second loads that optimized pipeline and refits it on `processed_df`.
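If you want to compare like for like, here is a quick check (just a sketch: I'm assuming `auto_mlf.results_` holds the optuna study per model and that the winning trial keeps its config in `user_attrs`, which may differ in your mlforecast version):

```python
# Sketch: inspect what the search itself scored for its winning trial.
# Assumption: auto_mlf.results_ is a dict of optuna studies keyed by model
# name; adjust the attribute names if your version stores them differently.
study = auto_mlf.results_['lgb']
print('loss of the best trial during the search:', study.best_trial.value)
print('config of the best trial:', study.best_trial.user_attrs.get('config'))

# Then compute your custom_loss on cv_res (the manually refit CV output)
# the same way the search computes it, and compare the two numbers.
```

If those two numbers already disagree before you add the extra metrics, the refit (and any difference in the evaluation windows) is what's changing them.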
s
So what do I need to do to ensure the same results?
Shouldn't the optimized model's eval score be the same as the CV score for that particular metric, given that it would be trained on the same train set as the AutoML one and evaluated on the same windows as well?
o
If all else is equal, perhaps, but there's too little information. If you send a piece of code that I can run, I can have a better look.
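Even something small like this would do: a minimal sketch on synthetic data, using `generate_series` from utilsforecast and the built-in `AutoLightGBM` search space as stand-ins for your data, config and loss (swap in your own pieces):

```python
from mlforecast import MLForecast
from mlforecast.auto import AutoLightGBM, AutoMLForecast
from utilsforecast.data import generate_series
from utilsforecast.losses import smape

# Synthetic daily series standing in for processed_df
df = generate_series(n_series=20, freq='D', min_length=300)

# Keep the search tiny so it runs in seconds; AutoLightGBM ships its own search space
auto_mlf = AutoMLForecast(
    models={'lgb': AutoLightGBM()},
    freq='D',
    season_length=7,
)
auto_mlf.fit(df, n_windows=2, h=4, num_samples=2)

# Same save / load / cross-validation path as in your snippet
auto_mlf.save('AutoLightGBM_demo')
mlf = MLForecast.load('AutoLightGBM_demo/lgb')
cv_res = mlf.cross_validation(df, n_windows=2, h=4, step_size=4, refit=False)
print('smape:', smape(cv_res, models=['lgb'])['lgb'].mean())
```

With something like that, reproducing the gap (or showing there is none on clean data) becomes straightforward.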
s
It's literally the same processed_df as part of the same notebook and, as you can see, the same stored optimized params for the model pipeline. What else could affect the change in metrics? If you can point it out, I can clarify. I haven't tested it on a publicly available dataset.
Also, when you say "it seems you're not comparing the same fitted model", I don't understand how it's different. Sure, I am retraining it in CV, but it's the same params, and the training set is also the same as the one used in the AutoML pipeline, since the windows and horizon arguments are the same.
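If it helps, this is roughly how I would double-check that on my side (a sketch; `mlf` and `cv_res` are the objects from my snippet above, and I'm assuming the model sits under the 'lgb' key, as the cv_res columns suggest):

```python
# The loaded pipeline should carry the tuned LightGBM parameters
print(mlf.models['lgb'].get_params())

# And these are the cutoffs (evaluation windows) the manual CV actually used
print(sorted(cv_res['cutoff'].unique()))
```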