# mlforecast
s
Hi team, I've used the AutoMLForecast + AutoLightGBM combination, as suggested in the example, to train on a time series. However, when I attempt to reproduce the results using the cross-validation function on a custom LGBM instance with the discovered optimized hyperparameters and lag features, I can't achieve the same performance. My assumption is that the model applies some target transformation based on the season length, but I'm not certain. Could anyone clarify this? Additionally, how can one specify target transforms as part of the my_init_config function, as shown in the example on the website? When I use a simple log-difference combination, as I typically do with cross-validation, the loss function returns NaN. For the loss function I am using MAE, as described here:
```
def custom_loss(df, train_df):
    return mae(df, models=["model"])["model"].mean()
```
Any guidance on these matters would be greatly appreciated. Thank you!
j
Hey. How are you running the custom LGBM instance? You should be able to get the same result with something like this:
```
best_config = auto_mlf.results_['AutoLightGBM'].best_trial.user_attrs['config']
my_lgb = LGBMRegressor(**best_config['model_params'])
my_mlf = MLForecast(models=my_lgb, freq=my_freq, **best_config['mlf_init_params'])
# use the same settings (n_windows, h, refit) as the auto model
my_mlf.cross_validation(df, n_windows=n_windows, h=h, refit=False)
```
How do you specify the log difference? As `[GlobalSklearnTransformer(FunctionTransformer(np.log1p, np.expm1)), Differences(...)]`?
s
Yes I specify the log difference as you suggested:
```
target_transforms=[
    GlobalSklearnTransformer(FunctionTransformer(func=np.log1p, inverse_func=np.expm1)),
    Differences([season]),
]
```
As for the usage, I am using LightGBMCV with the model params and input features, with the same number of windows and the same horizon.
j
The default of the auto model is not to refit. Are you setting `refit=False` in the cross_validation call?
s
Yes I've done so
Okay, so upon further inspection I've realised that if you provide your own config in fit, it won't apply any target transforms. That's now even more confusing, because if I use the code you suggested above I should be able to recreate the performance, but I can't. And I'm still struggling to use my own target transforms, as that simply leads to -inf loss.
j
The target transforms are provided in the init config. Do you have values less than zero in your data? I'm not sure if log1p raises an error or returns NaN for negative values
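For what it's worth, NumPy's log1p doesn't raise on invalid input; a minimal sketch of its edge cases (which would explain a NaN or -inf appearing after the transform):

```python
import math
import numpy as np

with np.errstate(divide='ignore', invalid='ignore'):
    ok = np.log1p(-0.5)            # valid: log(0.5)
    at_minus_one = np.log1p(-1.0)  # -inf, since log(0) diverges
    below = np.log1p(-2.0)         # NaN: log of a negative number

print(ok, at_minus_one, below)
```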
s
If I provide my own init_config, it overrides and skips the self._seasonality_based_config function call.
So now I'm back to square one: given that the best params are correct, why can't I reproduce the result using a simple MLForecast object?
j
Here's an example:
```
import math

import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.auto import AutoMLForecast, AutoLightGBM
from mlforecast.utils import generate_series
from utilsforecast.losses import smape

series = generate_series(10, min_length=100)
auto = AutoMLForecast(
    models={'lgb': AutoLightGBM()},
    freq="D",
    season_length=7,
)
auto.fit(series, n_windows=2, h=7, num_samples=5)
best_trial = auto.results_['lgb'].best_trial
auto_res = best_trial.value  # the trial's objective value
best_config = best_trial.user_attrs['config']
mlf = MLForecast(
    models={'lgb': lgb.LGBMRegressor(**best_config['model_params'])},
    freq="D",
    **best_config['mlf_init_params'],
)
cv_res = mlf.cross_validation(series, n_windows=2, h=7, refit=False)
# score each (series, cutoff) pair separately, then average, as the auto model does
cv_res['id_cutoff'] = cv_res['unique_id'].astype(str) + '_' + cv_res['cutoff'].astype(str)
manual_res = smape(cv_res, models=['lgb'], id_col='id_cutoff')['lgb'].mean()
assert math.isclose(auto_res, manual_res)
```
Is that what you're trying to reproduce? The trial score?
s
Yes
Okay it is working now! I must've had a logical error somewhere in my code! Thank you!
🙌 1