Naren Castellon
06/02/2024, 3:56 AM

jan rathfelder
06/03/2024, 8:55 PM

Truong Hoang
06/04/2024, 1:05 AM

Quang Bui
06/07/2024, 2:22 AM
06/07/2024, 2:22 AMnum_threads=32
. After running cv.fit()
, and then retraining the model with the best iteration using MLForecast.from_cv()
, the final model I get is on that turns out to be trained with just num_threads=1
.
```
MLForecast(models=[LGBMRegressor], freq=5min, lag_features=['lag1', 'lag2', 'lag3', 'lag4', 'lag5', 'lag6', 'lag12', 'lag288', 'lag576', 'lag864', 'lag1152', 'exponentially_weighted_mean_lag1_alpha0.5', 'rolling_mean_lag1_window_size12', 'rolling_mean_lag1_window_size24', 'rolling_mean_lag1_window_size288', 'rolling_mean_lag1_window_size864', 'rolling_quantile_lag1_p0.5_window_size12', 'rolling_quantile_lag1_p0.5_window_size288', 'rolling_quantile_lag1_p0.5_window_size864', 'rolling_std_lag1_window_size12', 'rolling_std_lag1_window_size288', 'seasonal_rolling_mean_lag1_season_length288_window_size7', 'seasonal_rolling_std_lag1_season_length288_window_size7', 'seasonal_rolling_quantile_lag1_p0.5_season_length288_window_size7', 'seasonal_rolling_min_lag1_season_length288_window_size7', 'seasonal_rolling_max_lag1_season_length288_window_size7', 'rolling_mean_lag12_window_size288', 'rolling_mean_lag24_window_size288', 'rolling_mean_lag288_window_size12', 'rolling_mean_lag288_window_size288', 'rolling_std_lag288_window_size12', 'rolling_mean_lag576_window_size12', 'rolling_std_lag576_window_size12', 'rolling_mean_lag864_window_size12', 'rolling_std_lag864_window_size12'], date_features=[<function localize_and_get_five_min_index at 0x7f82991b0040>, <function localize_hour at 0x7f8256837a30>, <function localize_and_identify_weekend at 0x7f82569bc310>, <function localize_dayofweek at 0x7f82569bcaf0>], num_threads=1)
```
I'm training on a very large dataset, and I notice that it takes a very long time to complete.

Johannes Emme
06/07/2024, 7:59 AM
Calling `forecast_fitted_values()` fails when `max_horizon` is larger than 10 (it works for 10 and lower). Minimal reproduction:
```python
from mlforecast import MLForecast
from lightgbm import LGBMRegressor
import pandas as pd

df = pd.concat([
    pd.DataFrame({
        'id': ['A'] * 1000,
        'ds': pd.date_range(start='2020-01-01', periods=1000, freq='H'),
        'y': range(1000),
    }),
    pd.DataFrame({
        'id': ['B'] * 1000,
        'ds': pd.date_range(start='2020-01-01', periods=1000, freq='H'),
        'y': range(1000),
    }),
])

fcst = MLForecast(
    models=LGBMRegressor(),
    freq='H',
    lags=[1, 2, 3],
)
fcst.fit(df, id_col='id', time_col='ds', target_col='y', max_horizon=11, fitted=True)
in_sample_predictions = fcst.forecast_fitted_values()
print(in_sample_predictions)
```
```
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlforecast/forecast.py:412, in MLForecast._compute_fitted_values(self, base, X, y, id_col, time_col, target_col, max_horizon)
    409 for horizon in range(max_horizon):
    410     horizon_base = ufp.copy_if_pandas(base, deep=True)
    411     horizon_base = ufp.assign_columns(
--> 412         horizon_base, target_col, y[:, horizon]
    413     )
    414     horizon_fitted_values.append(horizon_base)
    415 for name, horizon_models in self.models_.items():

IndexError: index 10 is out of bounds for axis 1 with size 10
```
jan rathfelder
06/07/2024, 8:30 PM

```
     45 loaded_model.update(df_update)
     46
---> 47 # apply encoder:

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/forecast.py in update(self, df)
    986         df : pandas or polars DataFrame
    987             Dataframe with new observations."""
--> 988         self.ts.update(df)

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/core.py in update(self, df)
    867         if isinstance(tfm, _BaseGroupedArrayTargetTransform):
    868             ga = GroupedArray(values, indptr)
--> 869             ga = tfm.update(ga)
    870             df = ufp.assign_columns(df, self.target_col, ga.data)
    871         else:

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/target_transforms.py in update(self, ga)
    111         core_ga = CoreGroupedArray(ga.data, ga.indptr, self.num_threads)
    112         for scaler in self.scalers_:
--> 113             transformed = scaler.update(core_ga)
    114             core_ga = core_ga._with_data(transformed)
    115         return GroupedArray(transformed, ga.indptr)

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/coreforecast/scalers.py in update(self, ga)
    348         )
    349         if self.tails_.size != tails_indptr[-1]:
--> 350             raise ValueError("Number of tails doesn't match the number of groups")
    351         tails_ga = GroupedArray(self.tails_, tails_indptr, num_threads=ga.num_threads)
    352         combined = tails_ga._append(ga)

ValueError: Number of tails doesn't match the number of groups
```
Affan M
06/10/2024, 8:11 PM

Weikai Lu
06/12/2024, 9:41 PMcross_validation
function and I had a question about it. I understand that in time series analysis, when we create a validation set, it should only include information that would be available at the time of prediction. This means that lagged features for the validation set should be computed based on data up to the last point in the training set for each window.
I was wondering how the cross_validation
function in Mlforecast handles this. Does it ensure that lagged features for the validation set are only computed based on data up to the last point in the training set for each window?
I hope my question makes sense. Any guidance on this would be really helpful. Thank you so much!Braaannigan
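The requirement described above can be sketched in plain pandas (toy data; variable names are illustrative, this is not MLForecast's internal code): for each window, lag features on the validation side must come only from observations at or before that window's cutoff.

```python
import pandas as pd

# one toy series of 10 hourly observations
y = pd.Series(range(10), index=pd.date_range('2024-01-01', periods=10, freq='H'))

cutoff = y.index[6]   # last timestamp available for training in this window
train = y.loc[:cutoff]

# the lag-1 feature for the first validation step is the last training
# value, never a value from the validation period itself
lag1_for_first_valid_step = train.iloc[-1]
print(lag1_for_first_valid_step)
```

For later validation steps within the same window, the lag values come from the model's own recursive predictions rather than from the held-out actuals, which is what prevents leakage.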
Braaannigan
06/13/2024, 9:11 AM

Braaannigan
06/18/2024, 8:06 PM

Sarim Zafar
06/18/2024, 9:03 PM
How should I set up the `my_init_config` function, as shown in the example on the website? When I use a simple log-difference combination, as I typically do with cross validation, the loss function returns NaN. For the loss function, I am using MAE as described here:

```python
from utilsforecast.losses import mae  # assumed import for the mae used below

def custom_loss(df, train_df):
    return mae(df, models=["model"])["model"].mean()
```

Any guidance on these matters would be greatly appreciated.
Thank you!

jan rathfelder
06/19/2024, 1:17 PM

Olgahan Cat
06/19/2024, 3:55 PM

Biagio Principe
06/25/2024, 4:00 PM

Olgahan Cat
06/25/2024, 6:37 PM

Sarim Zafar
06/26/2024, 8:48 AM

Vítor Barbosa
06/26/2024, 10:38 PM
I'm using `fill_gaps` here:
```python
from utilsforecast.preprocessing import fill_gaps

stocks_basic_pd = fill_gaps(
    stocks_basic_pd, freq='B', start='per_serie', end='per_serie',
    id_col='Ticker', time_col='Date',
)
```

I am getting the error below. Any ideas?
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[54], line 2
      1 from utilsforecast.preprocessing import fill_gaps
----> 2 stocks_basic_pd = fill_gaps(stocks_basic_pd, freq='B', start='per_serie', end='per_serie', id_col='Ticker', time_col='Date')

File c:\Python\miniconda3\envs\openbb\Lib\site-packages\utilsforecast\preprocessing.py:166, in fill_gaps(df, freq, start, end, id_col, time_col)
    164     times += offset.base
    165 idx = pd.MultiIndex.from_arrays([uids, times], names=[id_col, time_col])
--> 166 res = df.set_index([id_col, time_col]).reindex(idx).reset_index()
    167 extra_cols = df.columns.drop([id_col, time_col]).tolist()
    168 if extra_cols:

File c:\Python\miniconda3\envs\openbb\Lib\site-packages\pandas\core\frame.py:5365, in DataFrame.reindex(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
   5346 @doc(
   5347     NDFrame.reindex,
   5348     klass=_shared_doc_kwargs["klass"],
   (...)
   5363     tolerance=None,
   5364 ) -> DataFrame:
-> 5365     return super().reindex(
   5366         labels=labels,
   5367         index=index,
   5368         columns=columns,
   5369         axis=axis,
   5370         method=method,
   5371         copy=copy,
   5372         level=level,
   5373         fill_value=fill_value,
   5374         limit=limit,
   5375         tolerance=tolerance,
   5376     )

File c:\Python\miniconda3\envs\openbb\Lib\site-packages\pandas\core\generic.py:5607, in NDFrame.reindex(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
   5604     return self._reindex_multi(axes, copy, fill_value)
   5606 # perform the reindex on the axes
-> 5607 return self._reindex_axes(
   5608     axes, level, limit, tolerance, method, fill_value, copy
   5609 ).__finalize__(self, method="reindex")

File c:\Python\miniconda3\envs\openbb\Lib\site-packages\pandas\core\generic.py:5630, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   5627     continue
   5629 ax = self._get_axis(a)
-> 5630 new_index, indexer = ax.reindex(
   5631     labels, level=level, limit=limit, tolerance=tolerance, method=method
   5632 )
   5634 axis = self._get_axis_number(a)
   5635 obj = obj._reindex_with_indexers(
   5636     {axis: [new_index, indexer]},
   5637     fill_value=fill_value,
   5638     copy=copy,
   5639     allow_dups=False,
   5640 )

File c:\Python\miniconda3\envs\openbb\Lib\site-packages\pandas\core\indexes\base.py:4426, in Index.reindex(self, target, method, level, limit, tolerance)
   4422     indexer = self.get_indexer(
   4423         target, method=method, limit=limit, tolerance=tolerance
   4424     )
   4425 elif self._is_multi:
-> 4426     raise ValueError("cannot handle a non-unique multi-index!")
   4427 elif not self.is_unique:
   4428     # GH#42568
   4429     raise ValueError("cannot reindex on an axis with duplicate labels")

ValueError: cannot handle a non-unique multi-index!
```
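This error typically means the (`Ticker`, `Date`) pairs are not unique, so the `reindex` inside `fill_gaps` fails. A minimal sketch of checking for and dropping such duplicates (toy data, not the actual `stocks_basic_pd`):

```python
import pandas as pd

df = pd.DataFrame({
    'Ticker': ['AAPL', 'AAPL', 'MSFT'],
    'Date': pd.to_datetime(['2024-01-02', '2024-01-02', '2024-01-02']),
    'Close': [185.0, 185.0, 370.0],
})

# rows sharing a (Ticker, Date) pair make the multi-index non-unique
dup_mask = df.duplicated(subset=['Ticker', 'Date'], keep=False)
print(df[dup_mask])

# dropping the duplicates makes the (Ticker, Date) index unique again
df_unique = df.drop_duplicates(subset=['Ticker', 'Date'])
```

It is worth inspecting the duplicated rows before dropping them, since they may indicate an upstream join or download issue rather than true repeats.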
Johannes Emme
06/29/2024, 2:46 PM
In the first plot I have the `cs_df` and the true target plotted against each other. From this plot, it can be seen that my model is okay at predicting the weekends but has clear difficulties predicting the Mondays.
However, when I used the model for predictions (see plot 2), the uncertainty for the weekends was very large, and the Mondays had small uncertainty. (In plot 2 I forgot the legend: black = true, blue = mean prediction, purple = 10th and 90th percentiles.)
What I have come to realize is that the problem arises from a misalignment between the conformal horizon and the horizon at which I am predicting. With a conformal horizon of 96, the errors collected for a specific timestep do not “belong to the same timeslot.” For instance, the first error in the first window corresponds to Monday 00:00, while for the next window the first hour is Friday 00:00, then Tuesday 00:00, and so on. Hence, when I predict the consumption during Saturday, the quantiles are based on several different days and hours and not on “Saturday hour errors.”
To overcome this issue, I set the conformal horizon to 24*7 (168) so that my conformal windows start on the same day as when I am predicting. Then I get the following result (see plots 3 and 4), where the uncertainty is low for the weekends and high for the Mondays. However, I do not believe this is a sustainable solution, and unfortunately I don't have a great alternative either. Currently, for my case, I have simply rewritten the `_add_conformal_distribution_intervals` function by:
1. Requiring that `n_windows * h >= 168` so that all hours of the week are represented.
2. Joining the `cs_df` and `fcst_df` on `day_of_week` and `hour`.
3. Subtracting and adding the mean to get a distribution around each hour, and then calculating the quantiles.
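Steps 2 and 3 above could be sketched roughly like this (toy error data; the `error` and `yhat` column names are assumptions for illustration, not the real `cs_df`/`fcst_df` schema):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# hypothetical hourly CV errors covering two full weeks
ds = pd.date_range('2024-01-01', periods=24 * 14, freq='H')
errors = pd.DataFrame({'ds': ds, 'error': rng.normal(size=ds.size)})
errors['day_of_week'] = errors['ds'].dt.dayofweek
errors['hour'] = errors['ds'].dt.hour

# 10th/90th error percentiles per (day_of_week, hour) slot
q = (
    errors.groupby(['day_of_week', 'hour'])['error']
    .quantile([0.1, 0.9])
    .unstack()
    .reset_index()
    .rename(columns={0.1: 'q10', 0.9: 'q90'})
)

# join onto the forecasts and shift the point prediction by the quantiles
fcst = pd.DataFrame({
    'ds': pd.date_range('2024-01-15', periods=24, freq='H'),
    'yhat': 100.0,
})
fcst['day_of_week'] = fcst['ds'].dt.dayofweek
fcst['hour'] = fcst['ds'].dt.hour
fcst = fcst.merge(q, on=['day_of_week', 'hour'], how='left')
fcst['lo-80'] = fcst['yhat'] + fcst['q10']
fcst['hi-80'] = fcst['yhat'] + fcst['q90']
```

With only two weeks of errors each slot has very few samples, so in practice the quantiles per (day_of_week, hour) slot need many CV windows to be stable.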
I am very curious to hear your thoughts on this.
Best regards,
Johannes

jan rathfelder
07/01/2024, 10:10 AM

```
UserWarning: Found null values in expanding_std_lag1, rolling_std_lag1_window_size7_min_samples1, rolling_std_lag1_window_size70_min_samples1, rolling_std_lag1_window_size105_min_samples1, seasonal_rolling_std_lag1_season_length7_window_size3_min_samples1.
  warnings.warn(f'Found null values in {", ".join(cols_with_nulls)}.')
```
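Nulls in std-based features are expected at the start of each series: the standard deviation of a single observation is undefined, so for example `expanding_std_lag1` is null for each series' first couple of steps. A minimal pandas illustration:

```python
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])

# the std of a single observation is NaN, so the expanding std starts
# with a null; lagging it by 1 pushes the null one step further
expanding_std = s.expanding().std()
expanding_std_lag1 = expanding_std.shift(1)
print(expanding_std_lag1)
```

If the nulls only appear at the start of each series, the warning is usually harmless; tree models like LightGBM can handle them natively.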
Krystian W.
07/02/2024, 10:13 PM
Is it possible to use the `max_horizon` arg in `DistributedMLForecast`? Or only through some workaround?

Biagio Principe
07/03/2024, 7:52 PM

```python
scaler = TemporalNorm(scaler_type='standard', dim=1)
```
Dinis Timoteo
07/04/2024, 2:51 PM

Krystian W.
07/07/2024, 7:32 PM

```python
cv = fcst.cross_validation(
    spark_train_df,
    n_windows=n_windows,
    h=h,
    static_features=[],
)
```
I tried to run `cv.show()` but I keep getting a KeyError saying those features aren't found in the index. Locally it works just fine.

Ml Club
07/09/2024, 8:43 AM888 elif pandas_requires_conversion and any(d == object for d in dtypes_orig):
889 # Force object if any of the dtypes is an object
890 dtype_orig = object
ValueError: at least one array or dtype is required
Ml Club
07/09/2024, 8:46 AM
If I use `lag=[1]` then it works great. What is the issue? Please help me resolve it. Also, I want to apply a target transformation with `np.log`; how can I do that?
Ml Club
07/11/2024, 7:56 AM

Ml Club
07/12/2024, 6:17 AM

Krystian W.
07/14/2024, 2:06 PM
I get an error with the column `rolling_quantile_lag_1_p=0.5_window_size_7` because of the dot in the parameter.
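If the failure is the storage layer (e.g. Spark writing Parquet) rejecting characters like `.` or `=` in column names, one workaround is renaming the generated features before saving (`sanitize_column` is a hypothetical helper, not an mlforecast API):

```python
import re

def sanitize_column(name: str) -> str:
    # replace characters that Parquet/Spark column names commonly reject
    return re.sub(r'[.=]', '_', name)

cols = ['rolling_quantile_lag_1_p=0.5_window_size_7', 'lag1']
clean = [sanitize_column(c) for c in cols]
print(clean)
```

If the renamed frame is later fed back into a fitted model, the rename would have to be undone (or the model retrained on the sanitized names), so renaming only at write time is the safer pattern.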
Biagio Principe
07/15/2024, 8:35 AM
Does using `max_horizon` with lag `1` introduce data leakage? (see second image)
Many thanks!

Ml Club
07/16/2024, 4:19 PM