# mlforecast

Naren Castellon

05/15/2024, 11:04 PM
Hello Nixtla team!! I have `MLForecast 0.12.0`. To use the auto models, can I do it with this version, or do I have to update?

R. Sheshka

05/16/2024, 8:48 AM
👋 Hello, team!

05/16/2024, 8:49 AM
I am using MLForecast 0.13.0 on macOS (M1). Calling the `GroupedArray.apply_multithreaded_transforms` method causes a segmentation fault on the Darwin platform.

05/16/2024, 8:50 AM
It works only when limiting the number of threads to one (`num_threads=1`). The problem occurs during the feature processing in the `model.fit` method.

05/16/2024, 8:51 AM
Has anyone already dealt with a similar issue?

thomas delaunait

05/18/2024, 8:06 PM
Dear Nixtla team! I have a question regarding the newly released AutoXGBoost. In the CV process, is it possible to specify `step_size`, as it is in statsforecast? It seems only possible to specify `n_windows` and `h`. Thank you for your help.

Arwa Abougharib

05/20/2024, 1:54 AM
Hello Nixtla team! I have a question regarding the use of SHAP in mlforecast. I'm implementing the example SHAP code from the documentation. However, at the line `shap_values = explainer(X)` I get the error `TypeError: 'DataFrame' object cannot be interpreted as an integer`. Has anyone faced a similar issue?

nizar fawal

05/21/2024, 7:27 AM
Hello Nixtla team, I'm using MLForecast in a project and I want to save the final model as a pickle in order to deploy it in a web app. However, I can't seem to find how to save the model. Can you help me, please?

Guillaume GALIE

05/22/2024, 4:12 PM
Hello! Is there a way to export parameter importances after the hyperparameter optimization done by Optuna? (I am using AutoXGBoost.) I can see `get_param_importances` on the Optuna website, but I don't know if we have access to it through mlforecast. Put differently: can we access the optimized study in order to build some interesting plots? Thanks in advance. https://optuna.readthedocs.io/en/stable/reference/generated/optuna.importance.get_param_importances.html#optuna.importance.get_param_importances

jan rathfelder

05/23/2024, 9:00 AM
Quick question regarding MLflow. I see the example using statsforecast (https://nixtlaverse.nixtla.io/statsforecast/docs/tutorials/mlflow.html). Before I try it myself: I assume I can basically just swap statsforecast for mlforecast, right?

jan rathfelder

05/24/2024, 8:27 PM
There was a post here, or somewhere in the documentation, showing how to use the auto models in combination with tuning additional parameters such as the lags used, the target transform, etc. Can anyone point me to that again?

Andreas Kaae

05/27/2024, 10:56 AM
Setting and tuning `conformal_distribution`: how high is a good amount to use for tuning the prediction intervals? I have data back to 2019 and I am forecasting per hour. Currently I am setting `n_windows=7` and `h=168`, so as far as I understand it will use the last `7*168 = 1176` observations of each time series: `168` for the first window, `168` for the second, etc. Is this sufficient, or is it way too little to tune the interval?
👍 1
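For reference, the setup being discussed can be sketched as below with toy data: `PredictionIntervals(n_windows=..., h=...)` is passed to `fit`, and each calibration window holds out `h` observations, so `n_windows * h` points are used for calibration per series. More windows give more calibration data at the cost of shorter training sets; the numbers here are illustrative, not a recommendation.

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from mlforecast import MLForecast
from mlforecast.utils import PredictionIntervals

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'unique_id': ['A'] * 500,
    'ds': pd.date_range('2023-01-01', periods=500, freq='h'),
    'y': np.sin(np.arange(500) * 2 * np.pi / 24) + rng.normal(scale=0.1, size=500),
})

fcst = MLForecast(models=[LGBMRegressor(verbosity=-1)], freq='h', lags=[1, 24])
# n_windows * h = 3 * 24 = 72 observations per series used for calibration
fcst.fit(
    df,
    prediction_intervals=PredictionIntervals(
        n_windows=3, h=24, method='conformal_distribution'
    ),
)
preds = fcst.predict(h=24, level=[80, 95])
print(preds.columns.tolist())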

jan rathfelder

05/28/2024, 3:04 PM
When trying to update my model, I get the following. The question is: is there a way around it, or is updating generally not possible when a target transform is used?

```
~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/forecast.py in update(self, df)
    986 df : pandas or polars DataFrame
    987     Dataframe with new observations."""
--> 988 self.ts.update(df)

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/core.py in update(self, df)
    849 if new_groups.any():
    850     if self.target_transforms is not None:
--> 851         raise ValueError("Can not update target_transforms with new series.")
    852 new_ids = ufp.filter_with_mask(sizes[self.id_col], new_groups)
    853 new_ids_df = ufp.filter_with_mask(df, ufp.is_in(df[self.id_col], new_ids))

ValueError: Can not update target_transforms with new series.
```

Weikai Lu

05/30/2024, 7:54 PM
Hello everyone, I've been using neuralforecast and mlforecast for some time now, but I'm still kinda new to the field of time series forecasting. mlforecast greatly simplifies the process of incorporating lags and rolling-window features into my models. I'm trying to enhance my model by introducing additional features beyond the standard lags and lag transformations. Even though these exogenous variables exhibit strong correlation and have high importance scores, I've noticed a significant decrease in the accuracy of my model during cross-validation after their inclusion, which is very confusing to me. Here's how I set up my model and ran the cross-validation:

```
mlf = MLForecast(
    models=model_1,
    freq='1h',
    lags=[1, 24, 48, 72, 168],
    lag_transforms={
        12: [RollingMean(window_size=24)],
        24: [RollingMean(window_size=24)],
        48: [RollingMean(window_size=48)],
    },
    date_features=["year", "month", "dayofweek", "day", "hour"],  # seasonal data
)
crossvalidation_df_met = mlf.cross_validation(
    df=df[['ds', 'y', 'unique_id', 'is_holiday', 'is_weekday',
           'temperature_2m', 'relative_humidity_2m',
           'shortwave_radiation_instant', 'distance_to_holiday']],
    h=48,
    n_windows=10,
)
```

Any ideas or suggestions would be super helpful. Thanks in advance!

Naren Castellon

06/01/2024, 1:14 AM
Hello Nixtla team. I have a query: if I want to build a KNN model, can I find the optimal K with an mlforecast utility, or should I first tune the model with sklearn and obtain the value of K that way?

Naren Castellon

06/02/2024, 3:56 AM
Hi team. I have run into this situation several times in projects: I make predictions and represent them visually, as shown in the graph, then apply a couple of metrics to the predictions, for example MSE. So far so good. But when I then run cross-validation and evaluate its results with the same metrics, the best model is not the one selected when I applied the metrics to the predictions. Why does this discrepancy occur? What can I do to make a decision in these cases? What am I doing wrong? I would expect the model selection to be the same, even if the numeric results differed! I am using this definition to evaluate cross-validation (see attached image).

jan rathfelder

06/03/2024, 8:55 PM
I found a bug when trying to update my model. Sometimes my code worked and sometimes updating the model failed with an error, and I could not understand why. Now I've found that when my tuning run selects differencing, I can update without problems; if no differencing is applied, updating throws an error. I've recorded a Loom video where you can see me running the same code, once with differencing of order 1 and once with order 0 (i.e. no differencing); when there is no differencing, updating fails: https://www.loom.com/share/b7bda858a2264e7e808d842bf1570e94?sid=e75f6ba5-9a59-470e-90a5-ca6456b98614

Truong Hoang

06/04/2024, 1:05 AM
Hi Nixtla team and community, I have a question regarding bounded forecasts and differencing in mlforecast. Suppose my time series is strictly positive, but after differencing there are negative values. In this case, how do I make sure that the final forecast output is still positive? From my understanding, using a Box-Cox transformation or setting a Poisson objective won't really help here, since including differences means that the actual input to the model will still contain negative values. Currently I'm just clipping the final output at 0, but I'm wondering if there is another way to do it. I'd appreciate any help here, thanks!

Quang Bui

06/07/2024, 2:22 AM
Hi team, I'm using a 32-core Azure compute instance to run LightGBM cross-validation with `cv = LightGBMCV()`, setting `num_threads=32`. After running `cv.fit()` and then retraining the model with the best iteration using `MLForecast.from_cv()`, the final model I get turns out to be trained with just `num_threads=1`:

```
MLForecast(models=[LGBMRegressor], freq=5min, lag_features=['lag1', 'lag2', 'lag3', 'lag4', 'lag5', 'lag6', 'lag12', 'lag288', 'lag576', 'lag864', 'lag1152',
'exponentially_weighted_mean_lag1_alpha0.5', 'rolling_mean_lag1_window_size12', 'rolling_mean_lag1_window_size24', 'rolling_mean_lag1_window_size288',
'rolling_mean_lag1_window_size864', 'rolling_quantile_lag1_p0.5_window_size12', 'rolling_quantile_lag1_p0.5_window_size288',
'rolling_quantile_lag1_p0.5_window_size864', 'rolling_std_lag1_window_size12', 'rolling_std_lag1_window_size288',
'seasonal_rolling_mean_lag1_season_length288_window_size7', 'seasonal_rolling_std_lag1_season_length288_window_size7',
'seasonal_rolling_quantile_lag1_p0.5_season_length288_window_size7', 'seasonal_rolling_min_lag1_season_length288_window_size7',
'seasonal_rolling_max_lag1_season_length288_window_size7', 'rolling_mean_lag12_window_size288', 'rolling_mean_lag24_window_size288',
'rolling_mean_lag288_window_size12', 'rolling_mean_lag288_window_size288', 'rolling_std_lag288_window_size12', 'rolling_mean_lag576_window_size12',
'rolling_std_lag576_window_size12', 'rolling_mean_lag864_window_size12', 'rolling_std_lag864_window_size12'],
date_features=[<function localize_and_get_five_min_index at 0x7f82991b0040>, <function localize_hour at 0x7f8256837a30>,
<function localize_and_identify_weekend at 0x7f82569bc310>, <function localize_dayofweek at 0x7f82569bcaf0>], num_threads=1)
```

I'm training on a very large dataset, and I notice that it takes a very long time to complete.

Johannes Emme

06/07/2024, 7:59 AM
Hi, I'm trying to access the fitted values during training of a LightGBM regression model. However, I get an index error that seems quite strange to me. Below is a minimal example of the problem, which only occurs when I set `max_horizon` larger than 10 (it works for 10 and lower).

```
from mlforecast import MLForecast
from lightgbm import LGBMRegressor
import pandas as pd

df = pd.concat([
    pd.DataFrame({
        'id': ['A'] * 1000,
        'ds': pd.date_range(start='2020-01-01', periods=1000, freq='H'),
        'y': range(1000)
    }),
    pd.DataFrame({
        'id': ['B'] * 1000,
        'ds': pd.date_range(start='2020-01-01', periods=1000, freq='H'),
        'y': range(1000)
    })
])

fcst = MLForecast(
    models=LGBMRegressor(),
    freq='H',
    lags=[1, 2, 3],
)
fcst.fit(df, id_col='id', time_col='ds', target_col='y', max_horizon=11, fitted=True)
in_sample_predictions = fcst.forecast_fitted_values()
print(in_sample_predictions)
```

```
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/mlforecast/forecast.py:412, in MLForecast._compute_fitted_values(self, base, X, y, id_col, time_col, target_col, max_horizon)
    409 for horizon in range(max_horizon):
    410     horizon_base = ufp.copy_if_pandas(base, deep=True)
    411     horizon_base = ufp.assign_columns(
--> 412         horizon_base, target_col, y[:, horizon]
    413     )
    414     horizon_fitted_values.append(horizon_base)
    415 for name, horizon_models in self.models_.items():
IndexError: index 10 is out of bounds for axis 1 with size 10
```

👍 1

jan rathfelder

06/07/2024, 8:30 PM
I get this when I want to update my model. I only want to update 45 of 47 articles, and the rest are not included in my update_df. I thought this worked before (and I think I did partial updates before), but I can't figure out what the problem is here. Any ideas?

```
---> 45 loaded_model.update(df_update)
     46
     47 # apply encoder:

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/forecast.py in update(self, df)
    986 df : pandas or polars DataFrame
    987     Dataframe with new observations."""
--> 988 self.ts.update(df)

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/core.py in update(self, df)
    867 if isinstance(tfm, _BaseGroupedArrayTargetTransform):
    868     ga = GroupedArray(values, indptr)
--> 869     ga = tfm.update(ga)
    870     df = ufp.assign_columns(df, self.target_col, ga.data)
    871 else:

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/mlforecast/target_transforms.py in update(self, ga)
    111 core_ga = CoreGroupedArray(ga.data, ga.indptr, self.num_threads)
    112 for scaler in self.scalers_:
--> 113     transformed = scaler.update(core_ga)
    114 core_ga = core_ga._with_data(transformed)
    115 return GroupedArray(transformed, ga.indptr)

~/miniconda3/envs/demand_env/lib/python3.8/site-packages/coreforecast/scalers.py in update(self, ga)
    348 )
    349 if self.tails_.size != tails_indptr[-1]:
--> 350     raise ValueError("Number of tails doesn't match the number of groups")
    351 tails_ga = GroupedArray(self.tails_, tails_indptr, num_threads=ga.num_threads)
    352 combined = tails_ga._append(ga)

ValueError: Number of tails doesn't match the number of groups
```

Affan M

06/10/2024, 8:11 PM
Hi team, I'm trying to set up a few AutoMLForecasts and I'm having trouble accessing the fitted values. Is it possible to extract them? I would like to perform hierarchical reconciliation on them after the forecasts. Edit: I believe I have found a way to do it, although it's one model at a time:

```
auto_mlf.models_["lgb"].fit(train, fitted=True).forecast_fitted_values()
```

I would still be interested in knowing if there is a better way to do it.

Weikai Lu

06/12/2024, 9:41 PM
Hi team, I have been learning a lot from using the mlforecast library. It's been a great resource so far! I've been experimenting with the `cross_validation` function and I had a question about it. I understand that in time series analysis, when we create a validation set, it should only include information that would be available at the time of prediction. This means that lagged features for the validation set should be computed based on data up to the last point in the training set for each window. I was wondering how the `cross_validation` function in mlforecast handles this. Does it ensure that lagged features for the validation set are only computed based on data up to the last point in the training set for each window? I hope my question makes sense. Any guidance on this would be really helpful. Thank you so much!

Braaannigan

06/13/2024, 9:11 AM
Hi, I'm doing cross-validation with mlforecast. My primary comparisons are between different feature sets or transforms rather than different models. Is there any existing tooling to help compare results from different mlforecast models?

Braaannigan

06/18/2024, 8:06 PM
Hi, it looks like I'm getting segfaults after a re-install today. I suspect NumPy 2.0 is the issue. Has anyone else come across this?

Sarim Zafar

06/18/2024, 9:03 PM
Hi team, I've used the AutoMLForecast + AutoLightGBM combination, as suggested in the example, to train on a time series. However, when I attempt to reproduce the results using the cross-validation function on a custom LGBM instance with the discovered optimized hyperparameters and lag features, I am unable to achieve the same performance. My assumption is that the model applies some target transformation based on season length, but I am not certain. Could anyone clarify this? Additionally, how can one specify target transformations as part of the `my_init_config` function, as shown in the example on the website? When I use a simple log-difference combination, as I typically do with cross-validation, the loss function returns NaN. For the loss function I am using MAE, as described here:

```
def custom_loss(df, train_df):
    return mae(df, models=["model"])["model"].mean()
```

Any guidance on these matters would be greatly appreciated. Thank you!

jan rathfelder

06/19/2024, 1:17 PM
Hi, I would like to build a custom objective function for XGBoost, but all the standard solutions fail, because I can't really alter the fit method the way I could in raw xgboost (where I can just specify a custom objective function). I've tried multiple approaches, but all have failed so far. Do you have any idea how to do this? One issue seems to be that the training data I can access is not in DMatrix form. I'm happy for any suggestions here.

Olgahan Cat

06/19/2024, 3:55 PM
Hi guys! First, I would like to say thanks for this awesome package. I have a question: when I use a model in MLForecast with multiple time series, does the model fit each series separately, or does it use all series at the same time to estimate parameters?