Maria Jose Arroyo Doria
11/22/2024, 12:42 AM
Gabe Richard
11/22/2024, 7:10 PM
FugueBug Traceback (most recent call last)
Cell In[8], line 41
31 dask_client = Client()
32 engine = DaskExecutionEngine(dask_client=dask_client)
33 y_pred = sf.forecast(
34 df=ts_train,
35 h=ts_val.shape[0],
36 level=[90],
37 X_df=ts_val[[col for col in ts_val.columns if col not in ['y']]],
38 id_col='unique_id',
39 time_col='ds',
40 target_col='y',
---> 41 ).compute()
42 # Add actual values to forecast
43 forecast = y_pred.merge(ts_val, how='left', on=['unique_id', 'ds'])
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_collection.py:481, in FrameBase.compute(self, fuse, concatenate, **kwargs)
479 out = out.repartition(npartitions=1)
480 out = out.optimize(fuse=fuse)
--> 481 return DaskMethodsMixin.compute(out, **kwargs)
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/base.py:372, in DaskMethodsMixin.compute(self, **kwargs)
348 def compute(self, **kwargs):
349 """Compute this dask collection
350
351 This turns a lazy Dask collection into its in-memory equivalent.
(...)
370 dask.compute
371 """
--> 372 (result,) = compute(self, traverse=False, **kwargs)
373 return result
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/base.py:660, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
657 postcomputes.append(x.__dask_postcompute__())
659 with shorten_traceback():
--> 660 results = schedule(dsk, keys, **kwargs)
662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_expr.py:3799, in _execute_internal_graph()
3796 @staticmethod
3797 def _execute_internal_graph(internal_tasks, dependencies, outkey):
3798 cache = dict(dependencies)
-> 3799 res = execute_graph(internal_tasks, cache=cache, keys=[outkey])
3800 return res[outkey]
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_groupby.py:1193, in operation()
1191 if kwargs is None:
1192 kwargs = {}
-> 1193 return dask_func(
1194 frame,
1195 list(by),
1196 key=_slice,
1197 group_keys=group_keys,
1198 args=args,
1199 **_as_dict("observed", observed),
1200 **_as_dict("dropna", dropna),
1201 **kwargs,
1202 )
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_groupby.py:1229, in groupby_slice_apply()
1218 def groupby_slice_apply(
1219 df,
1220 grouper,
(...)
1227 **kwargs,
1228 ):
-> 1229 return _groupby_slice_apply(
1230 df,
1231 grouper,
1232 key,
1233 func,
1234 *args,
1235 group_keys=group_keys,
1236 dropna=dropna,
1237 observed=observed,
1238 **kwargs,
1239 )
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/dataframe/groupby.py:210, in _groupby_slice_apply()
208 if key:
209 g = g[key]
--> 210 return g.apply(func, *args, **kwargs)
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/fugue_dask/execution_engine.py:151, in _map()
149 return PandasDataFrame([], output_schema).as_pandas()
150 pdf = pdf.reset_index(drop=True)
--> 151 pdf = _fix_dask_bug(pdf)
152 res = _core_map(pdf)
153 return res.astype(output_dtypes)
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/fugue_dask/execution_engine.py:126, in _fix_dask_bug()
125 def _fix_dask_bug(pdf: pd.DataFrame) -> pd.DataFrame:
--> 126 assert_or_throw(
127 pdf.shape[1] == len(input_schema),
128 FugueBug(
129 "partitioned dataframe has different number of columns: "
130 f"{pdf.columns} vs {input_schema}"
131 ),
132 )
133 return pdf
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/triad/utils/assertion.py:42, in assert_or_throw()
40 raise AssertionError()
41 if isinstance(_exception, Exception):
---> 42 raise _exception
43 if isinstance(_exception, str):
44 raise AssertionError(_exception)
FugueBug: partitioned dataframe has different number of columns: Index(['__fugue_serialized_blob__', '__fugue_serialized_blob_no__',
'__fugue_serialized_blob_name__'],
dtype='object') vs unique_id:long,__fugue_serialized_blob__:bytes,__fugue_serialized_blob_no__:long,__fugue_serialized_blob_name__:str,__fugue_serialized_blob_dummy__:long
Omri Kramer
12/17/2024, 9:44 PM
import numpy as np
import pandas as pd
from statsforecast.models import AutoETS
np.random.seed(42)
n_periods = 98
seasonal_period = 7
dates = pd.date_range(start="2023-01-01", periods=n_periods, freq="D")
seasonality = np.sin(np.linspace(0, np.pi / 2, seasonal_period))
seasonality_pattern = np.tile(seasonality, n_periods // seasonal_period)
baseline_level = 100
y_true = baseline_level * (1 + seasonality_pattern)
error_amplitude = 0.05
errors = np.random.normal(1, error_amplitude, n_periods)
y = y_true * errors
model = AutoETS(season_length=seasonal_period, model="MNM")
train = y[:91]
fitted_model = model.fit(train, np.arange(len(train)))
print(fitted_model.model_["method"])
forecasts = fitted_model.predict(h=7, level=[99])
print(pd.DataFrame(forecasts).assign(
d1=lambda x: x["hi-99"] - x["mean"],
d2=lambda x: -x["lo-99"] + x["mean"],
r1=lambda x: x["hi-99"] / x["mean"] - 1,
r2=lambda x: 1 - x["lo-99"] / x["mean"],
y=y[91:],
y_true=y_true[91:],
))
The result is:
ETS(M,N,M)
         mean       lo-99       hi-99         d1         d2        r1        r2           y      y_true
0   97.220357   85.385598  109.055116  11.834759  11.834759  0.121731  0.121731  104.843225  100.000000
1  125.941699  114.106940  137.776459  11.834759  11.834759  0.093970  0.093970  121.463115  125.881905
2  147.232211  135.397451  159.066970  11.834759  11.834759  0.080382  0.080382  147.542534  150.000000
3  172.418248  160.583488  184.253007  11.834760  11.834760  0.068640  0.068640  167.363826  170.710678
4  184.106207  172.271447  195.940967  11.834760  11.834760  0.064282  0.064282  172.947760  186.602540
5  198.262093  186.427333  210.096853  11.834760  11.834760  0.059693  0.059693  199.503335  196.592583
6  201.371897  189.537137  213.206658  11.834760  11.834760  0.058771  0.058771  202.610553  200.000000
As you can see, the difference between the interval bounds and the point forecast stays the same at every horizon, while the ratio keeps getting smaller.
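To make the comparison concrete, here is a small arithmetic sketch (values copied from the table above) contrasting the constant absolute width that is observed with the roughly constant relative width one would expect if the intervals scaled with the forecast mean, as a multiplicative-error model suggests:
import numpy as np
import pandas as pd
# Point forecasts and upper bounds copied from the table above.
mean = np.array([97.220357, 125.941699, 147.232211, 172.418248,
                 184.106207, 198.262093, 201.371897])
hi99 = np.array([109.055116, 137.776459, 159.066970, 184.253007,
                 195.940967, 210.096853, 213.206658])
abs_width = hi99 - mean        # constant ~11.83 at every horizon
rel_width = hi99 / mean - 1    # shrinks from ~12.2% to ~5.9%
# If the width scaled with the level (constant relative width), the upper
# bound would instead look like this:
hi99_if_proportional = mean * (1 + rel_width[0])
print(pd.DataFrame({"abs_width": abs_width, "rel_width": rel_width,
                    "hi99_if_proportional": hi99_if_proportional}))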
I am using statsforecast 2.0.0.
Anthony Giorgio
12/18/2024, 8:56 PM
thomas delaunait
01/09/2025, 10:48 AM
Aravind Karunakaran
01/13/2025, 11:19 AM
Makarand Batchu
01/13/2025, 12:21 PM
Guillaume GALIE
01/15/2025, 4:21 PM
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:5273, in MSTL.forward(self, y, h, X, X_future, level, fitted)
5269 res = self.trend_forecaster._add_conformal_intervals(
5270 fcst=res, y=x_sa, X=X, level=level
5271 )
5272 # reseasonalize results
-> 5273 seas_h = _predict_mstl_seas(model_, h=h, season_length=self.season_length)
5274 seas_insample = model_.filter(regex="seasonal*").sum(axis=1).values
5275 res = {
5276 key: val + (seas_insample if "fitted" in key else seas_h)
5277 for key, val in res.items()
5278 }
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4999, in _predict_mstl_seas(mstl_ob, h, season_length)
4998 def _predict_mstl_seas(mstl_ob, h, season_length):
-> 4999 seascomp = _predict_mstl_components(mstl_ob, h, season_length)
5000 return seascomp.sum(axis=1)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4992, in _predict_mstl_components(mstl_ob, h, season_length)
4990 mp = seasonal_periods[i]
4991 colname = seasoncolumns[i]
-> 4992 seascomp[:, i] = np.tile(
4993 mstl_ob[colname].values[-mp:], trunc(1 + (h - 1) / mp)
4994 )[:h]
4995 return seascomp
ValueError: could not broadcast input array from shape (10,) into shape (12,)
Here is an example with one time series to reproduce the crash:
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import Naive, MSTL
from utilsforecast.data import generate_series
freq = 'MS'
season_length = 12
min_length = 25
df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
sffcst = StatsForecast(models=[MSTL(season_length=[season_length])], freq=freq, n_jobs=-1, fallback_model=Naive(), verbose=True)
sf_crossvalidation_df = sffcst.cross_validation(df=df, h=12, step_size=1, n_windows=3, refit=False).reset_index(drop=True)
sf_crossvalidation_df
What is strange is that it works with less history: if you keep only 15 months of data, it doesn't crash (see the rough window-length arithmetic after this second snippet).
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import Naive, MSTL
from utilsforecast.data import generate_series
freq = 'MS'
season_length = 12
min_length = 15
df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
sffcst = StatsForecast(models=[MSTL(season_length=[season_length])], freq=freq, n_jobs=-1, fallback_model=Naive(), verbose=True)
sf_crossvalidation_df = sffcst.cross_validation(df=df, h=12, step_size=1, n_windows=3, refit=False).reset_index(drop=True)
sf_crossvalidation_df
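As a rough sanity check on how much history each window actually sees under these settings (assuming the usual splitting in which the last test window ends at the last observation):
# h=12, step_size=1 and n_windows=3 reserve h + step_size * (n_windows - 1) = 14
# periods for testing, so the training windows are very short.
h, step_size, n_windows = 12, 1, 3
test_size = h + step_size * (n_windows - 1)
for n in (25, 15):
    train_lengths = [n - test_size + i * step_size for i in range(n_windows)]
    print(f"series length {n}: training lengths per window = {train_lengths}")
# series length 25: training lengths per window = [11, 12, 13]
# series length 15: training lengths per window = [1, 2, 3]
Either way, MSTL is being asked to handle a season_length of 12 with training windows of roughly one season or less.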
Pradyumna Mahajan
01/16/2025, 6:54 AM
Piero Danti
01/16/2025, 2:45 PM
Naren Castellon
01/19/2025, 9:00 PM
# Instantiate StatsForecast class as sf
sf = StatsForecast(
    df=train,
    models=models,
    freq='D',
    n_jobs=-1,
)
# Train the model
sf.fit()
# Forecast
Y_hat = sf.predict(h=horizon)
Bersu T
02/05/2025, 10:59 AM
Filipa Encarnação Louzeiro
02/10/2025, 5:09 PM
Simon
02/15/2025, 1:52 PM
Slackbot
02/20/2025, 11:54 AM
Vaibhav Gupta
02/24/2025, 3:02 AM
Slackbot
02/25/2025, 10:34 AM
Rodrigo Sodré
03/09/2025, 9:22 PM
When I only use sf.forecast
everything works just fine:
pred = sf.forecast(h=horizon, df=train_df)
But for every new observation I have to append it to the dataframe and call forecast, which will train everything again. So I tried to change to `sf.fit + sf.predict`:
sf.fit(df=train_df)
# update train_df
pred = sf.predict(h=horizon, X_df=train_df)
but I'm getting the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File <timed exec>:12
File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:747, in _StatsForecast.predict(self, h, X_df, level)
742 warnings.warn(
743 "Prediction intervals are set but `level` was not provided. "
744 "Predictions won't have intervals."
745 )
746 self._validate_exog(X_df)
--> 747 X, level = self._parse_X_level(h=h, X=X_df, level=level)
748 if self.n_jobs == 1:
749 fcsts, cols = self.ga.predict(fm=self.fitted_, h=h, X=X, level=level)
File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:692, in _StatsForecast._parse_X_level(self, h, X, level)
690 expected_shape = (h * len(self.ga), self.ga.data.shape[1] + 1)
691 if X.shape != expected_shape:
--> 692 raise ValueError(
693 f"Expected X to have shape {expected_shape}, but got {X.shape}"
694 )
695 processed = ufp.process_df(X, self.id_col, self.time_col, None)
696 return GroupedArray(processed.data, processed.indptr), level
ValueError: Expected X to have shape (96, 2), but got (92928, 3)
"*Expected X to have shape (96, 2), but got (92928, 3)*"
I don't understand why the shape isn't valid in this case, since it's the same data I trained on before and it works with forecast. Am I using it incorrectly? Is there a proper way to call fit/predict?
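For what it's worth, a minimal sketch of how the fit/predict split is usually structured; X_df is only for the future exogenous rows (h per unique_id), and make_future_features below is a hypothetical helper, not part of statsforecast:
sf.fit(df=train_df)  # fit once on the full history (unique_id, ds, y, features)
# X_df must contain exactly h future rows per unique_id, with the id/time
# columns plus the exogenous feature columns only -- not the training frame.
future_X = make_future_features(train_df, horizon)  # hypothetical helper
pred = sf.predict(h=horizon, X_df=future_X)
# If the models were trained without exogenous regressors, omit X_df entirely:
# pred = sf.predict(h=horizon)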
Thanks in advance.
Simon
03/10/2025, 7:31 PM
Is there a way to know the unique_id
to which an MSTL fitted model corresponds?
For the following, I do not find any identification:
sf.fitted_[0, 0].model_
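If it helps, a small sketch for mapping fitted models back to their series, assuming the fitted StatsForecast object exposes the series ids as sf.uids in the same order as the rows of sf.fitted_ (the column index is the position of the model in the models list):
# Rows of sf.fitted_ follow the order of sf.uids; column 0 is the first model.
fitted_by_id = {uid: sf.fitted_[i, 0] for i, uid in enumerate(sf.uids)}
fitted_by_id["H1"].model_  # "H1" is a hypothetical unique_id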
Thank you for any hints!
Ankit Hemant Lade
03/17/2025, 3:05 PM
Bersu T
03/18/2025, 2:56 PM
Sergio André López Pereo
03/26/2025, 3:35 AM
GR
03/26/2025, 12:34 PM
Alex Berry
03/31/2025, 5:30 PM
Santosh Srivatsa
04/08/2025, 2:54 AM
unique_id.
Here’s a quick overview of my workflow:
1. Data Preparation:
◦ I create two DataFrames (train and test), both with columns: unique_id, ds, and y.
◦ The train and test DataFrames have the same end date but different start dates.
2. Model Setup and Fitting:
◦ I initialize the StatsForecast object with the AutoARIMA model, setting parameters like season_length and freq.
◦ I call the .fit() method on the training DataFrame.
3. Forecasting and In-Sample Predictions:
◦ I forecast using the .forecast(h=1) method on the test DataFrame.
◦ I also use .forecast_fitted_values() after fitting to retrieve in-sample predictions, and I flag anomalies based on the level prediction intervals by checking whether actuals fall outside the expected range.
I’m doing this because I’m specifically trying to detect missed transmissions, meaning there may not be a value at a fixed point in the future—so a direct forecast of future values isn’t always meaningful. Instead, I’m comparing the model’s in-sample expectations against actuals.
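For reference, a minimal sketch of that in-sample anomaly check, assuming a single AutoARIMA model and a 99% level; column names follow the usual <model>-lo-<level>/<model>-hi-<level> pattern, and train, freq and season_length stand for the objects described above:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(models=[AutoARIMA(season_length=season_length)], freq=freq)
# fitted=True stores the in-sample values so forecast_fitted_values() can return them
sf.forecast(df=train, h=1, level=[99], fitted=True)
insample = sf.forecast_fitted_values()  # unique_id, ds, y, AutoARIMA, lo/hi columns
insample["anomaly"] = (
    (insample["y"] < insample["AutoARIMA-lo-99"])
    | (insample["y"] > insample["AutoARIMA-hi-99"])
)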
Additionally, when I try to introduce exogenous variables, I’m running into an invalid shape error when calling .forecast(). I suspect this might be because the test dataset doesn’t have the same number of rows for each unique_id.
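One quick way to check that suspicion, assuming X_df stands for the exogenous frame being passed to .forecast() (a hypothetical name here):
# Every unique_id must contribute exactly h future rows of exogenous features.
h = 1
rows_per_id = X_df.groupby("unique_id").size()
print(rows_per_id[rows_per_id != h])  # offending series, if any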
Would love any guidance on whether this overall workflow makes sense or how I might improve it—especially around incorporating exogenous variables or detecting anomalies more robustly.
Thanks in advance for your insights!
Sergio André López Pereo
04/09/2025, 7:39 PM
Mariana Menchero
04/09/2025, 7:57 PM
Iching Quares
04/10/2025, 2:49 PM
spec <- ugarchspec(variance.model = list(garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(final.order[1], final.order[3]), include.mean = TRUE),
distribution.model = "std",
fixed.pars = fixed_pars_df0)
Ankit Hemant Lade
04/11/2025, 2:38 AM
IHAS
04/11/2025, 3:36 PM