# statsforecast
  • m

    Maria Jose Arroyo Doria

    11/22/2024, 12:42 AM
    Hi, is there any way to create custom metrics, such as a combination of BIAS% and MAE%, to be used during cross-validation?
    import pandas as pd
    from utilsforecast.evaluation import evaluate
    from utilsforecast.losses import mse

    def evaluate_cross_validation(df, metric):
        models = df.drop(columns=['unique_id', 'ds', 'cutoff', 'y']).columns.tolist()
        evals = []
        # Calculate loss for every unique_id and cutoff.
        for cutoff in df['cutoff'].unique():
            eval_ = evaluate(df[df['cutoff'] == cutoff], metrics=[metric], models=models)
            evals.append(eval_)
        evals = pd.concat(evals)
        # Average the error metrics over all cutoffs for every combination of model and unique_id.
        evals = evals.groupby('unique_id').mean(numeric_only=True)
        evals['best_model'] = evals.idxmin(axis=1)
        return evals

    evaluation_df = evaluate_cross_validation(crossvaldation_df, mse)
    evaluation_df.head()
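A minimal sketch of what such a combined metric could look like, assuming `evaluate` accepts any callable that follows the same calling convention as the built-in losses (a DataFrame, a list of model columns, `id_col`, `target_col`) and returns one row per series. The name `bias_plus_mae_pct` and the choice to add absolute BIAS% to MAE% are hypothetical, not part of the library.

```python
import pandas as pd

def bias_plus_mae_pct(df, models, id_col="unique_id", target_col="y"):
    """Hypothetical metric: |BIAS%| + MAE% per series, both scaled by the sum of |y|."""
    rows = {}
    for uid, group in df.groupby(id_col, observed=True):
        denom = group[target_col].abs().sum()
        scores = {}
        for model in models:
            err = group[model] - group[target_col]
            bias_pct = abs(err.sum()) / denom   # signed errors can cancel out
            mae_pct = err.abs().sum() / denom   # absolute errors cannot
            scores[model] = bias_pct + mae_pct
        rows[uid] = scores
    out = pd.DataFrame.from_dict(rows, orient="index")
    out.index.name = id_col
    return out.reset_index()

# Toy cross-validation slice with two model columns.
demo = pd.DataFrame({
    "unique_id": ["a", "a"],
    "y": [10.0, 10.0],
    "m1": [12.0, 8.0],    # unbiased, but off by 2 each step
    "m2": [12.0, 12.0],   # biased upwards by 2 each step
})
scores = bias_plus_mae_pct(demo, models=["m1", "m2"])
print(scores)
```

If the assumption about `evaluate` holds, the function could be passed as `metric` to `evaluate_cross_validation` in place of `mse`.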
  • g

    Gabe Richard

    11/22/2024, 7:10 PM
    Hi there, Is anyone aware of an issue when fugue tries to run with dask? I've been trying to use dask for distributed training since my training dataset is pretty large (~13 mil rows and also has exogenous variables). I've tried upgrading both fugue and dask to latest and even went from python 3.11 to 3.12 to see if it'll help but I keep running into the same issue. Any help or advice is greatly appreciated.
    FugueBug                                  Traceback (most recent call last)
    Cell In[8], line 41
         31 dask_client = Client()
         32 engine = DaskExecutionEngine(dask_client=dask_client)
         33 y_pred = sf.forecast(
         34     df=ts_train,
         35     h=ts_val.shape[0],
         36     level=[90],
         37     X_df=ts_val[[col for col in ts_val.columns if col not in ['y']]],
         38     id_col='unique_id',
         39     time_col='ds',
         40     target_col='y',
    ---> 41 ).compute()
         42 # Add actual values to forecast
         43 forecast = y_pred.merge(ts_val, how='left', on=['unique_id', 'ds'])
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_collection.py:481, in FrameBase.compute(self, fuse, concatenate, **kwargs)
        479     out = out.repartition(npartitions=1)
        480 out = out.optimize(fuse=fuse)
    --> 481 return DaskMethodsMixin.compute(out, **kwargs)
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/base.py:372, in DaskMethodsMixin.compute(self, **kwargs)
        348 def compute(self, **kwargs):
        349     """Compute this dask collection
        350 
        351     This turns a lazy Dask collection into its in-memory equivalent.
       (...)
        370     dask.compute
        371     """
    --> 372     (result,) = compute(self, traverse=False, **kwargs)
        373     return result
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/base.py:660, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
        657     postcomputes.append(x.__dask_postcompute__())
        659 with shorten_traceback():
    --> 660     results = schedule(dsk, keys, **kwargs)
        662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_expr.py:3799, in _execute_internal_graph()
       3796 @staticmethod
       3797 def _execute_internal_graph(internal_tasks, dependencies, outkey):
       3798     cache = dict(dependencies)
    -> 3799     res = execute_graph(internal_tasks, cache=cache, keys=[outkey])
       3800     return res[outkey]
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_groupby.py:1193, in operation()
       1191 if kwargs is None:
       1192     kwargs = {}
    -> 1193 return dask_func(
       1194     frame,
       1195     list(by),
       1196     key=_slice,
       1197     group_keys=group_keys,
       1198     args=args,
       1199     **_as_dict("observed", observed),
       1200     **_as_dict("dropna", dropna),
       1201     **kwargs,
       1202 )
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_groupby.py:1229, in groupby_slice_apply()
       1218 def groupby_slice_apply(
       1219     df,
       1220     grouper,
       (...)
       1227     **kwargs,
       1228 ):
    -> 1229     return _groupby_slice_apply(
       1230         df,
       1231         grouper,
       1232         key,
       1233         func,
       1234         *args,
       1235         group_keys=group_keys,
       1236         dropna=dropna,
       1237         observed=observed,
       1238         **kwargs,
       1239     )
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/dataframe/groupby.py:210, in _groupby_slice_apply()
        208 if key:
        209     g = g[key]
    --> 210 return g.apply(func, *args, **kwargs)
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/fugue_dask/execution_engine.py:151, in _map()
        149     return PandasDataFrame([], output_schema).as_pandas()
        150 pdf = pdf.reset_index(drop=True)
    --> 151 pdf = _fix_dask_bug(pdf)
        152 res = _core_map(pdf)
        153 return res.astype(output_dtypes)
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/fugue_dask/execution_engine.py:126, in _fix_dask_bug()
        125 def _fix_dask_bug(pdf: pd.DataFrame) -> pd.DataFrame:
    --> 126     assert_or_throw(
        127         pdf.shape[1] == len(input_schema),
        128         FugueBug(
        129             "partitioned dataframe has different number of columns: "
        130             f"{pdf.columns} vs {input_schema}"
        131         ),
        132     )
        133     return pdf
    
    File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/triad/utils/assertion.py:42, in assert_or_throw()
         40     raise AssertionError()
         41 if isinstance(_exception, Exception):
    ---> 42     raise _exception
         43 if isinstance(_exception, str):
         44     raise AssertionError(_exception)
    
    FugueBug: partitioned dataframe has different number of columns: Index(['__fugue_serialized_blob__', '__fugue_serialized_blob_no__',
           '__fugue_serialized_blob_name__'],
          dtype='object') vs unique_id:long,__fugue_serialized_blob__:bytes,__fugue_serialized_blob_no__:long,__fugue_serialized_blob_name__:str,__fugue_serialized_blob_dummy__:long
  • o

    Omri Kramer

    12/17/2024, 9:44 PM
    Hi all! First of all, thank you for the great package. I was wondering if there is a bug in prediction intervals for ETS with multiplicative errors or if my understanding is wrong. I was expecting models with multiplicative errors to have wider intervals as the prediction gets larger, but that doesn't really seem to be the case in the example below:
    import numpy as np
    import pandas as pd
    from statsforecast.models import AutoETS
    
    np.random.seed(42)
    n_periods = 98
    seasonal_period = 7
    
    dates = pd.date_range(start="2023-01-01", periods=n_periods, freq="D")
    
    seasonality = np.sin(
        np.linspace(0, np.pi / 2, seasonal_period)
    )
    seasonality_pattern = np.tile(seasonality, n_periods // seasonal_period)
    baseline_level = 100
    y_true = baseline_level * (1 + seasonality_pattern)
    
    error_amplitude = 0.05
    errors = np.random.normal(1, error_amplitude, n_periods)
    y = y_true * errors
    
    model = AutoETS(season_length=seasonal_period, model="MNM")
    train = y[:91]
    fitted_model = model.fit(train, np.arange(len(train)))
    print(fitted_model.model_["method"])
    
    forecasts = fitted_model.predict(h=7, level=[99])
    
    print(pd.DataFrame(forecasts).assign(
        d1=lambda x: x["hi-99"] - x["mean"],
        d2=lambda x: -x["lo-99"] + x["mean"],
        r1=lambda x: x["hi-99"] / x["mean"] - 1,
        r2=lambda x: 1 - x["lo-99"] / x["mean"],
        y=y[91:],
        y_true=y_true[91:],
    ))
    The result is:
    ETS(M,N,M)
             mean       lo-99       hi-99         d1         d2        r1  \
    0   97.220357   85.385598  109.055116  11.834759  11.834759  0.121731   
    1  125.941699  114.106940  137.776459  11.834759  11.834759  0.093970   
    2  147.232211  135.397451  159.066970  11.834759  11.834759  0.080382   
    3  172.418248  160.583488  184.253007  11.834760  11.834760  0.068640   
    4  184.106207  172.271447  195.940967  11.834760  11.834760  0.064282   
    5  198.262093  186.427333  210.096853  11.834760  11.834760  0.059693   
    6  201.371897  189.537137  213.206658  11.834760  11.834760  0.058771   
    
             r2           y      y_true  
    0  0.121731  104.843225  100.000000  
    1  0.093970  121.463115  125.881905  
    2  0.080382  147.542534  150.000000  
    3  0.068640  167.363826  170.710678  
    4  0.064282  172.947760  186.602540  
    5  0.059693  199.503335  196.592583  
    6  0.058771  202.610553  200.000000
    As you can see the difference between the intervals and the prediction stays the same while the ratio is getting smaller. I am using statsforecast 2.0.0.
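To make the expectation concrete, here is a small sketch using the point forecasts printed above. It compares the constant half-width that was reported with a half-width proportional to the mean, which is a first-order, one-step approximation of a multiplicative error (standard deviation roughly `rel_sigma * mean`). This is only an illustration of the expectation, not how statsforecast computes its intervals; `rel_sigma` is taken from `error_amplitude` in the simulation.

```python
from statistics import NormalDist

# Point forecasts copied from the output above.
means = [97.220357, 125.941699, 147.232211, 172.418248,
         184.106207, 198.262093, 201.371897]
half_width = 11.834759            # constant d1/d2 reported above
rel_sigma = 0.05                  # relative noise used to simulate the series
z = NormalDist().inv_cdf(0.995)   # two-sided 99% normal quantile

# Constant absolute half-width: the relative width shrinks as the mean grows.
constant_width_ratio = [half_width / m for m in means]

# Proportional half-width: the absolute width grows with the mean.
proportional_width = [z * rel_sigma * m for m in means]

for m, r, w in zip(means, constant_width_ratio, proportional_width):
    print(f"mean={m:8.2f}  constant-width ratio={r:.4f}  proportional half-width={w:6.2f}")
```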
  • a

    Anthony Giorgio

    12/18/2024, 8:56 PM
    Hi, I am using the statsforecast MSTL model to forecast hourly temperature from a three-year time series. The series has daily and yearly seasonality, so I assumed I had to set season_length=[24, 24*365]. However, the model fit runs forever without finishing. If I set only season_length=[24], the model fits in seconds. Is there a reason?
  • t

    thomas delaunait

    01/09/2025, 10:48 AM
    @Mariana Menchero Hello Mariana and Nixtla team. Happy new year to all of you! Thank you for your amazing work. I have a quick question: do you know when the "feat: Forward method TBATS/AutoTBATS" change is planned for release? It seems to have been on the main branch since Feb 2024. Thank you!
  • a

    Aravind Karunakaran

    01/13/2025, 11:19 AM
    I'm using SimpleExponentialSmoothing to forecast my time series data, which seems to have no apparent trend or seasonality. The model's fitted values are great, but all the predicted values are the same (i.e., the output values just form a straight line). Any explanations for this?
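For context, a flat forecast is what textbook simple exponential smoothing produces: the model tracks a single level, and every future step is forecast as the last estimated level. The sketch below is a plain-Python version of that recursion, not the statsforecast implementation (whose initialization may differ); `ses_forecast` is a hypothetical helper.

```python
def ses_forecast(y, alpha, h):
    """Textbook SES: return one-step-ahead fitted values and an h-step forecast."""
    level = y[0]
    fitted = [y[0]]                    # no earlier data, so reuse the first value
    for obs in y[1:]:
        fitted.append(level)           # forecast made before seeing `obs`
        level = alpha * obs + (1 - alpha) * level
    # There is no trend or seasonal term, so every horizon gets the same level.
    return fitted, [level] * h

y = [10.0, 12.0, 11.0, 13.0, 12.5]
fitted, forecast = ses_forecast(y, alpha=0.5, h=3)
print(fitted)    # changes step by step because each value uses new data
print(forecast)  # one constant repeated h times
```

The fitted values vary because each one is updated with a newly observed point, while the forecast has no new observations to react to.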
  • m

    Makarand Batchu

    01/13/2025, 12:21 PM
    Hi #statsforecast team. I recently upgraded the statsforecast package to the latest version (2.0.0), and cross_validation is now taking much longer. Were any of the base models updated? Thanks in advance.
  • g

    Guillaume GALIE

    01/15/2025, 4:21 PM
    Hello, I get the following exception with cross-validation and the MSTL model (MSTL(season_length = [12],alias='MSTL')):
    File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:5273, in MSTL.forward(self, y, h, X, X_future, level, fitted)
       5269         res = self.trend_forecaster._add_conformal_intervals(
       5270             fcst=res, y=x_sa, X=X, level=level
       5271         )
       5272 # reseasonalize results
    -> 5273 seas_h = _predict_mstl_seas(model_, h=h, season_length=self.season_length)
       5274 seas_insample = model_.filter(regex="seasonal*").sum(axis=1).values
       5275 res = {
       5276     key: val + (seas_insample if "fitted" in key else seas_h)
       5277     for key, val in res.items()
       5278 }
    
    File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4999, in _predict_mstl_seas(mstl_ob, h, season_length)
       4998 def _predict_mstl_seas(mstl_ob, h, season_length):
    -> 4999     seascomp = _predict_mstl_components(mstl_ob, h, season_length)
       5000     return seascomp.sum(axis=1)
    
    File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4992, in _predict_mstl_components(mstl_ob, h, season_length)
       4990     mp = seasonal_periods[i]
       4991     colname = seasoncolumns[i]
    -> 4992     seascomp[:, i] = np.tile(
       4993         mstl_ob[colname].values[-mp:], trunc(1 + (h - 1) / mp)
       4994     )[:h]
       4995 return seascomp
    
    ValueError: could not broadcast input array from shape (10,) into shape (12,)
    Here is an example with one time series that reproduces the failure:
    import pandas as pd
    from statsforecast import StatsForecast
    from statsforecast.models import (Naive,MSTL) 
    from utilsforecast.data import generate_series
    freq = 'MS'
    season_length = 12
    min_length = 25
    df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
    sffcst = StatsForecast(models = [MSTL(season_length = [season_length])], freq = freq, n_jobs=-1,fallback_model=Naive(),verbose=True)
    sf_crossvalidation_df=sffcst.cross_validation(df = df, h=12, step_size = 1, n_windows = 3, refit=False).reset_index(drop=True)
    sf_crossvalidation_df
    What is strange is that it works with less history: if you keep only 15 months of data, it doesn't crash.
    import pandas as pd
    from statsforecast import StatsForecast
    from statsforecast.models import (Naive,MSTL) 
    from utilsforecast.data import generate_series
    
    freq = 'MS'
    season_length = 12
    min_length = 15
    
    df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
    
    sffcst = StatsForecast(models = [MSTL(season_length = [season_length])], freq = freq, n_jobs=-1,fallback_model=Naive(),verbose=True)
    sf_crossvalidation_df=sffcst.cross_validation(df = df, h=12, step_size = 1, n_windows = 3, refit=False).reset_index(drop=True)
    sf_crossvalidation_df
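The tiling expression in the traceback takes the last `mp` values of a seasonal column and repeats them to cover `h` steps, which implicitly assumes at least `mp` rows are available. The NumPy sketch below shows how that expression yields fewer than `h` values when the decomposition has fewer rows than one season. The helper `tile_seasonal` is hypothetical, and whether this is the actual cause of the error above is an assumption.

```python
from math import trunc
import numpy as np

def tile_seasonal(values, mp, h):
    """Same tiling expression as in the traceback: repeat the last `mp` values to cover `h` steps."""
    return np.tile(values[-mp:], trunc(1 + (h - 1) / mp))[:h]

h, mp = 12, 12
enough = np.arange(24, dtype=float)   # at least one full season of rows
short = np.arange(10, dtype=float)    # fewer rows than one season

print(tile_seasonal(enough, mp, h).shape)  # (12,)
print(tile_seasonal(short, mp, h).shape)   # (10,)

# Assigning the short result into an h-row column fails with a broadcast error.
seascomp = np.empty((h, 1))
message = None
try:
    seascomp[:, 0] = tile_seasonal(short, mp, h)
except ValueError as err:
    message = str(err)
print(message)
```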
  • p

    Pradyumna Mahajan

    01/16/2025, 6:54 AM
    Hey team, I am using Statsforecast. I wanted to use a custom frequency of 3 hours, but there is no offset for that. What can I do? I saw some examples of freq=1 in the docs, but what does that mean? Thanks in advance 🙂
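A short sketch of both cases, assuming `StatsForecast(freq=...)` accepts the same offset strings that pandas does. Pandas offset aliases can carry an integer multiple, so a 3-hour step can be written as `"3h"`. As far as I understand the docs examples, an integer `freq` such as 1 is used when `ds` holds integer time steps rather than timestamps; treat that reading as an assumption.

```python
import pandas as pd

# Timestamps spaced 3 hours apart, built from a multiplied offset alias.
ds = pd.date_range("2025-01-01", periods=5, freq="3h")
hourly_df = pd.DataFrame({"unique_id": "series_1", "ds": ds, "y": range(5)})

# Integer time index: consecutive `ds` values differ by 1, matching freq=1.
integer_df = pd.DataFrame({"unique_id": "series_1", "ds": range(1, 6), "y": range(5)})

print(hourly_df["ds"].diff().dropna().unique())
print(integer_df["ds"].diff().dropna().unique())
```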
  • p

    Piero Danti

    01/16/2025, 2:45 PM
    Hello, I would like to perform multi-time-series forecasting using #statsforecast. Is it possible? Furthermore, can I also forecast new time series?
  • n

    Naren Castellon

    01/19/2025, 9:00 PM
    I have the following model. How can I save it, and how can I load it once saved so that I can then train it again?
    # Instantiate the StatsForecast class as sf
    sf = StatsForecast(
        df=train,
        models=models,
        freq='D',
        n_jobs=-1,
    )
    # Train the model
    sf.fit()
    # Forecast
    Y_hat = sf.predict(horizon)
  • b

    Bersu T

    02/05/2025, 10:59 AM
    Hi team, I have a couple questions with regards to prediction intervals. Does ARIMA also use conformal predictions in statsforecast? Can we perform a calibration test (coverage probability), to see how well this interval is calibrated? And have you guys already compared prediction intervals across ARIMA, ML and NN timeseries? I am doing research for my thesis and plan on making comparisons in terms of sharpness and calibration of the prediction interval across different models and hierarchical reconciliation methods.
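A coverage check of this kind can be sketched in a few lines: count how often the actuals fall inside the interval and compare that share with the nominal level, while the mean interval width gives a simple measure of sharpness. The helper names and the toy numbers below are hypothetical.

```python
def empirical_coverage(y, lo, hi):
    """Share of actuals that fall inside [lo, hi]."""
    inside = [low <= obs <= high for obs, low, high in zip(y, lo, hi)]
    return sum(inside) / len(inside)

def mean_interval_width(lo, hi):
    """Average interval width; narrower is sharper at equal coverage."""
    return sum(high - low for low, high in zip(lo, hi)) / len(lo)

y  = [10.0, 12.0,  9.0, 15.0, 11.0]
lo = [ 8.0, 10.0,  9.5, 12.0, 10.0]
hi = [12.0, 14.0, 11.0, 14.0, 13.0]

coverage = empirical_coverage(y, lo, hi)
width = mean_interval_width(lo, hi)
print(coverage, width)  # compare coverage with the nominal level, e.g. 0.9
```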
  • f

    Filipa Encarnação Louzeiro

    02/10/2025, 5:09 PM
    Hi all, After generating windows for cross-validation, is there a way to calculate rmse (or any other loss function) knowing that my cross-validation dataframe is a spark dataframe? (I mean, beyond the obvious solution of programming the loss expression)
  • s

    Simon

    02/15/2025, 1:52 PM
    Hello everyone! I am currently looking for a method to train an MSTL model, save it, and add new data without needing to retrain on the full dataset. Do you know if this functionality is supported in statsforecast?
  • s

    Slackbot

    02/20/2025, 11:54 AM
    This message was deleted.
  • v

    Vaibhav Gupta

    02/24/2025, 3:02 AM
    Hello Nixtla team, I have noticed a small error in the docs you have provided for the statsforecast library, may I know how to contribute to fixing it?
  • s

    Slackbot

    02/25/2025, 10:34 AM
    This message was deleted.
  • r

    Rodrigo Sodré

    03/09/2025, 9:22 PM
    Greetings everyone! I'm trying to predict the next steps of a time series using AutoARIMA. The data consists of two years of daily observations of 96 assets. This is how the dataframe looks after formatting it to Nixtla's input format (attached): 92928 rows × 3 columns. If I use `sf.forecast`, everything works just fine:
    pred = sf.forecast(h=horizon, df=train_df)
    But for every new observation I have to append it to the dataframe and call forecast, which will train everything again. So I tried to change to `sf.fit + sf.predict`:
    sf.fit(df=train_df)
    # update train_df
    pred = sf.predict(h=horizon, X_df=train_df)
    but I'm getting the following error:
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    File <timed exec>:12
    
    File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:747, in _StatsForecast.predict(self, h, X_df, level)
        742     warnings.warn(
        743         "Prediction intervals are set but `level` was not provided. "
        744         "Predictions won't have intervals."
        745     )
        746 self._validate_exog(X_df)
    --> 747 X, level = self._parse_X_level(h=h, X=X_df, level=level)
        748 if self.n_jobs == 1:
        749     fcsts, cols = self.ga.predict(fm=self.fitted_, h=h, X=X, level=level)
    
    File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:692, in _StatsForecast._parse_X_level(self, h, X, level)
        690 expected_shape = (h * len(self.ga), self.ga.data.shape[1] + 1)
        691 if X.shape != expected_shape:
    --> 692     raise ValueError(
        693         f"Expected X to have shape {expected_shape}, but got {X.shape}"
        694     )
        695 processed = ufp.process_df(X, self.id_col, self.time_col, None)
        696 return GroupedArray(processed.data, processed.indptr), level
    
    ValueError: Expected X to have shape (96, 2), but got (92928, 3)
    "*Expected X to have shape (96, 2), but got (92928, 3)*" I don't understand why the shape isn't valid in this case, since it's the same dataframe I trained on before and it works with `forecast`. Am I using it incorrectly? Is there a proper way to call `fit/predict`? Thanks in advance.
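The expected shape in the error can be reproduced from the expression shown in the traceback, `(h * len(self.ga), self.ga.data.shape[1] + 1)`. Read that way, `X_df` is checked as a frame of future exogenous values with `h` rows per series, not as the training history. The sketch below only redoes that arithmetic; `expected_x_shape` is a hypothetical helper, and `h = 1` is inferred from the 96 expected rows.

```python
def expected_x_shape(n_series, h, n_exog):
    """Mirror of the shape check in the traceback.

    The stored training data holds y plus the exogenous columns (1 + n_exog),
    and X_df is expected to have one more column than that (id and time
    columns plus the exogenous columns, without y).
    """
    return (h * n_series, (1 + n_exog) + 1)

n_series = 96
train_shape = (92928, 3)   # unique_id, ds, y for the full history

expected = expected_x_shape(n_series=n_series, h=1, n_exog=0)
print(expected)                 # shape the check asks for
print(train_shape == expected)  # the training frame does not match it
```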
  • s

    Simon

    03/10/2025, 7:31 PM
    Hi all, is it possible to find the `unique_id` to which an MSTL fitted model corresponds? For the following, I do not find any identification:
    sf.fitted_[0, 0].model_
    Thank you for any hints!
  • a

    Ankit Hemant Lade

    03/17/2025, 3:05 PM
    Hey @Marco, I am trying to extract the parameters of the stats models, such as alpha, beta, gamma, etc. Currently I am doing it for AutoTheta, but I am not able to access any parameters. Is there any way to do this?
  • b

    Bersu T

    03/18/2025, 2:56 PM
    Hi! Do I need to instantiate conformal prediction for ARIMA during training, or can I also just apply it only during prediction?
  • s

    Sergio André López Pereo

    03/26/2025, 3:35 AM
    Hey! Good night everyone. I have a question related to the implementation. Do you have any kind of document or article about the time complexity of the AutoTBATS model in terms of the seasons array?
  • g

    GR

    03/26/2025, 12:34 PM
    Hi, Does Statsforecast support VAR models? I mean, I have multiple variables (columns) in the dataset needing TSA.
  • a

    Alex Berry

    03/31/2025, 5:30 PM
    What might cause prediction intervals to look this jagged? I am getting intervals similar to this when using HoltWinters and ARIMA models. The lower and upper bounds are the 2.5 and 97.5 percentiles, respectively.
  • s

    Santosh Srivatsa

    04/08/2025, 2:54 AM
    Hello everyone, I’m using AutoARIMA from Nixtla’s StatsForecast to detect missed transmissions from multiple data sources, each identified by a unique `unique_id`. Here’s a quick overview of my workflow:
    1. Data Preparation:
       ◦ I create two DataFrames (train and test), both with columns `unique_id`, `ds`, and `y`.
       ◦ The train and test DataFrames have the same end date but different start dates.
    2. Model Setup and Fitting:
       ◦ I initialize the `StatsForecast` object with the `AutoARIMA` model, setting parameters like `season_length` and `freq`.
       ◦ I call the `.fit()` method on the training DataFrame.
    3. Forecasting and In-Sample Predictions:
       ◦ I forecast using the `.forecast(h=1)` method on the test DataFrame.
       ◦ I also use `.forecast_fitted_values()` after fitting to retrieve in-sample predictions, and I flag anomalies based on the `level` prediction intervals by checking whether actuals fall outside the expected range.
    I’m doing this because I’m specifically trying to detect missed transmissions, meaning there may not be a value at a fixed point in the future, so a direct forecast of future values isn’t always meaningful. Instead, I’m comparing the model’s in-sample expectations against actuals.
    Additionally, when I try to introduce exogenous variables, I’m running into an invalid shape error when calling `.forecast()`. I suspect this might be because the test dataset doesn’t have the same number of rows for each `unique_id`.
    Would love any guidance on whether this overall workflow makes sense or how I might improve it, especially around incorporating exogenous variables or detecting anomalies more robustly. Thanks in advance for your insights!
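The interval-based flagging step described above can be sketched with pandas alone. The frame below is toy data, and the column names `AutoARIMA-lo-99` and `AutoARIMA-hi-99` are an assumption about how the fitted-values output is labelled; adjust them to whatever the actual output contains.

```python
import pandas as pd

# Toy stand-in for the in-sample output: actuals plus interval bounds.
fitted = pd.DataFrame({
    "unique_id": ["a", "a", "a", "b", "b"],
    "ds": pd.to_datetime(
        ["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-01", "2025-01-02"]
    ),
    "y": [10.0, 0.0, 11.0, 5.0, 5.5],
    "AutoARIMA-lo-99": [8.0, 8.5, 9.0, 4.0, 4.2],
    "AutoARIMA-hi-99": [12.0, 12.5, 13.0, 6.0, 6.2],
})

# An observation is flagged when it falls outside its own interval.
fitted["anomaly"] = ~fitted["y"].between(
    fitted["AutoARIMA-lo-99"], fitted["AutoARIMA-hi-99"]
)

# Count of flagged points per series.
anomalies_per_series = fitted.groupby("unique_id")["anomaly"].sum()
print(fitted[fitted["anomaly"]])
print(anomalies_per_series)
```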
  • s

    Sergio André López Pereo

    04/09/2025, 7:39 PM
    Hello Nixtla team, and thanks in advance for your time. I'm running into an issue using the AutoTBATS model. There are some cases where the model just kind of "explodes": it makes the prediction, and the forecast seems to turn into something exponential for some reason. The explosion happens in both the positive and the negative direction. Do you have any hint of what it could be?
  • m

    Mariana Menchero

    04/09/2025, 7:57 PM
    Hi @Sergio André López Pereo do you have a reproducible example you can share with us?
  • i

    Iching Quares

    04/10/2025, 2:49 PM
    Hello Nixtla team, I'm not sure if I'm missing anything, but is there any way to jointly fit an ARIMA+GARCH model, similar to how it's done in the rugarch R package?
    spec <- ugarchspec(variance.model = list(garchOrder = c(1, 1)), 
                         mean.model = list(armaOrder = c(final.order[1], final.order[3]), include.mean = TRUE), 
                         distribution.model = "std", 
                         fixed.pars = fixed_pars_df0)
  • a

    Ankit Hemant Lade

    04/11/2025, 2:38 AM
    In statsforecast cross-validation, is there any way I can give an explicit cutoff date?
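Whether `cross_validation` itself takes an explicit cutoff is not answered here. As a manual workaround, the split at a chosen date can be done in pandas before fitting: train on everything up to the cutoff and keep the next `h` rows per series for evaluation. The helper `split_at_cutoff` is hypothetical.

```python
import pandas as pd

def split_at_cutoff(df, cutoff, h, id_col="unique_id", time_col="ds"):
    """Split a long-format frame at an explicit cutoff date.

    Rows up to and including the cutoff are the training set; the first
    `h` rows after the cutoff of each series are the evaluation window.
    """
    cutoff = pd.Timestamp(cutoff)
    train = df[df[time_col] <= cutoff]
    future = df[df[time_col] > cutoff].sort_values([id_col, time_col])
    test = future.groupby(id_col, observed=True).head(h)
    return train, test

ds = pd.date_range("2025-01-01", periods=10, freq="D")
df = pd.DataFrame({"unique_id": "a", "ds": ds, "y": range(10)})

train, test = split_at_cutoff(df, "2025-01-06", h=3)
print(train["ds"].max(), test["ds"].tolist())
```

The model can then be fit on `train` and its `h`-step forecast compared against `test`.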
  • i

    IHAS

    04/11/2025, 3:36 PM
    I am using StatsForecast to process over 100 time series... Is there a way to enable a verbose mode to track which series is currently being processed, or at least estimate the remaining time for the entire process?