thomas delaunait
01/09/2025, 10:48 AM
Aravind Karunakaran
01/13/2025, 11:19 AM
Makarand Batchu
01/13/2025, 12:21 PM
Guillaume GALIE
01/15/2025, 4:21 PM
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:5273, in MSTL.forward(self, y, h, X, X_future, level, fitted)
5269 res = self.trend_forecaster._add_conformal_intervals(
5270 fcst=res, y=x_sa, X=X, level=level
5271 )
5272 # reseasonalize results
-> 5273 seas_h = _predict_mstl_seas(model_, h=h, season_length=self.season_length)
5274 seas_insample = model_.filter(regex="seasonal*").sum(axis=1).values
5275 res = {
5276 key: val + (seas_insample if "fitted" in key else seas_h)
5277 for key, val in res.items()
5278 }
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4999, in _predict_mstl_seas(mstl_ob, h, season_length)
4998 def _predict_mstl_seas(mstl_ob, h, season_length):
-> 4999 seascomp = _predict_mstl_components(mstl_ob, h, season_length)
5000 return seascomp.sum(axis=1)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4992, in _predict_mstl_components(mstl_ob, h, season_length)
4990 mp = seasonal_periods[i]
4991 colname = seasoncolumns[i]
-> 4992 seascomp[:, i] = np.tile(
4993 mstl_ob[colname].values[-mp:], trunc(1 + (h - 1) / mp)
4994 )[:h]
4995 return seascomp
ValueError: could not broadcast input array from shape (10,) into shape (12,)
Here is an example with a single time series that reproduces the failure:
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import (Naive,MSTL)
from utilsforecast.data import generate_series
freq = 'MS'
season_length = 12
min_length = 25
df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
sffcst = StatsForecast(models = [MSTL(season_length = [season_length])], freq = freq, n_jobs=-1,fallback_model=Naive(),verbose=True)
sf_crossvalidation_df=sffcst.cross_validation(df = df, h=12, step_size = 1, n_windows = 3, refit=False).reset_index(drop=True)
sf_crossvalidation_df
Strangely, it works with less history: if you keep only 15 months of data, it doesn't crash.
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import (Naive,MSTL)
from utilsforecast.data import generate_series
freq = 'MS'
season_length = 12
min_length = 15
df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
sffcst = StatsForecast(models = [MSTL(season_length = [season_length])], freq = freq, n_jobs=-1,fallback_model=Naive(),verbose=True)
sf_crossvalidation_df=sffcst.cross_validation(df = df, h=12, step_size = 1, n_windows = 3, refit=False).reset_index(drop=True)
sf_crossvalidation_df
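The broadcast error can be reproduced without statsforecast. Below is a minimal sketch in plain Python that mimics the `np.tile(...)[:h]` step shown in the traceback, using list repetition in place of `np.tile`. The helper name `tiled_seasonal` and the toy inputs are made up for illustration. The point is only that when fewer than `mp` seasonal values are available, the slice `values[-mp:]` is shorter than one full cycle and the tiled result has fewer than `h` entries.

```python
from math import trunc


def tiled_seasonal(values, h, mp):
    """Mimic the np.tile(...)[:h] step from the traceback with plain lists."""
    last_cycle = values[-mp:]        # at most mp items; fewer if the history is short
    reps = trunc(1 + (h - 1) / mp)   # same repeat count as in the traceback
    return (last_cycle * reps)[:h]


# Hypothetical inputs: 10 vs. 24 available seasonal values, one yearly cycle of 12.
short = tiled_seasonal(list(range(10)), h=12, mp=12)
full = tiled_seasonal(list(range(24)), h=12, mp=12)
print(len(short), len(full))
```

With `h=12` and `mp=12` the repeat count is `trunc(1 + 11/12) = 1`, so 10 available values yield only 10 entries, which matches the `(10,)` into `(12,)` mismatch in the error. Why only 10 values reach this step in the 25-month case is not established by this sketch.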
Pradyumna Mahajan
01/16/2025, 6:54 AM
Piero Danti
01/16/2025, 2:45 PM
Naren Castellon
01/19/2025, 9:00 PM
# Instantiate StatsForecast class as sf
sf = StatsForecast(
df = train,
models = models,
freq ='D',
n_jobs = -1)
# Train the model
sf.fit()
# Forecast
Y_hat = sf.predict(horizon)
Bersu T
02/05/2025, 10:59 AM
Filipa Encarnação Louzeiro
02/10/2025, 5:09 PM
Simon
02/15/2025, 1:52 PM
Slackbot
02/20/2025, 11:54 AM
Vaibhav Gupta
02/24/2025, 3:02 AM
Slackbot
02/25/2025, 10:34 AM
Rodrigo Sodré
03/09/2025, 9:22 PM
With `sf.forecast`, everything works just fine:
pred = sf.forecast(h=horizon, df=train_df)
But for every new observation I have to append it to the dataframe and call forecast, which will train everything again. So I tried to change to `sf.fit + sf.predict`:
sf.fit(df=train_df)
# update train_df
pred = sf.predict(h=horizon, X_df=train_df)
but I'm getting the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File <timed exec>:12
File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:747, in _StatsForecast.predict(self, h, X_df, level)
742 warnings.warn(
743 "Prediction intervals are set but `level` was not provided. "
744 "Predictions won't have intervals."
745 )
746 self._validate_exog(X_df)
--> 747 X, level = self._parse_X_level(h=h, X=X_df, level=level)
748 if self.n_jobs == 1:
749 fcsts, cols = self.ga.predict(fm=self.fitted_, h=h, X=X, level=level)
File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:692, in _StatsForecast._parse_X_level(self, h, X, level)
690 expected_shape = (h * len(self.ga), self.ga.data.shape[1] + 1)
691 if X.shape != expected_shape:
--> 692 raise ValueError(
693 f"Expected X to have shape {expected_shape}, but got {X.shape}"
694 )
695 processed = ufp.process_df(X, self.id_col, self.time_col, None)
696 return GroupedArray(processed.data, processed.indptr), level
ValueError: Expected X to have shape (96, 2), but got (92928, 3)
"*Expected X to have shape (96, 2), but got (92928, 3)*"
I don't understand why the shape isn't valid in this case, since it's the same DataFrame I trained on before and it works with `forecast`. Am I using it incorrectly? Is there a proper way to call `fit`/`predict`?
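For what it's worth, the check that raises here is visible in the traceback: `expected_shape = (h * len(self.ga), self.ga.data.shape[1] + 1)`. Here is a minimal sketch of that arithmetic in plain Python, assuming `len(self.ga)` is the number of series and `self.ga.data` holds `y` plus one column per exogenous feature (a reading of the traceback, not confirmed against the library source). The function name `expected_x_shape` and the example numbers are hypothetical.

```python
def expected_x_shape(h, n_series, n_exog):
    """Shape the check in the traceback would expect for X_df."""
    n_data_cols = 1 + n_exog  # assumed: target y plus exogenous columns
    # One row per series per forecast step; id and time columns replace y.
    return (h * n_series, n_data_cols + 1)


# Hypothetical combinations that both give the reported (96, 2):
print(expected_x_shape(h=1, n_series=96, n_exog=0))
print(expected_x_shape(h=12, n_series=8, n_exog=0))
```

Under that reading, `X_df` is meant to carry `h` future rows per series (id, time, and exogenous columns), not the training history, so a frame with 92928 rows and 3 columns cannot match. If the model was fitted without exogenous features, calling `sf.predict(h=horizon)` without `X_df` is probably what is wanted, as far as I know.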
Thanks in advance.
Simon
03/10/2025, 7:31 PM
`unique_id` to which an MSTL fitted model corresponds?
For the following, I do not find any identification:
sf.fitted_[0, 0].model_
Thank you for any hints!
Ankit Hemant Lade
03/17/2025, 3:05 PM
Bersu T
03/18/2025, 2:56 PM
Sergio André López Pereo
03/26/2025, 3:35 AM
GR
03/26/2025, 12:34 PM
Alex Berry
03/31/2025, 5:30 PM
Santosh Srivatsa
04/08/2025, 2:54 AM
`unique_id`.
Here’s a quick overview of my workflow:
1. Data Preparation:
◦ I create two DataFrames (train and test), both with columns: `unique_id`, `ds`, and `y`.
◦ The train and test DataFrames have the same end date but different start dates.
2. Model Setup and Fitting:
◦ I initialize the `StatsForecast` object with the `AutoARIMA` model, setting parameters like `season_length` and `freq`.
◦ I call the `.fit()` method on the training DataFrame.
3. Forecasting and In-Sample Predictions:
◦ I forecast using the `.forecast(h=1)` method on the test DataFrame.
◦ I also use `.forecast_fitted_values()` after fitting to retrieve in-sample predictions, and I flag anomalies based on the `level` prediction intervals by checking whether actuals fall outside the expected range.
I’m doing this because I’m specifically trying to detect missed transmissions, meaning there may not be a value at a fixed point in the future—so a direct forecast of future values isn’t always meaningful. Instead, I’m comparing the model’s in-sample expectations against actuals.
Additionally, when I try to introduce exogenous variables, I’m running into an invalid shape error when calling `.forecast()`. I suspect this might be because the test dataset doesn’t have the same number of rows for each `unique_id`.
Would love any guidance on whether this overall workflow makes sense or how I might improve it—especially around incorporating exogenous variables or detecting anomalies more robustly.
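The interval check described in step 3 could be sketched as below, in plain Python so it runs without any forecasting library. It assumes the in-sample output has been converted to a list of row dicts and that the interval columns follow the `<model>-lo-<level>` / `<model>-hi-<level>` naming that statsforecast uses. The function name `flag_anomalies` and the sample numbers are invented for illustration.

```python
def flag_anomalies(rows, model="AutoARIMA", level=99):
    """Mark rows whose actual value falls outside the prediction interval."""
    lo_col = f"{model}-lo-{level}"
    hi_col = f"{model}-hi-{level}"
    flagged = []
    for row in rows:
        outside = row["y"] < row[lo_col] or row["y"] > row[hi_col]
        flagged.append({**row, "anomaly": outside})
    return flagged


# Made-up rows: the second actual sits far below its interval.
rows = [
    {"unique_id": "a", "ds": "2025-01-01", "y": 10.0,
     "AutoARIMA-lo-99": 8.0, "AutoARIMA-hi-99": 12.0},
    {"unique_id": "a", "ds": "2025-01-02", "y": 0.0,
     "AutoARIMA-lo-99": 8.5, "AutoARIMA-hi-99": 12.5},
]
result = flag_anomalies(rows)
print([r["anomaly"] for r in result])
```

A missed transmission may show up as a missing row rather than as a low value, so the timestamps might need to be filled to a regular grid before a check like this can see it.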
Thanks in advance for your insights!
Sergio André López Pereo
04/09/2025, 7:39 PM
Mariana Menchero
04/09/2025, 7:57 PM
Iching Quares
04/10/2025, 2:49 PM
spec <- ugarchspec(variance.model = list(garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(final.order[1], final.order[3]), include.mean = TRUE),
distribution.model = "std",
fixed.pars = fixed_pars_df0)
Ankit Hemant Lade
04/11/2025, 2:38 AM
IHAS
04/11/2025, 3:36 PM
Sai krishna Sirikonda
05/02/2025, 9:58 AM
Sai krishna Sirikonda
05/05/2025, 5:32 AM
Filipa Encarnação Louzeiro
05/06/2025, 12:01 PM
models = [AutoETS()]
fcst = StatsForecast(models=models,
freq='M',
n_jobs=1,
fallback_model = SeasonalNaive(season_length = 12))
# FORECAST
df_pred = fcst.forecast(df = df_train,
h = 1,
fitted = True,
level=[90])
But then this error showed up:
PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/fugue_spark/execution_engine.py", line 228, in _udf_pandas
output_df = map_func(cursor, input_df)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/fugue/extensions/_builtins/processors.py", line 333, in run
return self.transformer.transform(df)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/fugue/extensions/transformer/convert.py", line 346, in transform
return self._wrapper.run(
^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/fugue/dataframe/function_wrapper.py", line 103, in run
rt = self._func(**rargs)
^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/statsforecast/distributed/fugue.py", line 166, in _forecast_noX_fitted
model, result = self._forecast(
^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/statsforecast/distributed/fugue.py", line 109, in _forecast
result = model.forecast(
^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/statsforecast/core.py", line 864, in forecast
res_fcsts = self.ga.forecast(
^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/statsforecast/core.py", line 199, in forecast
res_i = fallback_model.forecast(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/statsforecast/models.py", line 3844, in forecast
res = _add_fitted_pi(res=res, se=sigma, level=level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-153d963c-fd2f-4eea-8c6e-a0d339c0421b/lib/python3.11/site-packages/statsforecast/models.py", line 62, in _add_fitted_pi
lo = res["fitted"].reshape(-1, 1) - quantiles * se.reshape(-1, 1)
^^^^^^^^^^
AttributeError: 'int' object has no attribute 'reshape'
File <command-993558187105440>, line 14
4 fcst = StatsForecast(models=models,
5 freq='M',
6 n_jobs=1,
7 fallback_model = SeasonalNaive(season_length = 12))
9 # fcst = StatsForecast(models=models,
10 # freq='M',
11 # n_jobs=1)
12
13 # FORECAST
---> 14 df_pred = fcst.forecast(df = df_train,
15 h = 1,
16 fitted = True,
17 level=[90])\
18 .withColumnRenamed('unique_id','CCLI_ID')\
19 .withColumnRenamed('ds','date')
File /databricks/spark/python/pyspark/sql/connect/client/core.py:2155, in SparkConnectClient._handle_rpc_error(self, rpc_error)
2140 raise Exception(
2141 "Python versions in the Spark Connect client and server are different. "
2142 "To execute user-defined functions, client and server should have the "
(...)
2151 "https://docs.databricks.com/en/release-notes/serverless.html."
2152 )
2153 # END-EDGE
-> 2155 raise convert_exception(
2156 info,
2157 status.message,
2158 self._fetch_enriched_error(info),
2159 self._display_server_stack_trace(),
2160 ) from None
2162 raise SparkConnectGrpcException(status.message) from None
2163 else:
It's really confusing to me. The Databricks assistant suggested two different things, in different runs. The first is:
"The error occurs because the fallback_model in the StatsForecast is returning an integer instead of an array, which causes the reshape method to fail. To fix this, ensure that the fallback_model returns an array-like object that can be reshaped."
Then, in a second run, it suggested:
"The error occurs because the se variable is an integer, and the reshape method is being called on it, which is not valid. To fix this, ensure that se is a numpy array before calling reshape."
Can anyone help? I'm sort of in panic with this 😬
Many many thanks!!
Meanwhile, I removed the line `fitted = True` and it worked well. But what if I really need the fitted values?
Sai krishna Sirikonda
05/08/2025, 5:20 AM
The `neuralforecast` library offers dedicated methods for saving and loading models. Does the `hierarchicalforecast` library provide similar functionality for storing hierarchical forecasting models?
I would appreciate any insights on best practices for saving and restoring hierarchical forecasting models.
#save
nf.save(path='./checkpoints/test_run/',
model_index=None,
overwrite=True,
save_dataset=True)
#load
nf2 = NeuralForecast.load(path='./checkpoints/test_run/')
Y_hat_df2 = nf2.predict()
Y_hat_df2.head()
Thank you in advance for your support!
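If a dedicated save/load method turns out not to be available for the object you want to keep, one generic fallback is Python's standard `pickle` module. The sketch below assumes the object is picklable. The dict `state` is a hypothetical stand-in defined only for this example, not an object from hierarchicalforecast, and the file name is arbitrary.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Serialize any picklable object to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load an object previously written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for whatever fitted or configured object needs storing.
state = {"method": "bottom_up", "params": {"nonnegative": True}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reconciler.pkl"
    save_object(state, path)
    restored = load_object(path)

print(restored == state)
```

Pickle files are tied to the Python and library versions that wrote them and should only be loaded from trusted sources, so it may be worth recording the package versions alongside the file.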