Maria Jose Arroyo Doria
11/22/2024, 12:42 AM
Gabe Richard
11/22/2024, 7:10 PM
FugueBug Traceback (most recent call last)
Cell In[8], line 41
31 dask_client = Client()
32 engine = DaskExecutionEngine(dask_client=dask_client)
33 y_pred = sf.forecast(
34 df=ts_train,
35 h=ts_val.shape[0],
36 level=[90],
37 X_df=ts_val[[col for col in ts_val.columns if col not in ['y']]],
38 id_col='unique_id',
39 time_col='ds',
40 target_col='y',
---> 41 ).compute()
42 # Add actual values to forecast
43 forecast = y_pred.merge(ts_val, how='left', on=['unique_id', 'ds'])
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_collection.py:481, in FrameBase.compute(self, fuse, concatenate, **kwargs)
479 out = out.repartition(npartitions=1)
480 out = out.optimize(fuse=fuse)
--> 481 return DaskMethodsMixin.compute(out, **kwargs)
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/base.py:372, in DaskMethodsMixin.compute(self, **kwargs)
348 def compute(self, **kwargs):
349 """Compute this dask collection
350
351 This turns a lazy Dask collection into its in-memory equivalent.
(...)
370 dask.compute
371 """
--> 372 (result,) = compute(self, traverse=False, **kwargs)
373 return result
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/base.py:660, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
657 postcomputes.append(x.__dask_postcompute__())
659 with shorten_traceback():
--> 660 results = schedule(dsk, keys, **kwargs)
662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_expr.py:3799, in _execute_internal_graph()
3796 @staticmethod
3797 def _execute_internal_graph(internal_tasks, dependencies, outkey):
3798 cache = dict(dependencies)
-> 3799 res = execute_graph(internal_tasks, cache=cache, keys=[outkey])
3800 return res[outkey]
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_groupby.py:1193, in operation()
1191 if kwargs is None:
1192 kwargs = {}
-> 1193 return dask_func(
1194 frame,
1195 list(by),
1196 key=_slice,
1197 group_keys=group_keys,
1198 args=args,
1199 **_as_dict("observed", observed),
1200 **_as_dict("dropna", dropna),
1201 **kwargs,
1202 )
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask_expr/_groupby.py:1229, in groupby_slice_apply()
1218 def groupby_slice_apply(
1219 df,
1220 grouper,
(...)
1227 **kwargs,
1228 ):
-> 1229 return _groupby_slice_apply(
1230 df,
1231 grouper,
1232 key,
1233 func,
1234 *args,
1235 group_keys=group_keys,
1236 dropna=dropna,
1237 observed=observed,
1238 **kwargs,
1239 )
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/dask/dataframe/groupby.py:210, in _groupby_slice_apply()
208 if key:
209 g = g[key]
--> 210 return g.apply(func, *args, **kwargs)
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/fugue_dask/execution_engine.py:151, in _map()
149 return PandasDataFrame([], output_schema).as_pandas()
150 pdf = pdf.reset_index(drop=True)
--> 151 pdf = _fix_dask_bug(pdf)
152 res = _core_map(pdf)
153 return res.astype(output_dtypes)
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/fugue_dask/execution_engine.py:126, in _fix_dask_bug()
125 def _fix_dask_bug(pdf: pd.DataFrame) -> pd.DataFrame:
--> 126 assert_or_throw(
127 pdf.shape[1] == len(input_schema),
128 FugueBug(
129 "partitioned dataframe has different number of columns: "
130 f"{pdf.columns} vs {input_schema}"
131 ),
132 )
133 return pdf
File ~/.pyenv/versions/3.12.7/envs/env-3.12.7/lib/python3.12/site-packages/triad/utils/assertion.py:42, in assert_or_throw()
40 raise AssertionError()
41 if isinstance(_exception, Exception):
---> 42 raise _exception
43 if isinstance(_exception, str):
44 raise AssertionError(_exception)
FugueBug: partitioned dataframe has different number of columns: Index(['__fugue_serialized_blob__', '__fugue_serialized_blob_no__',
'__fugue_serialized_blob_name__'],
dtype='object') vs unique_id:long,__fugue_serialized_blob__:bytes,__fugue_serialized_blob_no__:long,__fugue_serialized_blob_name__:str,__fugue_serialized_blob_dummy__:long
Omri Kramer
12/17/2024, 9:44 PM
import numpy as np
import pandas as pd
from statsforecast.models import AutoETS
np.random.seed(42)
n_periods = 98
seasonal_period = 7
dates = pd.date_range(start="2023-01-01", periods=n_periods, freq="D")
seasonality = np.sin(np.linspace(0, np.pi / 2, seasonal_period))
seasonality_pattern = np.tile(seasonality, n_periods // seasonal_period)
baseline_level = 100
y_true = baseline_level * (1 + seasonality_pattern)
error_amplitude = 0.05
errors = np.random.normal(1, error_amplitude, n_periods)
y = y_true * errors
model = AutoETS(season_length=seasonal_period, model="MNM")
train = y[:91]
fitted_model = model.fit(train, np.arange(len(train)))
print(fitted_model.model_["method"])
forecasts = fitted_model.predict(h=7, level=[99])
print(pd.DataFrame(forecasts).assign(
d1=lambda x: x["hi-99"] - x["mean"],
d2=lambda x: -x["lo-99"] + x["mean"],
r1=lambda x: x["hi-99"] / x["mean"] - 1,
r2=lambda x: 1 - x["lo-99"] / x["mean"],
y=y[91:],
y_true=y_true[91:],
))
The result is:
ETS(M,N,M)
         mean       lo-99       hi-99         d1         d2        r1        r2           y      y_true
0   97.220357   85.385598  109.055116  11.834759  11.834759  0.121731  0.121731  104.843225  100.000000
1  125.941699  114.106940  137.776459  11.834759  11.834759  0.093970  0.093970  121.463115  125.881905
2  147.232211  135.397451  159.066970  11.834759  11.834759  0.080382  0.080382  147.542534  150.000000
3  172.418248  160.583488  184.253007  11.834760  11.834760  0.068640  0.068640  167.363826  170.710678
4  184.106207  172.271447  195.940967  11.834760  11.834760  0.064282  0.064282  172.947760  186.602540
5  198.262093  186.427333  210.096853  11.834760  11.834760  0.059693  0.059693  199.503335  196.592583
6  201.371897  189.537137  213.206658  11.834760  11.834760  0.058771  0.058771  202.610553  200.000000
As you can see, the difference between the interval bounds and the point forecast stays the same at every horizon, while the ratio keeps getting smaller.
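To make the comparison concrete, here is a small arithmetic sketch (values copied from the table above) contrasting the constant absolute width that is observed with the roughly constant relative width one would expect if the intervals scaled with the forecast mean, as a multiplicative-error model suggests:
import numpy as np
import pandas as pd
# Point forecasts and upper bounds copied from the table above.
mean = np.array([97.220357, 125.941699, 147.232211, 172.418248,
                 184.106207, 198.262093, 201.371897])
hi99 = np.array([109.055116, 137.776459, 159.066970, 184.253007,
                 195.940967, 210.096853, 213.206658])
abs_width = hi99 - mean        # constant ~11.83 at every horizon
rel_width = hi99 / mean - 1    # shrinks from ~12.2% to ~5.9%
# If the width scaled with the level (constant relative width), the upper
# bound would instead look like this:
hi99_if_proportional = mean * (1 + rel_width[0])
print(pd.DataFrame({"abs_width": abs_width, "rel_width": rel_width,
                    "hi99_if_proportional": hi99_if_proportional}))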
I am using statsforecast 2.0.0.
Anthony Giorgio
12/18/2024, 8:56 PM
thomas delaunait
01/09/2025, 10:48 AM
Aravind Karunakaran
01/13/2025, 11:19 AM
Makarand Batchu
01/13/2025, 12:21 PM
Guillaume GALIE
01/15/2025, 4:21 PM
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:5273, in MSTL.forward(self, y, h, X, X_future, level, fitted)
5269 res = self.trend_forecaster._add_conformal_intervals(
5270 fcst=res, y=x_sa, X=X, level=level
5271 )
5272 # reseasonalize results
-> 5273 seas_h = _predict_mstl_seas(model_, h=h, season_length=self.season_length)
5274 seas_insample = model_.filter(regex="seasonal*").sum(axis=1).values
5275 res = {
5276 key: val + (seas_insample if "fitted" in key else seas_h)
5277 for key, val in res.items()
5278 }
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4999, in _predict_mstl_seas(mstl_ob, h, season_length)
4998 def _predict_mstl_seas(mstl_ob, h, season_length):
-> 4999 seascomp = _predict_mstl_components(mstl_ob, h, season_length)
5000 return seascomp.sum(axis=1)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\statsforecast\models.py:4992, in _predict_mstl_components(mstl_ob, h, season_length)
4990 mp = seasonal_periods[i]
4991 colname = seasoncolumns[i]
-> 4992 seascomp[:, i] = np.tile(
4993 mstl_ob[colname].values[-mp:], trunc(1 + (h - 1) / mp)
4994 )[:h]
4995 return seascomp
ValueError: could not broadcast input array from shape (10,) into shape (12,)
Here is an example with one time series to reproduce the crash:
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import Naive, MSTL
from utilsforecast.data import generate_series
freq = 'MS'
season_length = 12
min_length = 25
df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
sffcst = StatsForecast(models=[MSTL(season_length=[season_length])], freq=freq, n_jobs=-1, fallback_model=Naive(), verbose=True)
sf_crossvalidation_df = sffcst.cross_validation(df=df, h=12, step_size=1, n_windows=3, refit=False).reset_index(drop=True)
sf_crossvalidation_df
What is strange is that it works with less history: if you keep only 15 months of data, it doesn't crash (see the rough window-length arithmetic after this second snippet).
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import Naive, MSTL
from utilsforecast.data import generate_series
freq = 'MS'
season_length = 12
min_length = 15
df = generate_series(n_series=1, freq=freq, min_length=min_length, max_length=min_length)
sffcst = StatsForecast(models=[MSTL(season_length=[season_length])], freq=freq, n_jobs=-1, fallback_model=Naive(), verbose=True)
sf_crossvalidation_df = sffcst.cross_validation(df=df, h=12, step_size=1, n_windows=3, refit=False).reset_index(drop=True)
sf_crossvalidation_df
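As a rough sanity check on how much history each window actually sees under these settings (assuming the usual splitting in which the last test window ends at the last observation):
# h=12, step_size=1 and n_windows=3 reserve h + step_size * (n_windows - 1) = 14
# periods for testing, so the training windows are very short.
h, step_size, n_windows = 12, 1, 3
test_size = h + step_size * (n_windows - 1)
for n in (25, 15):
    train_lengths = [n - test_size + i * step_size for i in range(n_windows)]
    print(f"series length {n}: training lengths per window = {train_lengths}")
# series length 25: training lengths per window = [11, 12, 13]
# series length 15: training lengths per window = [1, 2, 3]
Either way, MSTL is being asked to handle a season_length of 12 with training windows of roughly one season or less.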
Pradyumna Mahajan
01/16/2025, 6:54 AM
Piero Danti
01/16/2025, 2:45 PM
Naren Castellon
01/19/2025, 9:00 PM
# Instantiate StatsForecast class as sf
sf = StatsForecast(
    df=train,
    models=models,
    freq='D',
    n_jobs=-1,
)
# Train the model
sf.fit()
# Forecast
Y_hat = sf.predict(h=horizon)
Bersu T
02/05/2025, 10:59 AM
Filipa Encarnação Louzeiro
02/10/2025, 5:09 PM
Simon
02/15/2025, 1:52 PM
Slackbot
02/20/2025, 11:54 AM
Vaibhav Gupta
02/24/2025, 3:02 AM
Slackbot
02/25/2025, 10:34 AM
Rodrigo Sodré
03/09/2025, 9:22 PM
When I only use sf.forecast
everything works just fine:
pred = sf.forecast(h=horizon, df=train_df)
But for every new observation I have to append it to the dataframe and call forecast, which will train everything again. So I tried to change to `sf.fit + sf.predict`:
sf.fit(df=train_df)
# update train_df
pred = sf.predict(h=horizon, X_df=train_df)
but I'm getting the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File <timed exec>:12
File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:747, in _StatsForecast.predict(self, h, X_df, level)
742 warnings.warn(
743 "Prediction intervals are set but `level` was not provided. "
744 "Predictions won't have intervals."
745 )
746 self._validate_exog(X_df)
--> 747 X, level = self._parse_X_level(h=h, X=X_df, level=level)
748 if self.n_jobs == 1:
749 fcsts, cols = self.ga.predict(fm=self.fitted_, h=h, X=X, level=level)
File /opt/conda/lib/python3.11/site-packages/statsforecast/core.py:692, in _StatsForecast._parse_X_level(self, h, X, level)
690 expected_shape = (h * len(self.ga), self.ga.data.shape[1] + 1)
691 if X.shape != expected_shape:
--> 692 raise ValueError(
693 f"Expected X to have shape {expected_shape}, but got {X.shape}"
694 )
695 processed = ufp.process_df(X, self.id_col, self.time_col, None)
696 return GroupedArray(processed.data, processed.indptr), level
ValueError: Expected X to have shape (96, 2), but got (92928, 3)
"*Expected X to have shape (96, 2), but got (92928, 3)*"
I don't understand why the shape isn't valid in this case, since it's the same data I trained on before and it works with forecast. Am I using it incorrectly? Is there a proper way to call fit/predict?
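For what it's worth, a minimal sketch of how the fit/predict split is usually structured; X_df is only for the future exogenous rows (h per unique_id), and make_future_features below is a hypothetical helper, not part of statsforecast:
sf.fit(df=train_df)  # fit once on the full history (unique_id, ds, y, features)
# X_df must contain exactly h future rows per unique_id, with the id/time
# columns plus the exogenous feature columns only -- not the training frame.
future_X = make_future_features(train_df, horizon)  # hypothetical helper
pred = sf.predict(h=horizon, X_df=future_X)
# If the models were trained without exogenous regressors, omit X_df entirely:
# pred = sf.predict(h=horizon)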
Thanks in advance.
Simon
03/10/2025, 7:31 PM
Is there a way to know the unique_id
to which an MSTL fitted model corresponds?
For the following, I do not find any identification:
sf.fitted_[0, 0].model_
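If it helps, a small sketch for mapping fitted models back to their series, assuming the fitted StatsForecast object exposes the series ids as sf.uids in the same order as the rows of sf.fitted_ (the column index is the position of the model in the models list):
# Rows of sf.fitted_ follow the order of sf.uids; column 0 is the first model.
fitted_by_id = {uid: sf.fitted_[i, 0] for i, uid in enumerate(sf.uids)}
fitted_by_id["H1"].model_  # "H1" is a hypothetical unique_id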
Thank you for any hints!
Ankit Hemant Lade
03/17/2025, 3:05 PM
Bersu T
03/18/2025, 2:56 PM
Sergio André López Pereo
03/26/2025, 3:35 AM
GR
03/26/2025, 12:34 PM
Alex Berry
03/31/2025, 5:30 PM
Santosh Srivatsa
04/08/2025, 2:54 AM
unique_id.
Here’s a quick overview of my workflow:
1. Data Preparation:
◦ I create two DataFrames (train and test), both with columns: unique_id, ds, and y.
◦ The train and test DataFrames have the same end date but different start dates.
2. Model Setup and Fitting:
◦ I initialize the StatsForecast object with the AutoARIMA model, setting parameters like season_length and freq.
◦ I call the .fit() method on the training DataFrame.
3. Forecasting and In-Sample Predictions:
◦ I forecast using the .forecast(h=1) method on the test DataFrame.
◦ I also use .forecast_fitted_values() after fitting to retrieve in-sample predictions, and I flag anomalies based on the level prediction intervals by checking whether actuals fall outside the expected range.
I’m doing this because I’m specifically trying to detect missed transmissions, meaning there may not be a value at a fixed point in the future—so a direct forecast of future values isn’t always meaningful. Instead, I’m comparing the model’s in-sample expectations against actuals.
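For reference, a minimal sketch of that in-sample anomaly check, assuming a single AutoARIMA model and a 99% level; column names follow the usual <model>-lo-<level>/<model>-hi-<level> pattern, and train, freq and season_length stand for the objects described above:
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(models=[AutoARIMA(season_length=season_length)], freq=freq)
# fitted=True stores the in-sample values so forecast_fitted_values() can return them
sf.forecast(df=train, h=1, level=[99], fitted=True)
insample = sf.forecast_fitted_values()  # unique_id, ds, y, AutoARIMA, lo/hi columns
insample["anomaly"] = (
    (insample["y"] < insample["AutoARIMA-lo-99"])
    | (insample["y"] > insample["AutoARIMA-hi-99"])
)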
Additionally, when I try to introduce exogenous variables, I’m running into an invalid shape error when calling .forecast(). I suspect this might be because the test dataset doesn’t have the same number of rows for each unique_id.
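One quick way to check that suspicion, assuming X_df stands for the exogenous frame being passed to .forecast() (a hypothetical name here):
# Every unique_id must contribute exactly h future rows of exogenous features.
h = 1
rows_per_id = X_df.groupby("unique_id").size()
print(rows_per_id[rows_per_id != h])  # offending series, if any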
Would love any guidance on whether this overall workflow makes sense or how I might improve it—especially around incorporating exogenous variables or detecting anomalies more robustly.
Thanks in advance for your insights!
Sergio André López Pereo
04/09/2025, 7:39 PM
Mariana Menchero
04/09/2025, 7:57 PM
Iching Quares
04/10/2025, 2:49 PM
spec <- ugarchspec(variance.model = list(garchOrder = c(1, 1)),
mean.model = list(armaOrder = c(final.order[1], final.order[3]), include.mean = TRUE),
distribution.model = "std",
fixed.pars = fixed_pars_df0)
Ankit Hemant Lade
04/11/2025, 2:38 AM
IHAS
04/11/2025, 3:36 PM