# statsforecast

Naren Castellon
02/27/2024, 12:07 AM

I am training models and have the following two cases.

1. With the following model:

```python
season_length = 12  # monthly data

sf = StatsForecast(
    df=train,
    models=[
        AutoARIMA(season_length=season_length),
        Naive(),
        TBATS(seasonal_periods=12),
        MSTL(season_length=[12], trend_forecaster=AutoARIMA()),
    ],
    freq="MS",
    n_jobs=-1,
)
```

I get the error: `ValueError: sample size is too short to use selected regression component`

2. With the following model:

```python
season_length = 12  # monthly data

sf = StatsForecast(
    df=train,
    models=[
        AutoARIMA(season_length=season_length),
        Naive(),
        TBATS(seasonal_periods=12),
        MSTL(season_length=[12], trend_forecaster=AutoARIMA()),
    ],
    freq="MS",
    n_jobs=-1,
    fallback_model=SeasonalNaive(season_length=season_length),
)
```

In this case the predictions contain many null values.
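A frequent cause of both the "sample size is too short" error and null fallback predictions is series with too little history for the seasonal models. A minimal pre-filter sketch (pure pandas, assuming the standard `unique_id`/`ds`/`y` schema; the two-full-cycles threshold is an illustrative heuristic, not a statsforecast rule):

```python
import pandas as pd

season_length = 12
min_obs = 2 * season_length + 1  # illustrative heuristic threshold

# toy frame: series "a" has 30 monthly points, series "b" only 10
train = pd.DataFrame({
    "unique_id": ["a"] * 30 + ["b"] * 10,
    "ds": (list(pd.date_range("2020-01-01", periods=30, freq="MS"))
           + list(pd.date_range("2020-01-01", periods=10, freq="MS"))),
    "y": range(40),
})

# keep only series with enough history for the seasonal models
sizes = train.groupby("unique_id")["y"].transform("size")
long_enough = train[sizes >= min_obs]
print(sorted(long_enough["unique_id"].unique()))  # ['a']
```

Series that fail the check can be routed to a simpler model (e.g. `Naive`) instead of relying on the fallback.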


Kyle Schmaus
02/27/2024, 9:40 PM

Is there a way to "update" a (say) ETS model with new endogenous data, without retraining? I'm imagining there should be a way to plumb new endogenous values and update the "state" array in the `model_` attribute. I'm not seeing anything written in the package, though. I vaguely remember reading about a method for this, but maybe that was with a different package.
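For the simplest ETS cases, the kind of state update being asked about is cheap to do by hand. A sketch of the simple-exponential-smoothing level recursion (pure Python, purely illustrative; this is not a statsforecast API):

```python
# Simple exponential smoothing: level l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
# "Updating the state" with new observations just replays this recursion
# from the last fitted level, without re-estimating alpha.

def update_level(level, new_obs, alpha):
    """Advance the SES level state through new endogenous values."""
    for y in new_obs:
        level = alpha * y + (1 - alpha) * level
    return level

fitted_level = 10.0  # pretend this came from a fitted model's state array
new_level = update_level(fitted_level, [12.0, 8.0], alpha=0.5)
print(new_level)  # 0.5*8 + 0.5*(0.5*12 + 0.5*10) = 9.5
```

For seasonal/trend ETS variants the state vector is larger, but the principle is the same: advance the recursions with the new observations while keeping the fitted parameters frozen.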


Naren Castellon
02/28/2024, 2:52 PM

I'm training a couple of models and having the following problem:

```python
train_exo = set.loc[set['ds'] <= '2023-06-01']
test_exo = set.loc[set['ds'] > '2023-06-01']
test_exo.drop("y", axis=1, inplace=True)
train_exo.shape, test_exo.shape
```

Any ideas or suggestions?

Valeriy
03/03/2024, 8:05 AM

Interesting post about the speed of statsforecast. What is people's experience regarding speed, especially cross validation? A thorough open benchmark at large scale vs R might help: https://www.linkedin.com/posts/thomas-matcham-52868948_the-extreme-cost-of-python-ive-taken-over-activity-7168880575829774338-Mpq7?utm_source=share&utm_medium=member_desktop

Mike C
03/05/2024, 6:11 PM

Hi all - I've been experimenting with MSTL and am wondering if I'm doing something wrong or just don't understand how it works behind the scenes. I tried a few different trend forecasters and was expecting to see different results in the decomposition, but it seems to give the same back in each case. Here's an example... I'll run this:

```python
models = [
    MSTL(season_length=[24, 24*7], trend_forecaster=SimpleExponentialSmoothing(alpha=0.5), alias='SES'),
    MSTL(season_length=[24, 24*7], trend_forecaster=WindowAverage(window_size=24*7), alias='WinAvg'),
    MSTL(season_length=[24, 24*7], trend_forecaster=AutoETS(model='AAN'), alias='AutoETS'),
]
sf = StatsForecast(models=models, freq='H', n_jobs=-1)
sf.fit(df=subset_df)
```

and will see the following:

```python
pd.concat([sf.fitted_[0, m].model_['trend'].rename(f'trend{m}') for m in range(3)], axis=1).tail(10)
```


Mike C
03/05/2024, 6:12 PM

Is this the intended output?
Clarisse Chia
03/07/2024, 2:55 AM

Hi all, I'm trying to get `statsforecast` up and running on PySpark, but have been running into the `ModuleNotFoundError: No module named 'fugue'` error despite having installed `fugue`. I was wondering if someone would be willing to help me troubleshoot/chat through what I might not be thinking about. For context, I'm running Python 3.8, PySpark 3.2.1, and Scala 2.12.


Tung Nguyen
03/07/2024, 10:29 AM

Hi Nixtla team, may I know what happens when I don't specify `season_length` in the models? Do the models still try to capture seasonality if there is any? I looked at one of the series and there's no clear seasonal pattern; if there is, it's probably monthly or quarterly at best.

```python
# Initialize the models
models = [AutoARIMA(), AutoETS(damped=True), DynamicOptimizedTheta()]
```

I have weekly data. I've tried m = 52 but got the error `x must have 2 complete cycles requires 104 observations. x only has 96 observation(s).` I think m = 26 got a similar error as well. I'm not sure about m = 13 yet. There are almost 20,000 series ranging from 5 data points to 157 data points.
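The "2 complete cycles" error can be anticipated per series. A sketch (pure Python; the two-cycle requirement is taken from the error message itself) of picking the largest candidate season length each series can support:

```python
def max_feasible_season_length(n_obs, candidates=(52, 26, 13)):
    """Return the largest candidate m with at least two complete cycles in the data."""
    for m in sorted(candidates, reverse=True):
        if n_obs >= 2 * m:
            return m
    return None  # too short for any candidate; use a non-seasonal model instead

print(max_feasible_season_length(96))   # 96 < 104 but 96 >= 52 -> 26
print(max_feasible_season_length(157))  # 157 >= 104 -> 52
print(max_feasible_season_length(5))    # None
```

With series ranging from 5 to 157 points, this kind of check lets you assign a different `season_length` (or no seasonal model at all) per group of series.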

Thiago Vidigal
03/09/2024, 1:07 PM

Hi everybody! I'm having trouble when I try to obtain the in-sample forecasts from my model. The code works fine without the `fitted` arg in the forecast module, but with the arg the code breaks and raises the exception `NotImplementedError: return fitted`. Can anyone help me with this? This is my notebook and my data.
Makarand Batchu
03/14/2024, 2:19 PM

Hi all. I'm trying to fit a model using statsforecast, and when I run the code below I get an error: `ImportError: Numba needs NumPy 1.24 or less`. Is this expected, or do I need a lower version of NumPy to get it working?

```python
from statsforecast.models import MSTL

# Create a list of models and instantiation parameters
models = [MSTL(season_length=[7, 31])]
```


Makarand Batchu
03/14/2024, 4:10 PM

Hi team. I have a quick question on the horizon parameter of statsforecast. By default, based on the number passed for the horizon, the model returns predicted values starting from the interval after the last interval in the training data. Is there a way to modify this? Say the freq param is days and my model was trained on data up to 13/03: for h = 31, `model.predict()` by default returns predictions from 14/03. Is there a way for `model.predict()` to return predictions starting from a custom date other than 14/03? Thanks in advance!
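As far as I know, predictions always start right after the last training timestamp, so the usual workaround is to forecast far enough ahead and slice off the leading dates. A sketch with a stand-in forecast frame (pure pandas; the `ds` column and `AutoARIMA` column name are assumptions for illustration):

```python
import pandas as pd

# stand-in for a model.predict(h=31) output that starts on 14/03
preds = pd.DataFrame({
    "ds": pd.date_range("2024-03-14", periods=31, freq="D"),
    "AutoARIMA": range(31),
})

start = pd.Timestamp("2024-03-20")    # the custom start date we actually want
custom = preds[preds["ds"] >= start]  # drop the leading days we don't need
print(len(custom))  # 31 - 6 = 25 rows remain
```

Note the horizon must be large enough to cover the span from the end of training to the last date you care about.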

Brian Head
03/14/2024, 5:04 PM

Does statsforecast distributed work with both `pyspark.sql.dataframe.DataFrame` and `pyspark.pandas.frame.DataFrame`? I see documentation for the former. I've been trying to get the latter to work without success, and I'm wondering whether it's something I'm doing or it's just not built to work with that.

Valeriy
03/15/2024, 7:22 PM

I am getting a repeated error when trying to use `fitted=True` for further extraction of in-sample forecasts. Pretty sure the same code worked on another dataset before. The fit is done as in:

```python
# initialise and train the model
sf = StatsForecast(models=models, freq='M', n_jobs=-1, fallback_model=HistoricAverage())
sf.fit(train_df)
```

Clarisse Chia
03/18/2024, 3:02 PM

Hi team,

**Problem context**: I'm trying to forecast for many (200k to 1M+) series with known weekly **and** holiday/special-date seasonality patterns in Databricks.

**What I've tried:** for holidays that fall on the same day of week, I've been able to cut n=5 "pieces of weeks" from each year and join them together to create an artificial timeline, specifying an `n-week` seasonality (rather than `7 days`) to try to capture the holiday/special-date seasonality effect. Below is how I've set up the modeling problem (will add an example of the modeling code setup in thread).

**Would love advice on the following pieces:**
1. How to speed up `.forecast()`, or more specifically, writing the `.forecast()` output?
   a. Context: it's currently taking anywhere from 5 to 17 hours, depending on what exogenous features I pass in for the ~200k series, despite the shortened artificial timeline (vs. full-year timelines for each past year).
2. How do I reframe the problem so that I can capture the holiday/special-date effect without having to create this artificial timeline?
   a. Context: it works when I set up the timeline to capture holidays/special dates that fall on the same day of week every year, but I worry about that same ability for holidays/special dates that do not fall on the same day of week. Thanks in advance!!


Makarand Batchu
03/19/2024, 11:15 AM

Hi team. I want to understand a bit more about the `prediction_intervals` parameter in statsforecast models. I understand that I have to pass `ConformalIntervals`, which takes `horizon` and `n_windows`, but can someone explain what this all means and how it can be used to help me improve forecasts? And how is it different from when nothing is passed for `prediction_intervals`? Thank you in advance.
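The idea behind conformal intervals can be sketched without the library: hold out `n_windows` windows of length `h`, collect the absolute forecast errors, and pad the point forecast with an error quantile. A minimal numpy sketch of that technique (illustrative of the concept, not statsforecast's exact implementation, whose quantile handling may differ):

```python
import numpy as np

# absolute errors collected from held-out windows (n_windows * h of them)
abs_errors = np.array([1.0, 2.0, 3.0, 4.0])

level = 80                                # interval level in percent
q = np.quantile(abs_errors, level / 100)  # error quantile used as the pad
point = 100.0                             # some point forecast

lo, hi = point - q, point + q             # conformal interval around the point forecast
print(lo, hi)
```

When nothing is passed for `prediction_intervals`, the models fall back to their own (typically model-based, distributional) interval formulas, whereas the conformal approach is driven purely by observed holdout errors.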


Brian Head
03/21/2024, 1:41 PM

Hi, I'm working through a conversion from using Statsforecast locally to using distributed processing with Spark/Fugue. I've gotten fill_gaps, mstl_decomposition (with Jose's help), and cross-validation working. However, when I get to `forecast` I get the error in the attached screenshot. Pertinent details:
• I've been starting with samples of data (with seeds for consistency) and will remove that when ready to scale up to the full dataframe. `forecast` actually does work with samples under 5% (less than ~75 series with 48 monthly observations for training and 3 for forecasting). But when I increase the frac to 0.05 I get this error.
• Given the error message, I thought it might be an issue with some of the data pulled in after the increase. However, I have done a couple of things I think rule that out:
  ◦ Displayed the data and looked through it. Everything looked fine.
  ◦ Pulled it back down to a regular pandas dataframe and ran everything that way. It works fine then with no errors, even when increasing the sample to 50%.
Before going to our data engineers, I wanted to check if there are any other thoughts or suggestions. They are helpful with many things, but they aren't familiar with Statsforecast, so I wanted to rule out any other issues before pulling them in. Thanks for any help you can provide.


Brian Head
03/21/2024, 1:48 PM

image.png
Brian Head
03/21/2024, 2:13 PM

BTW, this is after the forecast function runs for ~23 minutes. Something it does locally in 1.9 seconds.
Jeff Tackes

03/24/2024, 2:54 AMHi All, Any insight into why i get flat forecast for ETS (and AutoETS, and near flat with AutoARIMA). I am working with 30min frequency data, and have 2 years of training data. I am loading my season_length =48*7.Copy code

My data has enough fluctuation where i would have thought there would be better "movement". When i run ETS using DARTS, i do not get a flat forecast and get cyclic patterns showing in my forecast. Additionally, when i run ETS in NIXTLA, it takes several minutes whereas in DARTS it took 26 seconds.`sf = StatsForecast( models = [ETS(season_length=48*7)], freq = "30min" ) sf.fit(ts_train, id_col = 'LCLid', time_col = 'timestamp', target_col = 'energy_consumption', ) sf.predict(h=48)`


Makarand Batchu
03/25/2024, 2:43 PM

Hi team. I am trying out the cross-validation functionality in statsforecast. Can you please explain `h`, `n_windows`, and `step_size` with an example? It is unclear how to choose these parameter values. Thanks in advance!
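Roughly: `n_windows` is how many test windows you evaluate, `h` is the length of each window, and `step_size` is how far the window slides between folds. A sketch of the training cutoffs this implies (pure Python, assuming the last window ends at the final observation, which matches the usual sliding-window scheme):

```python
def cv_cutoffs(n_obs, h, n_windows, step_size):
    """Index of the last training observation for each fold's test window."""
    last_cutoff = n_obs - h                               # final window uses the last h points
    first_cutoff = last_cutoff - step_size * (n_windows - 1)
    return list(range(first_cutoff, last_cutoff + 1, step_size))

# 100 observations, forecast 12 ahead, 3 folds, sliding by 12 each time
print(cv_cutoffs(100, h=12, n_windows=3, step_size=12))  # [64, 76, 88]
```

So with `step_size == h` the test windows tile the end of the series without overlap; a smaller `step_size` makes them overlap, and a larger one leaves gaps between evaluated windows.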


Brian Head
03/26/2024, 6:55 PM

I have two questions about "local" processing vs distributed processing with Spark.
1. Can anyone offer guidance on optimizing distributed processing, in particular repartitioning? I've got mine ordered and then repartitioned, trying between 50 and 150 partitions (by unique_id) across 8-12 cores in Databricks. When the system isn't loaded with other work (from co-workers) it runs successfully. However:
   a. Oddly, the 5-fold CV I'm using runs much faster than both the `forecast` function and `forecast_fitted_values`. For example, on my local laptop the CV and forecast functions run for approximately the same amount of time, and the extraction of fitted values takes only a few seconds. However, when using Spark in Databricks, the forecast and forecast_fitted_values functions take about 3-4 times as long as the CV. Is that normal behavior? I'm wondering if it might have anything to do with the partitioning.
   b. I've read some sources that say there should be 3-4 partitions per core. However, that's not realistic at all for my situation given the resources my team and I have. Is there any other guideline for the number of partitions?
2. I understand that for non-statistical models I might get slightly different results. However, assuming I've got exactly the same data, I should get the same results when training and forecasting with a statistical model no matter the processing type (e.g., local or distributed) and environment (e.g., laptop vs. something like Databricks), right?


Clarisse Chia
03/27/2024, 2:53 PM

Hi team, I have a question regarding sudden unexpected forecasts. I have been running the forecast with the same parameters with no issues, but I have recently been getting either all-`0` or all-`null` forecasts and was wondering what might be going wrong. The dataset I'm working with is sensitive, but if helpful, below is the simple model setup:

```python
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive, AutoARIMA

# configure model
models = [AutoARIMA(season_length=7, nmodels=5, trace=True)]
statsforecast = StatsForecast(models=models, freq="D",
                              fallback_model=SeasonalNaive(season_length=7), n_jobs=-1)

# forecast
horizon = test_x.select('ds').dropDuplicates().count()
forecast_results = statsforecast.forecast(df=train_set, h=horizon, X_df=test_x)
```

The model had been working quite well until recently, when I changed how one exogenous variable would look in the future forecast (within `test_x`; based on business assumptions).

Valeriy
04/02/2024, 3:50 PM

I am producing prediction intervals with the specified levels `array([0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])`, and the result columns come back with imprecise numbers for some reason.
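If those levels came from `np.arange`, the "imprecise numbers" are likely floating-point step artifacts carried into the interval column names. Statsforecast's `level` argument is, as far as I know, expected in percent anyway, so rounding to integer percents sidesteps both issues. A sketch:

```python
import numpy as np

raw = np.arange(0.1, 1.0, 0.1)  # float steps accumulate representation error
print(raw[2])                   # may print 0.30000000000000004 rather than 0.3

# integer percent levels avoid the artifact entirely
levels = [int(round(l * 100)) for l in raw]
print(levels)  # [10, 20, 30, 40, 50, 60, 70, 80, 90]
```

The `levels` list can then be passed as `level=levels`, giving clean column suffixes like `lo-90`/`hi-90`.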

Valeriy
04/04/2024, 2:19 PM

Is there any way to get rid of this warning when importing statsforecast?

Jeff Tackes
04/04/2024, 8:44 PM

Do others have the same experience that the Theta methods in statsforecast are very slow? I am working with 30min data, and in DARTS, Theta takes <1 sec. In Nixtla it takes 2 minutes for a single time series. It is a large time series, with 35,000 records.

Clarisse Chia
04/04/2024, 10:29 PM

Hi team, a forecast model that I've been building has been forecasting extreme values (e.g., negative/positive quadrillions when it should be forecasting ~10 million), and I was wondering if there is something I should understand about how `AutoARIMA(season_length=7)` uses the exogenous variables we feed it.

**Context on model setup:**
1. ~4 years of complete daily sales history
2. Exogenous variables:
   a. COVID indicators
   b. Day-of-week indicators
   c. Day-of-week * holiday indicators
      i. The idea here is to capture the sales peak for each holiday, especially when a holiday falls on a different day of week each year.
      ii. This is where I notice that when multiple holidays fall really close to each other (e.g., Super Bowl / St. Patrick's / Easter), the forecasts can output some pretty extreme and unreasonable values.
         1. I wonder if the exogenous variables may be multiplicative (rather than additive), causing these extreme values when these indicators fall on the same dates?

Would really appreciate it if folks have any suggestions of what I might be missing!
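One cheap diagnostic for this setup: overlapping holiday and day-of-week dummies that fire on the same dates make the exogenous design matrix ill-conditioned, which is a classic source of exploding regression coefficients (and hence extreme forecasts) regardless of whether the effects are additive. A sketch using the condition number (pure numpy; the threshold is an illustrative rule of thumb):

```python
import numpy as np

# toy design matrix: two dummies that fire on exactly the same rows -> collinear columns
X = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
], dtype=float)

cond = np.linalg.cond(X)  # effectively infinite when columns are duplicated
if cond > 1e8:            # illustrative threshold for "near-collinear"
    print("exogenous matrix is (near-)collinear; merge or drop overlapping dummies")
```

Merging the overlapping indicators into a single combined dummy (or dropping one of each collinear pair) usually tames the coefficients.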

Valeriy
04/05/2024, 1:59 PM

I have an issue with AutoARIMA, and even plain ARIMA, crashing due to memory issues on one time series. Is this a known issue, and are there any workarounds? The notebook crashed both on my laptop and on Colab with high memory. The setup uses external variables.

Valeriy
04/06/2024, 1:04 PM

Is there an AutoSARIMAX in statsforecast?

Vítor Barbosa
04/18/2024, 4:32 PM

Hi team, are there any parameters or tips to speed up AutoARIMA or AutoETS?

Nils de Korte
04/19/2024, 11:07 AM

Hi team, I am using AutoTheta as a trend_forecaster for MSTL. It chooses the best Theta model automatically, but how do I know which one it chooses? And what the scores of the others are? Thanks!