# statsforecast

Naren Castellon
02/27/2024, 12:07 AM

I am training models and have the following two cases.

1. With the following model:

```python
season_length = 12  # monthly data

sf = StatsForecast(
    df=train,
    models=[
        AutoARIMA(season_length=season_length),
        Naive(),
        TBATS(seasonal_periods=12),
        MSTL(season_length=[12], trend_forecaster=AutoARIMA()),
    ],
    freq="MS",
    n_jobs=-1,
)
```

I get the error: `ValueError: sample size is too short to use selected regression component`

2. With the following model:

```python
season_length = 12  # monthly data

sf = StatsForecast(
    df=train,
    models=[
        AutoARIMA(season_length=season_length),
        Naive(),
        TBATS(seasonal_periods=12),
        MSTL(season_length=[12], trend_forecaster=AutoARIMA()),
    ],
    freq="MS",
    n_jobs=-1,
    fallback_model=SeasonalNaive(season_length=season_length),
)
```

In this case the predictions contain many null values.
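A frequent cause of both the "sample size is too short" error and null fallback predictions is series with too little history for the seasonal models. A minimal pre-filter sketch (pure pandas, assuming the standard `unique_id`/`ds`/`y` schema; the two-full-cycles threshold is an illustrative heuristic, not a statsforecast rule):

```python
import pandas as pd

season_length = 12
min_obs = 2 * season_length + 1  # illustrative heuristic threshold

# toy frame: series "a" has 30 monthly points, series "b" only 10
train = pd.DataFrame({
    "unique_id": ["a"] * 30 + ["b"] * 10,
    "ds": (list(pd.date_range("2020-01-01", periods=30, freq="MS"))
           + list(pd.date_range("2020-01-01", periods=10, freq="MS"))),
    "y": range(40),
})

# keep only series with enough history for the seasonal models
sizes = train.groupby("unique_id")["y"].transform("size")
long_enough = train[sizes >= min_obs]
print(sorted(long_enough["unique_id"].unique()))  # ['a']
```

Series that fail the check can be routed to a simpler model (e.g. `Naive`) instead of relying on the fallback.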


Kyle Schmaus
02/27/2024, 9:40 PM

Is there a way to "update" a (say) ETS model with new endogenous data, without retraining? I'm imagining there should be a way to plumb new endogenous values and update the "state" array in the `model_` attribute. I'm not seeing anything written in the package, though. I vaguely remember reading about a method for this, but maybe that was with a different package.
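For the simplest ETS cases, the kind of state update being asked about is cheap to do by hand. A sketch of the simple-exponential-smoothing level recursion (pure Python, purely illustrative; this is not a statsforecast API):

```python
# Simple exponential smoothing: level l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
# "Updating the state" with new observations just replays this recursion
# from the last fitted level, without re-estimating alpha.

def update_level(level, new_obs, alpha):
    """Advance the SES level state through new endogenous values."""
    for y in new_obs:
        level = alpha * y + (1 - alpha) * level
    return level

fitted_level = 10.0  # pretend this came from a fitted model's state array
new_level = update_level(fitted_level, [12.0, 8.0], alpha=0.5)
print(new_level)  # 0.5*8 + 0.5*(0.5*12 + 0.5*10) = 9.5
```

For seasonal/trend ETS variants the state vector is larger, but the principle is the same: advance the recursions with the new observations while keeping the fitted parameters frozen.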


Naren Castellon
02/28/2024, 2:52 PM

I'm training a couple of models and having the following problem:

```python
train_exo = set.loc[set['ds'] <= '2023-06-01']
test_exo = set.loc[set['ds'] > '2023-06-01']
test_exo.drop("y", axis=1, inplace=True)
train_exo.shape, test_exo.shape
```

Any ideas or suggestions?

Valeriy
03/03/2024, 8:05 AM

Interesting post about the speed of statsforecast. What is people's experience regarding speed, especially cross validation? A thorough open benchmark at large scale vs R might help: https://www.linkedin.com/posts/thomas-matcham-52868948_the-extreme-cost-of-python-ive-taken-over-activity-7168880575829774338-Mpq7?utm_source=share&utm_medium=member_desktop

Mike C
03/05/2024, 6:11 PM

Hi all - I've been experimenting with MSTL and am wondering if I'm doing something wrong or just don't understand how it works behind the scenes. I tried a few different trend forecasters and was expecting to see different results in the decomposition, but it seems to give the same back in each case. Here's an example... I'll run this:

```python
models = [
    MSTL(season_length=[24, 24*7], trend_forecaster=SimpleExponentialSmoothing(alpha=0.5), alias='SES'),
    MSTL(season_length=[24, 24*7], trend_forecaster=WindowAverage(window_size=24*7), alias='WinAvg'),
    MSTL(season_length=[24, 24*7], trend_forecaster=AutoETS(model='AAN'), alias='AutoETS'),
]
sf = StatsForecast(models=models, freq='H', n_jobs=-1)
sf.fit(df=subset_df)
```

and will see the following:

```python
pd.concat([sf.fitted_[0, m].model_['trend'].rename(f'trend{m}') for m in range(3)], axis=1).tail(10)
```


Mike C
03/05/2024, 6:12 PM

Is this the intended output?
Clarisse Chia
03/07/2024, 2:55 AM

Hi all, I'm trying to get `statsforecast` up and running on PySpark, but have been running into the `ModuleNotFoundError: No module named 'fugue'` error despite having installed `fugue`. I was wondering if someone would be willing to help me troubleshoot/chat through what I might not be thinking about. For context, I'm running Python 3.8, PySpark 3.2.1, and Scala 2.12.


Tung Nguyen
03/07/2024, 10:29 AM

Hi Nixtla team, may I know what happens when I don't specify `season_length` in the models? Do the models still try to capture seasonality if there is any? I looked at one of the series and there's no clear seasonal pattern; if there is, it's probably monthly or quarterly at best.

```python
# Initialize the models
models = [AutoARIMA(), AutoETS(damped=True), DynamicOptimizedTheta()]
```

I have weekly data. I've tried m = 52 but got the error `x must have 2 complete cycles requires 104 observations. x only has 96 observation(s).` I think m = 26 got a similar error as well. I'm not sure about m = 13 yet. There are almost 20,000 series ranging from 5 data points to 157 data points.
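The "2 complete cycles" error can be anticipated per series. A sketch (pure Python; the two-cycle requirement is taken from the error message itself) of picking the largest candidate season length each series can support:

```python
def max_feasible_season_length(n_obs, candidates=(52, 26, 13)):
    """Return the largest candidate m with at least two complete cycles in the data."""
    for m in sorted(candidates, reverse=True):
        if n_obs >= 2 * m:
            return m
    return None  # too short for any candidate; use a non-seasonal model instead

print(max_feasible_season_length(96))   # 96 < 104 but 96 >= 52 -> 26
print(max_feasible_season_length(157))  # 157 >= 104 -> 52
print(max_feasible_season_length(5))    # None
```

With series ranging from 5 to 157 points, this kind of check lets you assign a different `season_length` (or no seasonal model at all) per group of series.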

Thiago Vidigal
03/09/2024, 1:07 PM

Hi everybody! I'm having trouble when I try to obtain the in-sample forecasts from my model. The code works fine without the `fitted` arg in the forecast module, but with the arg the code breaks and raises the exception `NotImplementedError: return fitted`. Can anyone help me with this? This is my notebook and my data.
Makarand Batchu
03/14/2024, 2:19 PM

Hi all. I'm trying to fit a model using statsforecast, and when I run the code below I get an error: `ImportError: Numba needs NumPy 1.24 or less`. Is this expected, or do I need a lower version of NumPy to get it working?

```python
from statsforecast.models import MSTL

# Create a list of models and instantiation parameters
models = [MSTL(season_length=[7, 31])]
```


Makarand Batchu
03/14/2024, 4:10 PM

Hi team. I have a quick question on the horizon parameter of statsforecast. By default, based on the number passed for the horizon, the model returns predicted values starting from the interval after the last interval in the training data. Is there a way to modify this? Say the freq param is days and my model was trained on data up to 13/03: for h = 31, `model.predict()` by default returns predictions from 14/03. Is there a way for `model.predict()` to return predictions starting from a custom date other than 14/03? Thanks in advance!
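As far as I know, predictions always start right after the last training timestamp, so the usual workaround is to forecast far enough ahead and slice off the leading dates. A sketch with a stand-in forecast frame (pure pandas; the `ds` column and `AutoARIMA` column name are assumptions for illustration):

```python
import pandas as pd

# stand-in for a model.predict(h=31) output that starts on 14/03
preds = pd.DataFrame({
    "ds": pd.date_range("2024-03-14", periods=31, freq="D"),
    "AutoARIMA": range(31),
})

start = pd.Timestamp("2024-03-20")    # the custom start date we actually want
custom = preds[preds["ds"] >= start]  # drop the leading days we don't need
print(len(custom))  # 31 - 6 = 25 rows remain
```

Note the horizon must be large enough to cover the span from the end of training to the last date you care about.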

Brian Head
03/14/2024, 5:04 PM

Does statsforecast distributed work with both `pyspark.sql.dataframe.DataFrame` and `pyspark.pandas.frame.DataFrame`? I see documentation for the former. I've been trying to get the latter to work without success, and I'm wondering whether it's something I'm doing or it's just not built to work with that.

Valeriy
03/15/2024, 7:22 PM

I am getting a repeated error when trying to use `fitted=True` for further extraction of in-sample forecasts. Pretty sure the same code worked on another dataset before. The fit is done as in:

```python
# initialise and train the model
sf = StatsForecast(models=models, freq='M', n_jobs=-1, fallback_model=HistoricAverage())
sf.fit(train_df)
```

Clarisse Chia
03/18/2024, 3:02 PM

Hi team,

**Problem context**: I'm trying to forecast for many (200k to 1M+) series with known weekly **and** holiday/special-date seasonality patterns in Databricks.

**What I've tried:** for holidays that fall on the same day of week, I've been able to cut n=5 "pieces of weeks" from each year and join them together to create an artificial timeline, specifying an `n-week` seasonality (rather than `7 days`) to try to capture the holiday/special-date seasonality effect. Below is how I've set up the modeling problem (will add an example of the modeling code setup in thread).

**Would love advice on the following pieces:**
1. How to speed up `.forecast()`, or more specifically, writing the `.forecast()` output?
   a. Context: it's currently taking anywhere from 5 to 17 hours, depending on what exogenous features I pass in for the ~200k series, despite the shortened artificial timeline (vs. full-year timelines for each past year).
2. How do I reframe the problem so that I can capture the holiday/special-date effect without having to create this artificial timeline?
   a. Context: it works when I set up the timeline to capture holidays/special dates that fall on the same day of week every year, but I worry about that same ability for holidays/special dates that do not fall on the same day of week. Thanks in advance!!


Makarand Batchu
03/19/2024, 11:15 AM

Hi team. I want to understand a bit more about the `prediction_intervals` parameter in statsforecast models. I understand that I have to pass `ConformalIntervals`, which takes `horizon` and `n_windows`, but can someone explain what this all means and how it can be used to help me improve forecasts? And how is it different from when nothing is passed for `prediction_intervals`? Thank you in advance.
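The idea behind conformal intervals can be sketched without the library: hold out `n_windows` windows of length `h`, collect the absolute forecast errors, and pad the point forecast with an error quantile. A minimal numpy sketch of that technique (illustrative of the concept, not statsforecast's exact implementation, whose quantile handling may differ):

```python
import numpy as np

# absolute errors collected from held-out windows (n_windows * h of them)
abs_errors = np.array([1.0, 2.0, 3.0, 4.0])

level = 80                                # interval level in percent
q = np.quantile(abs_errors, level / 100)  # error quantile used as the pad
point = 100.0                             # some point forecast

lo, hi = point - q, point + q             # conformal interval around the point forecast
print(lo, hi)
```

When nothing is passed for `prediction_intervals`, the models fall back to their own (typically model-based, distributional) interval formulas, whereas the conformal approach is driven purely by observed holdout errors.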


Brian Head
03/21/2024, 1:41 PM

Hi, I'm working through a conversion from using Statsforecast locally to using distributed processing with Spark/Fugue. I've gotten fill_gaps, mstl_decomposition (with Jose's help), and cross-validation working. However, when I get to `forecast` I get the error in the attached screenshot. Pertinent details:
• I've been starting with samples of data (with seeds for consistency) and will remove that when ready to scale up to the full dataframe. `forecast` actually does work with samples under 5% (less than ~75 series with 48 monthly observations for training and 3 for forecasting). But when I increase the frac to 0.05 I get this error.
• Given the error message, I thought it might be an issue with some of the data pulled in after the increase. However, I have done a couple of things I think rule that out:
  ◦ Displayed the data and looked through it. Everything looked fine.
  ◦ Pulled it back down to a regular pandas dataframe and ran everything that way. It works fine then with no errors, even when increasing the sample to 50%.
Before going to our data engineers, I wanted to check if there are any other thoughts or suggestions. They are helpful with many things, but they aren't familiar with Statsforecast, so I wanted to rule out any other issues before pulling them in. Thanks for any help you can provide.


Brian Head
03/21/2024, 1:48 PM

image.png
Brian Head
03/21/2024, 2:13 PM

BTW, this is after the forecast function runs for ~23 minutes. Something it does locally in 1.9 seconds.
Jeff Tackes

03/24/2024, 2:54 AMHi All, Any insight into why i get flat forecast for ETS (and AutoETS, and near flat with AutoARIMA). I am working with 30min frequency data, and have 2 years of training data. I am loading my season_length =48*7.Copy code

My data has enough fluctuation where i would have thought there would be better "movement". When i run ETS using DARTS, i do not get a flat forecast and get cyclic patterns showing in my forecast. Additionally, when i run ETS in NIXTLA, it takes several minutes whereas in DARTS it took 26 seconds.`sf = StatsForecast( models = [ETS(season_length=48*7)], freq = "30min" ) sf.fit(ts_train, id_col = 'LCLid', time_col = 'timestamp', target_col = 'energy_consumption', ) sf.predict(h=48)`


Makarand Batchu
03/25/2024, 2:43 PM

Hi team. I am trying out the cross-validation functionality in statsforecast. Can you please explain `h`, `n_windows`, and `step_size` with an example? It is unclear how to choose these parameter values. Thanks in advance!
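Roughly: `n_windows` is how many test windows you evaluate, `h` is the length of each window, and `step_size` is how far the window slides between folds. A sketch of the training cutoffs this implies (pure Python, assuming the last window ends at the final observation, which matches the usual sliding-window scheme):

```python
def cv_cutoffs(n_obs, h, n_windows, step_size):
    """Index of the last training observation for each fold's test window."""
    last_cutoff = n_obs - h                               # final window uses the last h points
    first_cutoff = last_cutoff - step_size * (n_windows - 1)
    return list(range(first_cutoff, last_cutoff + 1, step_size))

# 100 observations, forecast 12 ahead, 3 folds, sliding by 12 each time
print(cv_cutoffs(100, h=12, n_windows=3, step_size=12))  # [64, 76, 88]
```

So with `step_size == h` the test windows tile the end of the series without overlap; a smaller `step_size` makes them overlap, and a larger one leaves gaps between evaluated windows.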


Brian Head
03/26/2024, 6:55 PM

I have two questions about "local" processing vs distributed processing with Spark.
1. Can anyone offer guidance on optimizing distributed processing, in particular repartitioning? I've got mine ordered and then repartitioned, trying between 50 and 150 partitions (by unique_id) across 8-12 cores in Databricks. When the system isn't loaded with other work (from co-workers) it runs successfully. However:
   a. Oddly, the 5-fold CV I'm using runs much faster than both the `forecast` function and `forecast_fitted_values`. For example, on my local laptop the CV and forecast functions run for approximately the same amount of time, and the extraction of fitted values takes only a few seconds. However, when using Spark in Databricks, the forecast and forecast_fitted_values functions take about 3-4 times as long as the CV. Is that normal behavior? I'm wondering if it might have anything to do with the partitioning.
   b. I've read some sources that say there should be 3-4 partitions per core. However, that's not realistic at all for my situation given the resources my team and I have. Is there any other guideline for the number of partitions?
2. I understand that for non-statistical models I might get slightly different results. However, assuming I've got exactly the same data, I should get the same results when training and forecasting with a statistical model no matter the processing type (e.g., local or distributed) and environment (e.g., laptop vs. something like Databricks), right?


Clarisse Chia
03/27/2024, 2:53 PM

Hi team, I have a question regarding sudden unexpected forecasts. I have been running the forecast with the same parameters with no issues, but I have recently been getting either all-`0` or all-`null` forecasts and was wondering what might be going wrong. The dataset I'm working with is sensitive, but if helpful, below is the simple model setup:

```python
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive, AutoARIMA

# configure model
models = [AutoARIMA(season_length=7, nmodels=5, trace=True)]
statsforecast = StatsForecast(models=models, freq="D",
                              fallback_model=SeasonalNaive(season_length=7), n_jobs=-1)

# forecast
horizon = test_x.select('ds').dropDuplicates().count()
forecast_results = statsforecast.forecast(df=train_set, h=horizon, X_df=test_x)
```

The model had been working quite well until recently, when I changed how one exogenous variable would look in the future forecast (within `test_x`; based on business assumptions).

Valeriy
04/02/2024, 3:50 PM

I am producing prediction intervals with the specified levels `array([0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])`, and the result columns come back with imprecise numbers for some reason.
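If those levels came from `np.arange`, the "imprecise numbers" are likely floating-point step artifacts carried into the interval column names. Statsforecast's `level` argument is, as far as I know, expected in percent anyway, so rounding to integer percents sidesteps both issues. A sketch:

```python
import numpy as np

raw = np.arange(0.1, 1.0, 0.1)  # float steps accumulate representation error
print(raw[2])                   # may print 0.30000000000000004 rather than 0.3

# integer percent levels avoid the artifact entirely
levels = [int(round(l * 100)) for l in raw]
print(levels)  # [10, 20, 30, 40, 50, 60, 70, 80, 90]
```

The `levels` list can then be passed as `level=levels`, giving clean column suffixes like `lo-90`/`hi-90`.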

Valeriy
04/04/2024, 2:19 PM

Is there any way to get rid of this warning when importing statsforecast?

Jeff Tackes
04/04/2024, 8:44 PM

Do others have the same experience that the Theta methods in statsforecast are very slow? I am working with 30min data, and in DARTS, Theta takes <1 sec. In Nixtla it takes 2 minutes for a single time series. It is a large time series, with 35,000 records.

Clarisse Chia
04/04/2024, 10:29 PM

Hi team, a forecast model that I've been building has been forecasting extreme values (e.g., negative/positive quadrillions when it should be forecasting ~10 million), and I was wondering if there is something I should understand about how `AutoARIMA(season_length=7)` uses the exogenous variables we feed it.

**Context on model setup:**
1. ~4 years of complete daily sales history
2. Exogenous variables:
   a. COVID indicators
   b. Day-of-week indicators
   c. Day-of-week * holiday indicators
      i. The idea here is to capture the sales peak for each holiday, especially when a holiday falls on a different day of week each year.
      ii. This is where I notice that when multiple holidays fall really close to each other (e.g., Super Bowl / St. Patrick's / Easter), the forecasts can output some pretty extreme and unreasonable values.
         1. I wonder if the exogenous variables may be multiplicative (rather than additive), causing these extreme values when these indicators fall on the same dates?

Would really appreciate it if folks have any suggestions of what I might be missing!
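One cheap diagnostic for this setup: overlapping holiday and day-of-week dummies that fire on the same dates make the exogenous design matrix ill-conditioned, which is a classic source of exploding regression coefficients (and hence extreme forecasts) regardless of whether the effects are additive. A sketch using the condition number (pure numpy; the threshold is an illustrative rule of thumb):

```python
import numpy as np

# toy design matrix: two dummies that fire on exactly the same rows -> collinear columns
X = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
], dtype=float)

cond = np.linalg.cond(X)  # effectively infinite when columns are duplicated
if cond > 1e8:            # illustrative threshold for "near-collinear"
    print("exogenous matrix is (near-)collinear; merge or drop overlapping dummies")
```

Merging the overlapping indicators into a single combined dummy (or dropping one of each collinear pair) usually tames the coefficients.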

Valeriy
04/05/2024, 1:59 PM

I have an issue with AutoARIMA, and even plain ARIMA, crashing due to memory issues on one time series. Is this a known issue, and are there any workarounds? The notebook crashed both on my laptop and on Colab with high memory. The setup uses external variables.

Valeriy
04/06/2024, 1:04 PM

Is there an AutoSARIMAX in statsforecast?

Vítor Barbosa
04/18/2024, 4:32 PM

Hi team, are there any parameters or tips to speed up AutoARIMA or AutoETS?

Nils de Korte
04/19/2024, 11:07 AM

Hi team, I am using AutoTheta as a trend_forecaster for MSTL. It chooses the best Theta model automatically, but how do I know which one it chooses? And what the scores of the others are? Thanks!