Hi again : ), I wonder if statistical models from ...
# statsforecast
m
Hi again :), I wonder whether the statistical models in statsforecast are suited to parallelization?
• I set up the StatsForecast object as follows:
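from statsforecast import StatsForecast
from statsforecast.models import AutoCES, AutoETS

# Two automatic models, each with a daily season (96 x 15-minute steps)
models_stats = [
    AutoCES(season_length=96),
    AutoETS(season_length=96),
]

# Instantiate StatsForecast class with the models
sf = StatsForecast(
    models=models_stats,
    freq='15T',
    n_jobs=7,  # NOTE: n_jobs instead of num_threads
)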

sf.fit(df_train)
• Yet I do not see more CPUs being utilized, and the fitting takes a very long time.
• num_threads in mlforecast works amazingly well, so I wonder whether this is due to the nature of the statsforecast algorithms, i.e. that they can't be parallelized as well as e.g. LightGBM?
• (I do run the analysis in a Jupyter notebook, but I doubt that is the culprit.)
Thanks and have a great weekend.
k
I don't believe there is parallelization per model. This will take 2 threads because you have 2 models. The problem is that it's tricky to expose parallelism both within one model and across models. Imagine you have 2 LightGBM jobs running on the same machine, each with n_jobs=-1. They can hang because each job tries to occupy all the cores, causing deadlocks. You can't really parallelize models that are themselves run in parallel. By the same reasoning, it's hard to parallelize `AutoETS` and `AutoCES`. Statsforecast parallelization is done across models. I think what you can do is break these up into multiple `AutoETS` jobs and `AutoCES` jobs, and that should parallelize.
j
Just a small correction to Kevin's answer: the parallelization is done by series. We take all the series, split them into `n_jobs` chunks, and process each chunk in parallel. If you have fewer than 7 series you won't see a linear speedup, and if you have only one series nothing will change.
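To make the chunking idea concrete, here is a minimal pure-Python sketch of splitting series into at most `n_jobs` chunks and estimating the best-case speedup. It is an illustration only, not statsforecast's actual implementation: `split_into_chunks` and `best_case_speedup` are hypothetical helper names, and the estimate assumes every series takes the same time to fit.

```python
from math import ceil


def split_into_chunks(series_ids, n_jobs):
    """Split series ids into at most n_jobs contiguous, near-equal chunks.

    Hypothetical helper: it mimics the idea described above, not library code.
    """
    series_ids = list(series_ids)
    n_chunks = min(n_jobs, len(series_ids))  # never more chunks than series
    if n_chunks == 0:
        return []
    base, extra = divmod(len(series_ids), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(series_ids[start:start + size])
        start += size
    return chunks


def best_case_speedup(n_series, n_jobs):
    """Upper bound on speedup, assuming equal fitting cost per series."""
    if n_series == 0:
        return 1.0
    # The slowest worker is the one holding the largest chunk.
    largest_chunk = ceil(n_series / min(n_jobs, n_series))
    return n_series / largest_chunk


if __name__ == "__main__":
    for n_series in (1, 3, 14, 20):
        sizes = [len(c) for c in split_into_chunks(range(n_series), n_jobs=7)]
        print(n_series, sizes, round(best_case_speedup(n_series, 7), 2))
```

With a single series there is only one chunk, so extra workers stay idle regardless of `n_jobs`. To check how many series a long-format training frame holds, something like `df_train["unique_id"].nunique()` should work, assuming the default `unique_id` column name is in use.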
k
Ah yes, thanks for catching that!