This message was deleted Nixtla Community #general


# general

Slackbot

01/19/2023, 6:06 PM

This message was deleted.

Farzad E

01/19/2023, 6:12 PM

This is my code:

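from statsforecast import StatsForecast

# df and models are assumed to be defined already
sf = StatsForecast(
    df=df,
    models=models,
    freq='W',   # weekly data
    n_jobs=-1,  # use all available cores
    ray_address='10.10.10.110:6379'  # address of the running ray cluster
)

# 52-week horizon with 90% prediction intervals
forecasts_df = sf.forecast(h=52, level=[90])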

I don't have a yaml file though. I start my ray cluster on my EC2 instance and then pass the address to StatsForecast.

fede (nixtla) (they/them)

01/19/2023, 6:38 PM

Hi @Farzad E! Thank you for using statsforecast. A cluster typically speeds things up when you have many time series, usually more than the available CPUs. In your example, since you are handling only 10 time series, a ray cluster may be less useful. Running your code on a c6a.8xlarge instance with n_jobs=-1 might be best. StatsForecast uses a map-reduce approach, so if you have 10 time series and 32 cores available, statsforecast will use 10 cores to train (one for each series). The training speed for those series will depend on the models used and the length of the series. For example, models like AutoARIMA are usually very slow on long time series (more than 100 observations). Models like MSTL tend to be faster.
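
The map-reduce point above can be illustrated with a minimal sketch. This is not StatsForecast's internal code: `busy_workers`, `fit_one`, and `map_reduce_fit` are hypothetical names, and a thread pool with a trivial "model" (the series mean) stands in for the real per-series fitting. It only shows why 10 series cannot keep more than 10 workers busy.

```python
from concurrent.futures import ThreadPoolExecutor


def busy_workers(n_series: int, n_cores: int) -> int:
    """Upper bound on workers doing useful work when each series is one task."""
    return min(n_series, n_cores)


def fit_one(series):
    # Hypothetical stand-in for fitting a model to a single series:
    # here the "model" is just the mean of the observations.
    return sum(series) / len(series)


def map_reduce_fit(all_series, n_cores):
    # Map: one independent task per series.
    # Reduce: collect the per-series results in input order.
    workers = max(1, busy_workers(len(all_series), n_cores))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_one, all_series))


if __name__ == "__main__":
    # 10 series on a 32-core machine: at most 10 workers have anything to do.
    print(busy_workers(10, 32))
    print(map_reduce_fit([[1, 2, 3], [4, 5, 6]], n_cores=32))
```

Under this assumption, adding cores beyond the number of series does not shorten the run; only a faster model or shorter series would.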

Farzad E

01/19/2023, 7:06 PM

@fede (nixtla) (they/them) I am using AutoARIMA with 7 years of weekly data, so the length of each series is 364 and my horizon is 52 weeks. Thanks a lot for your explanation, but one question though. How did your example on the m5.2xlarge with 8 cores perform so well? That also used AutoARIMA and had millions of series. Is it because the forecast horizon was short in your case (7 days)?

fede (nixtla) (they/them)

01/19/2023, 7:28 PM

In that case, using the MSTL model (or even Theta or ETS) is probably better. Large seasonal periods (as in the weekly case) often slow the AutoARIMA model down considerably. Your first intuition about the blog post is correct: 250 EC2 instances of 8 cores each were deployed to obtain the cluster with 2000 CPUs. The experiment with the millions of time series used those 2000 CPUs, hence the fast run time. The horizon is not a problem once the model is fitted.

👍 1
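
The cluster arithmetic in the reply above can be checked with a short sketch. The instance count and cores per instance come from the message; the figure of 1,000,000 series is only an illustrative assumption (the thread says "millions" without giving an exact number), and the function names are hypothetical.

```python
import math


def cluster_cores(n_instances: int, cores_per_instance: int) -> int:
    """Total cores in a cluster of identical instances."""
    return n_instances * cores_per_instance


def series_per_core(n_series: int, total_cores: int) -> int:
    """Series each core must fit if the work is spread evenly (rounded up)."""
    return math.ceil(n_series / total_cores)


if __name__ == "__main__":
    total = cluster_cores(250, 8)  # 250 instances of 8 cores each
    print(total)
    # Assumed workload of 1,000,000 series, for illustration only.
    print(series_per_core(1_000_000, total))
    # The 10-series case from this thread on a 32-core machine.
    print(series_per_core(10, 32))
```

With many more series than cores, every core stays busy, which is where a large cluster pays off.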

Farzad E

01/19/2023, 7:30 PM

Thanks a lot. That clarified the details.

👍 1
