# general
w
Hello <!here> Nixtla team -- exploring what appears to be an excellent library. I have been pleased with the consistency of results generated between R and StatsForecast, but I have a question about performance. In brief, I'm seeing performance that is about 3-4x slower than R when applying AutoARIMA across data windows of varying sizes. This pseudocode:
```
model = AutoARIMA(method="CSS-ML")
sf = StatsForecast(models=[model], freq="B")
for dataset in datasets:
  model_start_time = time.time()
  sf.fit(df=dataset)
  print(f"Model Fit Time: {(time.time() - model_start_time) * 1000}msec")
```
... gives me approximately 75 msec per loop iteration (call to `sf.fit`). This is not bad, per se, but R running `auto.arima` from the `forecast` package completes the same task in 20 msec. What surprises me is that I see the same performance of 75 msec per iteration if I initialize `sf` within the loop like this:
```
for dataset in datasets:
  model_start_time = time.time()
  model = AutoARIMA(method="CSS-ML")
  sf = StatsForecast(models=[model], freq="B")
  sf.fit(df=dataset)
  print(f"Model Fit Time: {(time.time() - model_start_time) * 1000}msec")
```
I was expecting to see a slowdown due to re-initializing the StatsForecast model and the initial Numba compilation step happening each time, as compared to just once. Because I don't see any evidence of improvement due to Numba JIT compilation, I am wondering... am I missing something obvious that would make StatsForecast faster (ideally) or at least as fast as R?
j
Hey. This isn't exactly a 1-to-1 comparison, since the StatsForecast class does extra work during fit, like checking that the data is sorted, saving the dates, saving the ids, etc. If you just want to run that single model, a better comparison would be to use the AutoARIMA model directly on the target (a numpy array), as you're doing in the R case. The Numba compilation happens once, the first time the ARIMA code runs; subsequent calls reuse the compiled code.
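One way to separate a one-time cost (such as JIT compilation on the first call) from steady-state fit time is a small stdlib-only timing harness like this sketch; `time_fits` and its arguments are illustrative names, not part of the StatsForecast API:

```python
import time

def time_fits(fit_fn, datasets):
    """Time each call to fit_fn in msec. The first entry may include
    one-time costs such as JIT compilation; compare it to the rest."""
    timings = []
    for data in datasets:
        start = time.perf_counter()  # monotonic, higher resolution than time.time()
        fit_fn(data)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

If `timings[0]` dominates while later entries are flat, the first-call compile cost is being amortized as expected.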
w
Cool, thank you!
Trying that now... I did notice that leaving off the `allowmean=True` argument speeds things up to about 40 msec per iteration; however, that argument is required to get results that most closely agree with my existing R code.
... a test going directly to the AutoARIMA model looks like this:
```
model = AutoARIMA(max_p=5, max_q=5, allowmean=True, max_order=6, stationary=True, seasonal=False, method="CSS-ML")

for dataset in datasets:
  model_start_time = time.time()
  fit = model.fit(dataset['y'].values)
  print(f"Model Fit Time: {(time.time() - model_start_time) * 1000}msec")
```
If that looks correct, my best performance is improved to 14.49 models/sec, or about 69 msec per model. Good news is StatsForecast does not introduce much overhead... bad news is I'm not seeing the performance boost relative to R that we were hoping for.
j
I see. Most of the time is spent on the optimization process, and scipy's minimize is a bit different from R's optim, so sometimes one converges faster than the other; it really depends on the data. If you set `trace=True` (Python) / `trace=TRUE` (R), can you see how many models are being trained? Sometimes the optimization chooses a different path and ends up training more models in our implementation.
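When comparing trace outputs side by side, a quick way to count candidate fits is to count the lines reporting an ARIMA specification; this helper is a sketch for that comparison (the name and parsing heuristic are illustrative):

```python
def count_trained_models(trace_text):
    """Count candidate fits in an auto-ARIMA trace by counting lines
    that begin with an ARIMA(p,d,q) specification."""
    return sum(1 for line in trace_text.splitlines()
               if line.strip().startswith("ARIMA("))
```

Running it over both traces makes a duplicated fit (the same model appearing twice) show up as a higher count on one side.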
w
Sample 1 SM:
```
ARIMA(2,0,2) with non-zero mean:inf
ARIMA(0,0,0) with non-zero mean:-119.13684223813658
ARIMA(1,0,0) with non-zero mean:-116.39700069281186
ARIMA(0,0,1) with non-zero mean:-116.25848571876172
ARIMA(0,0,0) with zero mean    :-120.84459834669217
ARIMA(1,0,1) with non-zero mean:-114.06636833061931
ARIMA(1,0,1) with non-zero mean:-114.06636833061931
Now re-fitting the best model(s) without approximations...
ARIMA(0,0,0) with zero mean    :-120.84459834669217
```
Sample 1 R:
```
ARIMA(2,0,2) with non-zero mean : Inf
 ARIMA(0,0,0) with non-zero mean : -119.1368
 ARIMA(1,0,0) with non-zero mean : -116.397
 ARIMA(0,0,1) with non-zero mean : -116.3999
 ARIMA(0,0,0) with zero mean     : -120.8446
 ARIMA(1,0,1) with non-zero mean : Inf
 Best model: ARIMA(0,0,0) with zero mean
```
• Interesting that ARIMA(1,0,1) with non-zero mean is calculated twice by SM, and both times gets -114.07; whereas R gets Inf and only calculates it once
• SM recalculates the best model "without approximations"; perhaps R is not doing that?
Sample 2 SM:
```
ARIMA(2,0,2) with non-zero mean:inf
ARIMA(0,0,0) with non-zero mean:-169.66599650902512
ARIMA(1,0,0) with non-zero mean:-168.12558296240783
ARIMA(0,0,1) with non-zero mean:-169.41362566072397
ARIMA(0,0,0) with zero mean    :-168.6170628653839
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
Now re-fitting the best model(s) without approximations...

ARIMA(0,0,0) with non-zero mean:-169.66599650902512
```
Sample 2 R:
```
ARIMA(2,0,2) with non-zero mean : Inf
 ARIMA(0,0,0) with non-zero mean : -169.666
 ARIMA(1,0,0) with non-zero mean : -168.1256
 ARIMA(0,0,1) with non-zero mean : -169.4986
 ARIMA(0,0,0) with zero mean     : -168.6171
 ARIMA(1,0,1) with non-zero mean : Inf
 Best model: ARIMA(0,0,0) with non-zero mean
```
• Occasionally R converges in fewer iterations
• R gets more Infs
j
Are you able to share a single series to reproduce this? It's interesting that the second case starts from an entirely different model.
w
Yes... data for Sample 2 forthcoming...
From R:
```
Y
2013-01-03 -2.087795e-03
2013-01-04  4.853300e-03
2013-01-07 -3.128003e-03
2013-01-08 -3.247639e-03
2013-01-09  2.652345e-03
2013-01-10  7.568700e-03
2013-01-11 -4.751511e-05
2013-01-14 -9.311049e-04
2013-01-15  1.128033e-03
2013-01-16  1.969725e-04
2013-01-17  5.627061e-03
2013-01-18  3.397492e-03
2013-01-22  4.418332e-03
2013-01-23  1.506342e-03
2013-01-24  6.614662e-06
2013-01-25  5.430709e-03
2013-01-28 -1.851334e-03
2013-01-29  5.093004e-03
2013-01-30 -3.907245e-03
2013-01-31 -2.566592e-03
2013-02-01  1.000251e-02
```
From SM:
```
0 Y 2013-01-03 00:00:00-05:00 -2.08780e-03
1 Y 2013-01-04 00:00:00-05:00 4.85330e-03
2 Y 2013-01-07 00:00:00-05:00 -3.12800e-03
3 Y 2013-01-08 00:00:00-05:00 -3.24764e-03
4 Y 2013-01-09 00:00:00-05:00 2.65235e-03
5 Y 2013-01-10 00:00:00-05:00 7.56870e-03
6 Y 2013-01-11 00:00:00-05:00 -4.75151e-05
7 Y 2013-01-14 00:00:00-05:00 -9.31105e-04
8 Y 2013-01-15 00:00:00-05:00 1.12803e-03
9 Y 2013-01-16 00:00:00-05:00 1.96973e-04
10 Y 2013-01-17 00:00:00-05:00 5.62706e-03
11 Y 2013-01-18 00:00:00-05:00 3.39749e-03
12 Y 2013-01-22 00:00:00-05:00 4.41833e-03
13 Y 2013-01-23 00:00:00-05:00 1.50634e-03
14 Y 2013-01-24 00:00:00-05:00 6.61466e-06
15 Y 2013-01-25 00:00:00-05:00 5.43071e-03
16 Y 2013-01-28 00:00:00-05:00 -1.85133e-03
17 Y 2013-01-29 00:00:00-05:00 5.09300e-03
18 Y 2013-01-30 00:00:00-05:00 -3.90724e-03
19 Y 2013-01-31 00:00:00-05:00 -2.56659e-03
20 Y 2013-02-01 00:00:00-05:00 1.00025e-02
```
Discovered I had a fencepost problem & had one more item in each dataframe for SM. Data samples above are correct... when I re-run Sample 2 in SM:
```
ARIMA(2,0,2) with non-zero mean:inf
ARIMA(0,0,0) with non-zero mean:-169.66599650902512
ARIMA(1,0,0) with non-zero mean:-168.12558296240783
ARIMA(0,0,1) with non-zero mean:-169.41362566072397
ARIMA(0,0,0) with zero mean    :-168.6170628653839
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
Now re-fitting the best model(s) without approximations...

ARIMA(0,0,0) with non-zero mean:-169.66599650902512
```
• Now the AICs are essentially identical
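Length mismatches like the fencepost problem above are easy to catch with a small comparison helper before running either fit; this is a stdlib-only sketch (the function name and return format are illustrative):

```python
def compare_series(a, b, tol=1e-6):
    """Detect length (fencepost) mismatches and value drift between two
    exports of what should be the same series."""
    if len(a) != len(b):
        return f"length mismatch: {len(a)} vs {len(b)}"
    worst = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return "match" if worst <= tol else f"max abs diff {worst:.3g}"
```

Running it over the R and SM inputs before fitting would have flagged the extra row immediately.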
j
Thanks. So now the remaining issue is fitting the same model twice, right?
w
Yes... that's the only oddness I'm seeing.
Do you happen to know if forecast's `auto.arima` will recalculate the best model without approximations? Or can I disable that feature in SM to speed it up? That also appears to be a difference between the two...
j
The default in R is `approximation = (length(x) > 150 | frequency(x) > 12)` (docs) and our default is `False`.
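For these 21-day windows, the R default can be checked directly; this sketch just mirrors the condition from the R docs (the function name is illustrative):

```python
def r_approximation_default(n_obs, frequency):
    # Mirrors forecast::auto.arima's documented default:
    # approximation = (length(x) > 150 | frequency(x) > 12)
    return n_obs > 150 or frequency > 12
```

So with 21 observations of business-daily data, R is also running without approximations, matching our `False` default.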
btw, which AutoARIMA are you using? `statsforecast.models.AutoARIMA` or `statsforecast.arima.AutoARIMA`?
w
Thank you -- I am using `statsforecast.models.AutoARIMA`
j
Found a bug in the way we check if we've already tried a specific model, thanks!
w
💥
Fantastic, happy to help!
j
I think it's still a bit slower than R haha but at least we won't be doing extra work now
The approximations thing seems to come from the fact that in R the best arma is null and in our case it isn't (since we return an AIC). That may be a separate bug, I'll investigate it afterwards
I recently found a bug in a transformation of the parameters, so that may be it. We currently do it wrong so maybe the actual transformation returns nulls and since we're doing it wrong we return something
Sorry for the troubles btw, the ARIMA is a complex model haha
w
No doubt, this stuff is not easy! Let me know if there is a branch I can pull from to test any changes. Now I'm looking at aggregate results across many windows of data, and seeing another interesting difference. Across 745 windows assessed, R has 145 windows that generate a non-zero ARIMA model (at least one coefficient) and thus a projection. SM generates non-zero models for 175 windows. Each window is 21 days long (roughly a month of trading days).
Summary stats `SM`:
```
WindowLength,Tests,Forecasts,PctPositiveTradingDays,PctZeroCoefficientDays,DirectionalAccuracy,MAE,MAPE
21,745,175,42.857142857142854,76.51006711409396,50.857142857142854,0.0059412257594084064,241.0885733851458
Window Length: 21  Elapsed Time: 50.54346585273743  Models: 746  Models/sec: 14.759573515863213
Total Elapsed Time: 50.54346585273743  Models: 746  Models/sec: 14.759573515863213
```
Summary stats `R`:
```
[1] "WindowLength,Tests,Forecasts,PctPositiveTradingDays,PctZeroCoefficientDays,DirectionalAccuracy,MAE,MAPE"
[1] "21,745,145,42.8571428571429,80.5369127516778,46.2068965517241,0.00653118924401641,262.538965478021"
[1] "Window Length:  21  Elapsed Time:  14.1242530345917  Models:  746  Models/sec:  52.8169523848782"
[1] "Total Elapsed Time:  14.1268820762634  seconds; Models:  746 ; Models/sec:  52.8071230419244"
```
j
You can install from the `fix-arima-results-idx` branch (PR)
Also, if you're training on several series the StatsForecast class can help with multiprocessing (by setting n_jobs>0 or n_jobs=-1) so that you can parallelize the fit step across series
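StatsForecast handles the per-series parallelism internally when `n_jobs` is set; the shape of that fan-out looks roughly like this stdlib sketch (thread-based here only for illustration; `fit_all` is not the library API, and the library itself uses multiprocessing across series):

```python
from concurrent.futures import ThreadPoolExecutor

def fit_all(fit_fn, series_list, n_jobs=4):
    """Fan one fit call out per series across a worker pool, preserving
    input order in the results."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(fit_fn, series_list))
```

For CPU-bound ARIMA fits, process-based workers (as StatsForecast uses) avoid the GIL, which is why threads are only a sketch here.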
Hey. Were you able to install from that branch? I'm curious about the runtime without running the same model twice