WTL
06/18/2024, 3:28 PM
I'm using R and StatsForecast, but I have a question about performance. In brief, I'm seeing performance that is about 3-4x slower than R when applying AutoARIMA across data windows of varying sizes. This pseudocode:

import time
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

model = AutoARIMA(method="CSS-ML")
sf = StatsForecast(models=[model], freq="B")
for dataset in datasets:
    model_start_time = time.time()
    sf.fit(df=dataset)
    print(f"Model Fit Time: {(time.time() - model_start_time) * 1000} msec")

... gives me approximately 75 msec per loop iteration (call to sf.fit).
This is not bad, per se, but R running auto.arima from the forecast package completes the same task in 20 msec.
What surprises me is that I see the same performance of 75 msec per iteration if I initialize sf within the loop, like this:
for dataset in datasets:
    model_start_time = time.time()
    model = AutoARIMA(method="CSS-ML")
    sf = StatsForecast(models=[model], freq="B")
    sf.fit(df=dataset)
    print(f"Model Fit Time: {(time.time() - model_start_time) * 1000} msec")

I was expecting to see a slowdown due to re-initializing the StatsForecast model and the initial Numba compilation step happening each time, as compared to just once.
Because I don't see any evidence of improvement due to Numba JIT compilation, I am wondering... am I missing something obvious that would make StatsForecast faster (ideally), or at least as fast as R?
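As a side note, one way to check whether a one-time Numba compilation cost is involved is to time the first fit separately from the rest. A minimal runnable sketch with synthetic stand-in windows (the real datasets above would be used instead):

import time

import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Synthetic 21-day business-daily "return" windows, only to make this runnable.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2013-01-03", periods=21)
datasets = [
    pd.DataFrame({"unique_id": "Y", "ds": dates, "y": rng.normal(0, 5e-3, 21)})
    for _ in range(5)
]

sf = StatsForecast(models=[AutoARIMA(method="CSS-ML")], freq="B")

# The very first fit carries any JIT compilation that is not already cached.
start = time.perf_counter()
sf.fit(df=datasets[0])
print(f"First fit (includes any warm-up): {(time.perf_counter() - start) * 1000:.1f} msec")

# Steady-state timing over the remaining windows.
for dataset in datasets[1:]:
    start = time.perf_counter()
    sf.fit(df=dataset)
    print(f"Model Fit Time: {(time.perf_counter() - start) * 1000:.1f} msec")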
José Morales
06/18/2024, 4:49 PM

WTL
06/18/2024, 5:01 PM

WTL
06/18/2024, 5:04 PM
... the allowmean=True argument speeds things up to about 40 msec per iteration... however, that method is required for seeing results that most closely agree with my existing R code.

WTL
06/18/2024, 5:19 PM
model = AutoARIMA(max_p=5, max_q=5, allowmean=True, max_order=6, stationary=True, seasonal=False, method="CSS-ML")
for dataset in datasets:
    model_start_time = time.time()
    fit = model.fit(dataset['y'].values)
    print(f"Model Fit Time: {(time.time() - model_start_time) * 1000} msec")

If that looks correct, my best performance is improved to 14.49 models/sec, or about 69 msec per model. Good news is StatsForecast does not introduce much overhead... bad news is I'm not seeing the performance boost relative to R that we were hoping for.

José Morales
06/18/2024, 5:40 PM
Using trace=True/TRUE, can you see how many models are being trained? Sometimes the optimization chooses a different path and ends up training more models in our implementation.
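For the Python side, a sketch of turning that on (statsforecast.models.AutoARIMA takes a trace argument; the data here is a synthetic stand-in for one 21-day window):

import numpy as np
from statsforecast.models import AutoARIMA

# Synthetic stand-in for one window of daily returns.
y = np.random.default_rng(0).normal(0, 5e-3, 21)

# trace=True prints each candidate model and its information criterion as the
# stepwise search runs, analogous to trace=TRUE in forecast::auto.arima.
model = AutoARIMA(
    max_p=5, max_q=5, allowmean=True, max_order=6,
    stationary=True, seasonal=False, method="CSS-ML",
    trace=True,
)
fit = model.fit(y)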
WTL
06/18/2024, 6:10 PM
Sample 1 SM:
ARIMA(2,0,2) with non-zero mean:inf
ARIMA(0,0,0) with non-zero mean:-119.13684223813658
ARIMA(1,0,0) with non-zero mean:-116.39700069281186
ARIMA(0,0,1) with non-zero mean:-116.25848571876172
ARIMA(0,0,0) with zero mean :-120.84459834669217
ARIMA(1,0,1) with non-zero mean:-114.06636833061931
ARIMA(1,0,1) with non-zero mean:-114.06636833061931
Now re-fitting the best model(s) without approximations...
ARIMA(0,0,0) with zero mean :-120.84459834669217
Sample 1 R:
ARIMA(2,0,2) with non-zero mean : Inf
ARIMA(0,0,0) with non-zero mean : -119.1368
ARIMA(1,0,0) with non-zero mean : -116.397
ARIMA(0,0,1) with non-zero mean : -116.3999
ARIMA(0,0,0) with zero mean : -120.8446
ARIMA(1,0,1) with non-zero mean : Inf
Best model: ARIMA(0,0,0) with zero mean
• Interesting that ARIMA(1,0,1) with non-zero mean is calculated twice by SM, and both times gets the same value (-114.07); whereas R gets Inf and only calculates it once
• SM recalculates the best model "without approximations"; perhaps R is not doing that?

WTL
06/18/2024, 6:23 PM
Sample 2 SM:
ARIMA(2,0,2) with non-zero mean:inf
ARIMA(0,0,0) with non-zero mean:-169.66599650902512
ARIMA(1,0,0) with non-zero mean:-168.12558296240783
ARIMA(0,0,1) with non-zero mean:-169.41362566072397
ARIMA(0,0,0) with zero mean :-168.6170628653839
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
Now re-fitting the best model(s) without approximations...
ARIMA(0,0,0) with non-zero mean:-169.66599650902512
Sample 2 R:
ARIMA(2,0,2) with non-zero mean : Inf
ARIMA(0,0,0) with non-zero mean : -169.666
ARIMA(1,0,0) with non-zero mean : -168.1256
ARIMA(0,0,1) with non-zero mean : -169.4986
ARIMA(0,0,0) with zero mean : -168.6171
ARIMA(1,0,1) with non-zero mean : Inf
Best model: ARIMA(0,0,0) with non-zero mean
• Occasionally R converges in fewer iterations
• R gets more Infs

José Morales
06/18/2024, 6:25 PM

WTL
06/18/2024, 6:26 PM

WTL
06/18/2024, 6:28 PM
Y
2013-01-03 -2.087795e-03
2013-01-04 4.853300e-03
2013-01-07 -3.128003e-03
2013-01-08 -3.247639e-03
2013-01-09 2.652345e-03
2013-01-10 7.568700e-03
2013-01-11 -4.751511e-05
2013-01-14 -9.311049e-04
2013-01-15 1.128033e-03
2013-01-16 1.969725e-04
2013-01-17 5.627061e-03
2013-01-18 3.397492e-03
2013-01-22 4.418332e-03
2013-01-23 1.506342e-03
2013-01-24 6.614662e-06
2013-01-25 5.430709e-03
2013-01-28 -1.851334e-03
2013-01-29 5.093004e-03
2013-01-30 -3.907245e-03
2013-01-31 -2.566592e-03
2013-02-01 1.000251e-02
WTL
06/18/2024, 6:35 PM
0 Y 2013-01-03 00:00:00-05:00 -2.08780e-03
1 Y 2013-01-04 00:00:00-05:00 4.85330e-03
2 Y 2013-01-07 00:00:00-05:00 -3.12800e-03
3 Y 2013-01-08 00:00:00-05:00 -3.24764e-03
4 Y 2013-01-09 00:00:00-05:00 2.65235e-03
5 Y 2013-01-10 00:00:00-05:00 7.56870e-03
6 Y 2013-01-11 00:00:00-05:00 -4.75151e-05
7 Y 2013-01-14 00:00:00-05:00 -9.31105e-04
8 Y 2013-01-15 00:00:00-05:00 1.12803e-03
9 Y 2013-01-16 00:00:00-05:00 1.96973e-04
10 Y 2013-01-17 00:00:00-05:00 5.62706e-03
11 Y 2013-01-18 00:00:00-05:00 3.39749e-03
12 Y 2013-01-22 00:00:00-05:00 4.41833e-03
13 Y 2013-01-23 00:00:00-05:00 1.50634e-03
14 Y 2013-01-24 00:00:00-05:00 6.61466e-06
15 Y 2013-01-25 00:00:00-05:00 5.43071e-03
16 Y 2013-01-28 00:00:00-05:00 -1.85133e-03
17 Y 2013-01-29 00:00:00-05:00 5.09300e-03
18 Y 2013-01-30 00:00:00-05:00 -3.90724e-03
19 Y 2013-01-31 00:00:00-05:00 -2.56659e-03
20 Y 2013-02-01 00:00:00-05:00 1.00025e-02
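For reference, a sketch of the reshaping involved in going from the indexed series above to this long format, using a short stand-in for the data (variable names are illustrative):

import pandas as pd

# Stand-in for the return series from the first printout (values abbreviated).
y_series = pd.Series(
    [-2.087795e-03, 4.853300e-03, -3.128003e-03],
    index=pd.to_datetime(["2013-01-03", "2013-01-04", "2013-01-07"]),
)

# StatsForecast expects long-format data with unique_id, ds and y columns,
# which is what the second printout shows (with "Y" as the unique_id).
df = (
    y_series.rename("y")
    .rename_axis("ds")
    .reset_index()
    .assign(unique_id="Y")
    [["unique_id", "ds", "y"]]
)
print(df)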
WTL
06/18/2024, 6:37 PM
ARIMA(2,0,2) with non-zero mean:inf
ARIMA(0,0,0) with non-zero mean:-169.66599650902512
ARIMA(1,0,0) with non-zero mean:-168.12558296240783
ARIMA(0,0,1) with non-zero mean:-169.41362566072397
ARIMA(0,0,0) with zero mean :-168.6170628653839
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
ARIMA(1,0,1) with non-zero mean:-168.3648565420355
Now re-fitting the best model(s) without approximations...
ARIMA(0,0,0) with non-zero mean:-169.66599650902512
• Now the AICs are essentially identical

José Morales
06/18/2024, 6:44 PM

WTL
06/18/2024, 6:47 PM

WTL
06/18/2024, 6:48 PM

José Morales
06/18/2024, 6:51 PM
In R the default is approximation = (length(x) > 150 | frequency(x) > 12) (docs), and our default is False.
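A sketch of mirroring that rule on the Python side, so that both implementations make the same choice for a given window (approximation is the argument referred to above; the data is a synthetic stand-in):

import numpy as np
from statsforecast.models import AutoARIMA

# Synthetic stand-in for one 21-point, non-seasonal window of daily returns.
y = np.random.default_rng(0).normal(0, 5e-3, 21)

# Mirror auto.arima's rule: approximate during the search only for long or
# high-frequency series. For these short windows it evaluates to False,
# which is also the statsforecast default.
model = AutoARIMA(approximation=len(y) > 150, method="CSS-ML", stationary=True, seasonal=False)
fit = model.fit(y)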
José Morales
06/18/2024, 6:54 PM
Are you using statsforecast.models.AutoARIMA or statsforecast.arima.AutoARIMA?
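For context, the two import paths being asked about (only the paths themselves are taken from the question; the alias is illustrative):

from statsforecast.models import AutoARIMA                   # the class used with StatsForecast(models=[...]) above
from statsforecast.arima import AutoARIMA as ArimaAutoARIMA  # the class of the same name in the arima module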
WTL
06/18/2024, 7:05 PM
statsforecast.models.AutoARIMA
José Morales
06/18/2024, 7:51 PM

WTL
06/18/2024, 7:52 PM

WTL
06/18/2024, 7:52 PM

José Morales
06/18/2024, 7:54 PM

José Morales
06/18/2024, 7:56 PM

José Morales
06/18/2024, 7:58 PM

José Morales
06/18/2024, 7:58 PM

WTL
06/18/2024, 9:24 PM
R has 145 windows that generate a non-zero ARIMA model (at least one coefficient) and thus a projection. SM generates non-zero models for 175 windows. Each window is 21 days long (roughly a month of trading days).

WTL
06/18/2024, 9:27 PM
Summary stats `SM`:
WindowLength,Tests,Forecasts,PctPositiveTradingDays,PctZeroCoefficientDays,DirectionalAccuracy,MAE,MAPE
21,745,175,42.857142857142854,76.51006711409396,50.857142857142854,0.0059412257594084064,241.0885733851458
Window Length: 21 Elapsed Time: 50.54346585273743 Models: 746 Models/sec: 14.759573515863213
Total Elapsed Time: 50.54346585273743 Models: 746 Models/sec: 14.759573515863213
Summary stats `R`:
[1] "WindowLength,Tests,Forecasts,PctPositiveTradingDays,PctZeroCoefficientDays,DirectionalAccuracy,MAE,MAPE"
[1] "21,745,145,42.8571428571429,80.5369127516778,46.2068965517241,0.00653118924401641,262.538965478021"
[1] "Window Length: 21 Elapsed Time: 14.1242530345917 Models: 746 Models/sec: 52.8169523848782"
[1] "Total Elapsed Time: 14.1268820762634 seconds; Models: 746 ; Models/sec: 52.8071230419244"
José Morales
06/18/2024, 10:22 PM
fix-arima-results-idx branch (PR)

José Morales
06/18/2024, 10:24 PM

José Morales
06/20/2024, 5:02 PM