The Imperfect Perfectionista
06/11/2024, 1:47 AM
model = StatsForecast(
    models=[ADIDA(), CrostonClassic(), IMAPA(), TSB(alpha_d=0.2, alpha_p=0.2)],
    freq='W',
    n_jobs=-1,
)
The training data contains the unique_id and y columns as usual, but the ds column contains consecutive Mondays, since I want my predictions to happen weekly, on Mondays. Basically, every Monday morning I want to forecast for that Monday and for Tue, Wed, Thu, Fri, Sat, and Sun.
But when I call the predict method after model training, it still gives me the sales forecasts on Sundays.
Is there any parameter to change this default setting?

The Imperfect Perfectionista
06/11/2024, 11:02 AM
If my data has unique_id and y columns where ds gives daily data, what's the Nixtla-recommended way to convert it into monthly data if I want to predict every month's sales for each unique_id? I can apply a normal dataframe groupby sum over month, year, and unique_id, but I'm wondering whether Nixtla provides its own API for this kind of operation. Also, the models understand freq='M', right?
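[Editor's note: a plain-pandas sketch of both frequency questions above, not a Nixtla API. In pandas, the bare 'W' alias means 'W-SUN' (which is why the forecast dates land on Sundays), while 'W-MON' anchors weeks on Mondays; StatsForecast's freq argument accepts these pandas offset aliases. A groupby with pd.Grouper aggregates daily rows to monthly totals in the (unique_id, ds, y) layout:]

```python
import pandas as pd

# 'W' defaults to 'W-SUN' in pandas; pass freq='W-MON' (e.g. to StatsForecast)
# to keep weekly timestamps on Mondays.
mondays = pd.date_range("2024-01-01", periods=4, freq="W-MON")
assert all(d.day_name() == "Monday" for d in mondays)

# Daily panel data in the (unique_id, ds, y) layout.
daily = pd.DataFrame({
    "unique_id": ["A"] * 60 + ["B"] * 60,
    "ds": list(pd.date_range("2024-01-01", periods=60, freq="D")) * 2,
    "y": range(120),
})

# Aggregate each series to monthly totals; 'MS' stamps each month at its start.
monthly = (
    daily.groupby(["unique_id", pd.Grouper(key="ds", freq="MS")])["y"]
    .sum()
    .reset_index()
)
print(monthly)
```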
WTL
06/18/2024, 3:28 PM
I'm using R and StatsForecast, but I have a question about performance. In brief, I'm seeing performance that is about 3-4x slower than R when applying AutoARIMA across data windows of varying sizes. This pseudocode:
model = AutoARIMA(method="CSS-ML")
sf = StatsForecast(models=[model], freq="B")
for dataset in datasets:
    model_start_time = time.time()
    sf.fit(df=dataset)
    print(f"Model Fit Time: {(time.time() - model_start_time) * 1000} msec")
... gives me approximately 75 msec per loop iteration (per call to sf.fit).
This is not bad per se, but R running auto.arima from the forecast package completes the same task in 20 msec.
What surprises me is that I see the same performance of 75 msec per iteration if I initialize sf within the loop, like this:
for dataset in datasets:
    model_start_time = time.time()
    model = AutoARIMA(method="CSS-ML")
    sf = StatsForecast(models=[model], freq="B")
    sf.fit(df=dataset)
    print(f"Model Fit Time: {(time.time() - model_start_time) * 1000} msec")
I was expecting to see a slowdown due to re-initializing the StatsForecast model and the initial Numba compilation step happening each time, as compared to just once.
Because I don't see any evidence of improvement from Numba JIT compilation, I am wondering: am I missing something obvious that would make StatsForecast faster (ideally), or at least as fast as R?
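[Editor's note: one way to separate one-time costs such as Numba JIT compilation from steady-state fitting time is to discard a warm-up call before averaging. This is a generic timing sketch, not StatsForecast API; the time_fit helper is illustrative, and fit_fn is whatever wrapper you pass in, e.g. lambda df: sf.fit(df=df):]

```python
import time

def time_fit(fit_fn, datasets, warmup=1):
    """Average per-call wall time in msec, discarding the first `warmup`
    calls so one-time costs (e.g. JIT compilation) are excluded from the
    steady-state figure."""
    times = []
    for i, dataset in enumerate(datasets):
        start = time.perf_counter()
        fit_fn(dataset)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if i >= warmup:
            times.append(elapsed_ms)
    return sum(times) / len(times)
```

If the cold (first) and warm (later) calls take the same 75 msec, as reported above, the bottleneck is the per-fit work itself rather than compilation.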
Faridun Mamadbekov
06/27/2024, 12:33 PM
Is there any difference between training with fit() and training with cross_validation(refit=True, ...)? In other words, is it acceptable to run cross_validation(refit=True, ...) instead of fit(), because with refit=True the CV method is internally running fit() already? I understand that in the case of cross_validation the tail of the input data will be split into validation sets depending on n_windows, step_size, etc. The question is more about the equivalence of fit() and the training done inside cross_validation(). Is it OK to omit fit() and go straight for cross_validation(refit=True) if the goal is to evaluate a method's performance on data?
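[Editor's note: the tail splitting described above can be sketched in plain Python. This mirrors rolling-origin cross-validation, where the last validation window ends at the final observation and earlier cutoffs step back by step_size; the cv_splits helper is illustrative, not the library's internal code:]

```python
def cv_splits(n_obs, h, n_windows, step_size):
    """Yield (cutoff, validation index range) pairs for rolling-origin CV.
    Training uses observations [0, cutoff); the last window's validation
    set ends at the final observation."""
    splits = []
    for w in range(n_windows):
        cutoff = n_obs - h - (n_windows - 1 - w) * step_size
        splits.append((cutoff, range(cutoff, cutoff + h)))
    return splits

# e.g. 30 observations, horizon 7, 3 windows, step_size 7:
for cutoff, valid in cv_splits(30, 7, 3, 7):
    print(cutoff, list(valid))
```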