# neural-forecast
Hi @Steffen! This functionality is not yet implemented in the cross validation function. We are working on it and will release it soon 🙂. For now, the way to perform cross validation with retraining is to code the for loop directly, calling the `fit` and `predict` functions and updating the data passed to the `fit` method on each iteration.
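For reference, here is a minimal sketch of such a loop under stated assumptions: a long-format dataframe `df` with `unique_id`, `ds`, `y` columns, an NHITS model, and illustrative cutoff dates and hyperparameters that are not from this thread.

```python
# Minimal sketch: expanding-window backtest with full retraining per window.
# `df`, the cutoff dates, and the NHITS settings below are placeholder choices.
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

h = 7  # forecast horizon
cutoffs = pd.to_datetime(['2023-06-30', '2023-07-07', '2023-07-14'])

folds = []
for cutoff in cutoffs:
    train = df[df['ds'] <= cutoff]  # expanding training window

    # Re-instantiate the model so every window is trained from scratch.
    nf = NeuralForecast(models=[NHITS(h=h, input_size=2 * h, max_steps=200)], freq='D')
    nf.fit(df=train)

    preds = nf.predict()  # forecasts the h steps after the cutoff
    if 'unique_id' not in preds.columns:  # some versions return it as the index
        preds = preds.reset_index()
    preds['cutoff'] = cutoff
    folds.append(preds)

cv_df = pd.concat(folds, ignore_index=True)
```

Evaluating each fold then amounts to joining the predictions with the held-out actuals on `unique_id` and `ds`.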
Hi @Cristian (Nixtla), thank you for the information, this helps me a lot! Great that you are working on this 🙂 I have started on the expanding-window implementation using a for loop, which of course will involve many rounds of retraining. My intent is to use this backtesting within the hyperparameter optimization process to compute the loss and arrive at a model that generalizes well.

When I use `nf.predict(futr_df=val_set)`, I am essentially making a one-shot forecast over the entire validation set (say 6 weeks), right? Whereas when I use `nf.cross_validation` (say, `nf.cross_validation(df=val_set, step_size=5, n_windows=3, use_init_models=True)`), I take a trained model, split the validation set into `n_windows` (e.g., 3), and make rolling forecasts: after the first window has been predicted and evaluated, those observations are added as information (input chunks) to the model, then the second window is predicted using all information up to that point, and so on until the last window. Correct? It is highly important for me to understand this, because retraining a model and updating a model with new information over time are two different things, and I do not want to carry a wrong understanding here.

Besides, ideally I would run these trials in parallel on a powerful GPU. Which approach to parallelization would you suggest? The possibilities are not completely clear to me. I thought of using the parallelization capabilities of Optuna ("distributed optimization" using an SQLite or MySQL database to share results across trials) or Ray Tune (with which I am less familiar). Before spending a lot of time making either of these work, I would highly appreciate your expert advice!
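On the Optuna route, the "distributed optimization" setup mentioned above boils down to pointing every worker at the same study in a shared RDB storage. A minimal sketch, assuming a hypothetical `run_backtest` helper that performs the expanding-window backtest and returns a validation loss (the search space is purely illustrative):

```python
# Sketch: Optuna distributed optimization via a shared storage backend.
# Every worker process runs this same script; trials are coordinated
# through the database rather than through shared memory.
import optuna

def objective(trial):
    lr = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)
    hidden = trial.suggest_categorical('hidden_size', [64, 128, 256])
    # Hypothetical helper: trains with these hyperparameters, runs the
    # expanding-window backtest, and returns the aggregated validation loss.
    return run_backtest(learning_rate=lr, hidden_size=hidden)

study = optuna.create_study(
    study_name='nf_backtest',
    storage='sqlite:///nf_backtest.db',  # use MySQL/PostgreSQL across machines
    load_if_exists=True,
    direction='minimize',
)
study.optimize(objective, n_trials=50)
```

Each process (one per GPU, for example) runs the same script; Optuna decides through the shared study which trial each worker picks up next.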
@Cristian (Nixtla) @Steffen I am also trying to do something similar, using a for loop to handle the expanding window. Currently I am only checking 1-day-ahead forecasts and retraining the model every half year. One thing I am not very clear about is what the "actual training data" used in the CV would be. As I postulate in the following code, I expand the data every half year and refit the auto model with CV. I use `n_windows=185`, which is slightly larger than half a year, so that I have out-of-sample forecasts for the entire half year.

Using the first iteration as an example: the data runs from 2000-01-01 to 2018-12-31. Ignoring the minor difference between 185 days and an actual half year, am I right that the CV process uses data from 2000-01-01 to 2018-06-30 to build the model and then applies it to the 2018-07-01 to 2018-12-31 data one day at a time, so that we get a 185-row `pred_y` dataframe? Thank you in advance.

```python
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoTFT
from neuralforecast.losses.pytorch import MQLoss

# Each triple defines one retraining round; dtrw[1] is the training cutoff.
dateobj = [
    ['20181201', '20190101', '20190701'],
    ['20190601', '20190701', '20200101'],
    ['20191201', '20200101', '20200701'],
]

df_list = []  # needs improvement for unique_id handling (future work)
for dtrw in dateobj:
    nf = NeuralForecast(
        models=[
            AutoTFT(h=1, config=config_TFT, loss=MQLoss(), num_samples=100, verbose=False),
        ],
        freq='D',
    )
    # Expanding window: keep all observations before the training cutoff.
    TrainData = ordata[ordata.datadate < dtrw[1]].copy()
    cv_df = nf.cross_validation(TrainData, n_windows=185)
    cv_df['GRP'] = dtrw[1]
    df_list.append(cv_df)
    print('finished one iteration')
    # break

bigtest = pd.concat(df_list, ignore_index=True)
bigtest.sort_values(['unique_id', 'ds', 'GRP'], ascending=False, inplace=True)
bigtest.drop_duplicates(subset=['unique_id', 'ds'], keep='first', inplace=True)
bigtest.to_csv('%s/test_results_cv.csv' % outpath)
```
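One way to check the "actual training data" question empirically (an editorial suggestion, not from the thread) is to inspect the `cutoff` column that `cross_validation` returns: each row records the last timestamp available as input for that window, so the earliest cutoff marks the end of the data used before the first forecast. A small sketch, assuming `cv_df` from one iteration of the loop above:

```python
# Sketch: verify the span covered by the 185 CV windows for one iteration.
print(cv_df['cutoff'].min(), cv_df['cutoff'].max())  # first and last training cutoffs
print(cv_df['ds'].min(), cv_df['ds'].max())          # first and last forecast dates
print(cv_df.shape)                                   # expect 185 rows per unique_id
```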