# neural-forecast
w
Hi Cristian @Cristian (Nixtla), I wanted to express my gratitude for the nice package and for your patience in explaining things in detail in the Slack channel. I have started working with some financial time series data and have some questions based on my initial trials.

Goal: Train the model in an expanding/rolling window fashion and use the retrained model for the forecast.
Data format: Daily returns for 6 financial futures, each with ~4500 observations. I will introduce additional exogenous Xs later.
Retrain frequency: I want it to be flexible, e.g. yearly. At this learning stage I am trying daily retraining via nf.cross_validation.

For reference, the traditional expanding-window tabular ML setup I used with some other financial data looks like this:

    df_list = []  # collects one block of test rows (with predictions) per year
    for year in range(2005, 2022):
        cutoff = str(year)+'-01-01'
        cutofftime = pd.to_datetime(cutoff)
        cutoff2 = str(year+1)+'-01-01'
        cutofftime2 = pd.to_datetime(cutoff2)
        # train on everything before the cutoff, test on the following year
        train = MLdata[MLdata.Date < cutofftime]
        test = MLdata[MLdata.Date >= cutofftime]
        test = test[test.Date < cutofftime2]
        Ytrain = train[y_name].values
        Xtrain = train.iloc[:, x_indexs].values
        model.fit(Xtrain,Ytrain,eval_set=[(Xtrain,Ytrain)],early_stopping_rounds=20,verbose=False)
        Xtest = test.iloc[:, x_indexs].values
        Ytest = test[y_name].values
        Z = model.predict(Xtest)
        pred_y = Z  # just my naming convention to get the pred_y into the table
        bigtest_temp = test.copy()
        bigtest_temp['pred_y'] = pred_y
        df_list.append(bigtest_temp)
    bigtest = pd.concat(df_list, ignore_index=True)

My actual usage and observation: I am using the MSE loss and the example code from the tutorial directly:

    nf = NeuralForecast(
        models=[
            AutoNHITS(h=1, config=config_nhits, loss=MSE(), num_samples=100),
            AutoLSTM(h=1, config=config_lstm, loss=MSE(), num_samples=100)
        ],
        freq='D'
    )

Then I run

    cv_df = nf.cross_validation(Y_df, n_windows=3000)

in the hope that this retrains daily and uses the new model to produce a forecast for the next day.
(1) freq='D' or 'B' does not work very well with the dates in financial market data, because trading days do not match business days 100%. My workaround is to assign 'ds' from 1 to num_observation so there are no gaps in between. I am curious how the algo generates forecasts for Monday's data when there is no data on Saturday and Sunday if we use the freq='D' option. Are we using the values forecast on Friday in a recursive way, so that the Monday model is actually using y_Friday, predy_Saturday, predy_Sunday, y_Monday to forecast y on Tuesday?

(2) Assuming nf.cross_validation behaves as I intended, I tried writing a serial for loop to mimic the process, and the speed difference is gigantic: cross_validation is about 1000x faster than my simple for loop. I saw the for loop suggestion in Cristian and Leonie's discussion about 1 month ago. I still want to use the "slow" for loop because it matches my old way of doing tabular ML and it also mimics my actual decision process in real life. To simplify further, I am just using one fixed model that I fitted on the entire data, and my code is:

    # an in-sample model based forecast just to see how things go
    days = np.unique(Y_df["ds"])
    ndays = len(days)
    # I will iterate through the days and use a rolling 250-day window to predict the next day.
    # I will concatenate the resulting dataframes and compare with the actual values
    predictions = []
    for i in range(2000,ndays):
        newdf = Y_df[(Y_df["ds"]<=days[i]) & (Y_df["ds"]>days[i-250])]
        fcst_df = nf.predict(df = newdf)
        fcst_df['ds'] = days[i]
        # drop the index of fcst_df and make it a column
        fcst_df.reset_index(inplace=True)
        # fcst_df.head()
        predictions.append(fcst_df)
    predictions_df = pd.concat(predictions, ignore_index=True)

Can you help me understand why this way is so much slower and what I have missed in my initial trials?
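The integer 'ds' workaround from (1) can be pictured with a small standard-library sketch. The dates and the holiday below are made up for illustration, and nothing here calls neuralforecast; it only shows that trading days leave gaps on a calendar grid while a consecutive integer index does not.

```python
from datetime import date, timedelta


def trading_days(start, end, holidays):
    """Weekdays from start to end (inclusive) that are not in `holidays`."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:
            days.append(d)
        d += timedelta(days=1)
    return days


# Hypothetical two-week calendar with one assumed exchange holiday on a weekday.
holidays = {date(2021, 1, 18)}
days = trading_days(date(2021, 1, 11), date(2021, 1, 22), holidays)

# Map each trading day to a consecutive integer so the series has no gaps.
ds_index = {d: i + 1 for i, d in enumerate(days)}
# Reverse map, to put the real dates back on the forecasts afterwards.
index_to_date = {i: d for d, i in ds_index.items()}

# Gaps between consecutive observations, in calendar days and in index steps.
calendar_gaps = [(b - a).days for a, b in zip(days, days[1:])]
integer_gaps = [ds_index[b] - ds_index[a] for a, b in zip(days, days[1:])]
```

Here calendar_gaps contains a 4-day jump across the weekend plus holiday, while integer_gaps is 1 everywhere; index_to_date recovers the trading date for any integer step.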
c
Hi @windwine! Your support means a lot to us 🙂
cross_validation is faster because it is not doing daily re-training. With an auto model, it first runs hyperparameter selection to pick the best configuration, then uses that best model to forecast all 3000 days. You can think of it as an efficient batched for loop that makes multiple predictions on your test set.
If you want to test the model with re-training, then you need to do a for loop, like the one you already implemented
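The difference between forecasting with one fitted model and re-training inside a loop can be sketched with a toy example. MeanModel and walk_forward are hypothetical names invented for this illustration; this is not how neuralforecast implements cross_validation, only a picture of where the extra fits come from.

```python
class MeanModel:
    """Toy stand-in for a forecaster: predicts the mean of its training data."""

    def __init__(self):
        self.mean = None
        self.n_fits = 0  # counts how many times fit() was called

    def fit(self, history):
        self.mean = sum(history) / len(history)
        self.n_fits += 1

    def predict(self):
        return self.mean


def walk_forward(series, first_test, retrain_every=None):
    """One-step-ahead forecasts for series[first_test:].

    retrain_every=None fits once on series[:first_test] and reuses that model.
    retrain_every=k refits on the expanding window every k steps.
    """
    model = MeanModel()
    model.fit(series[:first_test])
    preds = []
    for step, t in enumerate(range(first_test, len(series))):
        if retrain_every and step > 0 and step % retrain_every == 0:
            model.fit(series[:t])  # expanding window: everything before t
        preds.append(model.predict())
    return preds, model.n_fits


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fixed_preds, fixed_fits = walk_forward(series, first_test=3)
daily_preds, daily_fits = walk_forward(series, first_test=3, retrain_every=1)
```

With a fixed model there is a single fit for all test steps; with retrain_every=1 there is one fit per test step, which is where the cost of a re-training loop comes from. A real forecaster would still read the latest input window at each step even with fixed weights, which the toy mean model does not do.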
w
Got it, thanks. One more question: if I want to use one model for the next 250 days, is there a way to mimic the batch prediction done in the cv function? In my for-loop trial for daily prediction, it was extremely slow even without refitting the model. With other RNNs or tabular ML, I can feed in all the data for a single batch prediction.
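The kind of batch prediction meant here can be pictured as building every input window up front and scoring them in one call, instead of calling predict once per day. The sketch below is a generic illustration with hypothetical helpers (make_windows, predict_batch) and a made-up fixed linear model; it does not use or describe the neuralforecast API.

```python
def make_windows(series, input_size, n_windows):
    """Return the last n_windows input windows.

    Each window holds input_size consecutive values and ends one step
    before the value it is meant to forecast.
    """
    last_start = len(series) - input_size
    starts = range(last_start - n_windows, last_start)
    return [series[s:s + input_size] for s in starts]


def predict_batch(windows, weights):
    """Hypothetical fixed model: a weighted sum of each window, for all windows at once."""
    return [sum(w * x for w, x in zip(weights, win)) for win in windows]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
weights = [0.0, -1.0, 2.0]  # assumed weights: linear extrapolation from the last two points

# Batched: build all windows first, then score them in a single call.
windows = make_windows(series, input_size=3, n_windows=2)
batch_preds = predict_batch(windows, weights)

# Looped: the same forecasts, one window and one call at a time.
loop_preds = [predict_batch([win], weights)[0] for win in windows]
```

Both paths give the same forecasts; the batched one makes a single model call. In a real framework the windows would be stacked into one tensor, so the per-call overhead is paid once rather than once per day.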