Hello! First of all, thanks for supporting open source; your library is awesome.
Secondly, I am facing a GPU out-of-memory issue. I will describe it below and would appreciate any help; I am trying to avoid having to scale up to a larger GPU.
Setup:
• I am using the nf module with 8 Auto models at initialization (almost all of the univariate models that support historical exogenous features, according to the forecasting models doc page).
• I am using Optuna as the backend for hyperparameter optimization with mostly default configs, except for hist_exog, early_stop_patience_steps, input_size, val_check_steps, and batch_size, which I provide as fixed values added to the default configs (see the first sketch right after this list).
• I load the 4 full ETT datasets from datasetsforecast's LongHorizon2 and pivot them so that the exogenous features become columns, instead of keeping them in their raw format as separate time series, before feeding them to nf. From my reading of the docs, this is the correct way to handle historical exogenous features.
• I also double the dataset vertically by appending another time series, produced by a data augmentation method applied to the original data, as a different time series under a separate unique_id (see the second sketch right after this list).
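To make the setup concrete, here is a minimal sketch of how I build the Auto models. It only shows two of the eight models, the fixed values are placeholders, and passing the historical exogenous columns through the config key `hist_exog_list` is my understanding of the Auto API, simplified from my actual code:

```python
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS, AutoTFT

horizon = 96
exog_cols = ['HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL', 'LULL']  # ETT covariates, as named after my pivot

def with_fixed_values(default_config):
    """Wrap an Auto model's default Optuna config and pin some values (placeholders)."""
    def config(trial):
        cfg = default_config(trial)
        cfg['hist_exog_list'] = exog_cols
        cfg['input_size'] = 96
        cfg['batch_size'] = 32
        cfg['val_check_steps'] = 50
        cfg['early_stop_patience_steps'] = 5
        return cfg
    return config

models = [
    AutoNHITS(h=horizon, backend='optuna', num_samples=10,
              config=with_fixed_values(AutoNHITS.get_default_config(h=horizon, backend='optuna'))),
    AutoTFT(h=horizon, backend='optuna', num_samples=10,
            config=with_fixed_values(AutoTFT.get_default_config(h=horizon, backend='optuna'))),
]

nf = NeuralForecast(models=models, freq='15min')  # '15min' for ETTm, 'H' for ETTh
```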
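And this is roughly how I shape the data. The sketch assumes `LongHorizon2.load(directory, group)` returns the ETT variables as separate long-format series with 'OT' as the target (that is how I read the loader's output); `my_augmentation` is a placeholder for my augmentation method:

```python
import pandas as pd
from datasetsforecast.long_horizon2 import LongHorizon2

# Load one ETT dataset in long format: one series per variable ('OT' is the target).
Y_df = LongHorizon2.load(directory='./data', group='ETTm2')

# Pivot so that the exogenous variables become columns of a single series.
wide = Y_df.pivot(index='ds', columns='unique_id', values='y').reset_index()
wide = wide.rename(columns={'OT': 'y'})
wide.insert(0, 'unique_id', 'ETTm2')

# Append the augmented copy as a second series with its own unique_id.
augmented = my_augmentation(wide)            # placeholder for the augmentation method
augmented['unique_id'] = 'ETTm2_augmented'
df = pd.concat([wide, augmented], ignore_index=True)
```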
Running:
• With horizon set to 24 and input_size 72, the 22 GB of memory on the L4 GPU are enough to run all the models across all 4 datasets with cross-validation exactly as in the docs, and that's perfect.
• With horizon 96 and input_size 96, the 22 GB on the L4 GPU are no longer enough for the ETTm datasets, which have more data points than the ETTh datasets. The ETTh datasets still run fine. It also seems that some models complete before the crash (observing nvtop), which probably happens on a larger model such as TFT (I cannot tell which one from the nf training logs).
Solving trials:
• First, I tried to reduce the batch size, but it did not help. Since some datasets work and others do not, the issue is most probably related to how the full dataset is held in GPU memory.
• Second, I followed the "large dataset handling" doc page and preprocessed the datasets, generating a parquet file in the specified folder structure for each of the 2 unique_id time series within each ETT dataset (see the first sketch after this list).
◦ Then I noticed that cross_validation from the nf module does not accept a files_list as the df parameter; only the fit method does.
◦ Then I decided to implement the cross-validation outside the nf module using the fit and predict methods. I generated prediction windows over the test dataset (previously separated from the large train and validation dataset) and provided each window to nf.predict as a full df rather than a files_list, as I understood from the docs (see the second sketch after this list). But processing the windows sequentially takes a lot of time, since I am using a step_size of 1 and the test set has 2000 data points.
◦ So I needed a way to process the windows in batches on the GPU, but nf.predict does not expose a step_size parameter. One workaround I found is to iterate over nf.models and call each model.predict with step_size and test_size specified (test_size appears to be an unused parameter inside model.predict); after all, the results need to be grouped in order to evaluate them.
• Third, inspired by the "large dataset handling" experience, I decided to use the large-dataset mode to nf.fit the models and then run nf.cross_validation on only the previously separated, smaller test set instead of the full dataset, hoping the GPU memory used during cross_validation would decrease.
◦ But the problem is that, as I understand it, nf.cross_validation forces the models to be fit again, both with refit=False (once) and with refit=True (once per window). Since I already called nf.fit before, this would amount to a kind of transfer learning, which makes me think that a flag preventing any additional training when the internal variable _fitted is True could be a solution (see the third sketch after this list). Maybe I could try to develop it and submit a PR.
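First sketch (large dataset preprocessing and fit, from the second trial above). Paths and val_size are placeholders, and the exact folder layout is the one specified on the doc page, which I only approximate here:

```python
from pathlib import Path

out_dir = Path('ett_partitions/ETTm2')  # placeholder path
out_dir.mkdir(parents=True, exist_ok=True)

# One parquet per unique_id, each holding the complete series sorted by ds
# (following the folder structure described on the doc page).
files_list = []
for uid, group in df.groupby('unique_id'):
    path = out_dir / f'{uid}.parquet'
    group.sort_values('ds').to_parquet(path, index=False)
    files_list.append(str(path))

# fit accepts the list of files, but as far as I can tell cross_validation
# only accepts a regular DataFrame as df.
nf.fit(df=files_list, val_size=val_size)
```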
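Second sketch (the manual, sequential cross-validation loop that turned out to be too slow). The window bookkeeping is simplified, test_df is a placeholder for the held-out test period in long format with the exogenous columns, and each window is assumed to contain at least input_size points of history:

```python
import pandas as pd

horizon = 96
input_size = 96
step_size = 1

# Roll a cutoff over the test period; nf.predict forecasts `horizon` steps after each cutoff.
cutoffs = test_df['ds'].sort_values().unique()[input_size:-horizon:step_size]

all_preds = []
for cutoff in cutoffs:
    window_df = test_df[test_df['ds'] <= cutoff]  # history available up to the cutoff
    preds = nf.predict(df=window_df)
    preds['cutoff'] = cutoff
    all_preds.append(preds)

cv_df = pd.concat(all_preds, ignore_index=True)  # grouped results for evaluation
```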
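Third sketch (the flag I have in mind for the PR). This is purely hypothetical, skip_refit_if_fitted does not exist in the current API; it only shows the usage I would be aiming for:

```python
# Fit once on the large dataset (files_list from the first sketch).
nf.fit(df=files_list, val_size=val_size)

# Hypothetical flag, not in neuralforecast today: reuse the models fitted above
# and only roll forecast windows over the smaller test set.
cv_df = nf.cross_validation(
    df=test_df,
    n_windows=n_windows,        # placeholder
    step_size=1,
    refit=False,
    skip_refit_if_fitted=True,  # hypothetical: do not train again when nf._fitted is True
)
```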
Am I using the library as intended? What is the suggested approach in this case? Is it expected to face these GPU memory issues, given that the literature commonly uses horizons of up to around 800 on long-horizon datasets, and the ETT datasets are the ones with the fewest possible historical exogenous features?