# neural-forecast
a
Hi! I have a question regarding the `cross_validation` method of `NeuralForecast`. Is there a way to parallelize the process across multiple GPUs? For now, it works when running on one GPU, but when setting `devices > 1` (4 in my case) while instantiating my TFT model, I get the following error:
```
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/jules.bertrand/dp4p-ai--sales-forecasting-ml/src/modelling/training_script.py", line 159, in <module>
[rank1]:     cv_df = nf.cross_validation(
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/neuralforecast/core.py", line 981, in cross_validation
[rank1]:     return self._no_refit_cross_validation(
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/neuralforecast/core.py", line 863, in _no_refit_cross_validation
[rank1]:     model_fcsts = model.predict(
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/neuralforecast/common/_base_windows.py", line 686, in predict
[rank1]:     fcsts = trainer.predict(self, datamodule=datamodule)
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in predict
[rank1]:     return call._call_and_handle_interrupt(
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
[rank1]:     return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
[rank1]:     return function(*args, **kwargs)
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 903, in _predict_impl
[rank1]:     results = self._run(model, ckpt_path=ckpt_path)
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 943, in _run
[rank1]:     self.strategy.setup_environment()
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 153, in setup_environment
[rank1]:     super().setup_environment()
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 129, in setup_environment
[rank1]:     self.accelerator.setup_device(self.root_device)
[rank1]:   File "/home/jules.bertrand/miniconda3/envs/adeo-fcst/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 119, in root_device
[rank1]:     return self.parallel_devices[self.local_rank]
[rank1]: IndexError: list index out of range
[rank: 1] Child process with PID 56954 terminated with code 1. Forcefully terminating all other processes to avoid zombies 🧟
src/modelling/run_modelling.sh: line 14: 55998 Killed                  python src/modelling/training_script.py --business_unit "$business_unit" --date "$date"
First script failed. Exiting.
```
Thanks!
j
Hey. Multiple GPUs are mostly useful for training, since inference is usually pretty fast. My suggestion is to train the model with multiple GPUs, set `nf.models[0].trainer_kwargs.update(dict(max_steps=0, devices=1))`, and then call the `cross_validation` method to get the predictions from the trained model on a single GPU.
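A minimal sketch of that workflow (the dataset, frequency, and hyperparameters below are placeholders, not from this thread):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import TFT
from neuralforecast.utils import AirPassengersDF  # stand-in dataset

if __name__ == "__main__":  # guard recommended when DDP relaunches the script
    # Train across 4 GPUs; extra kwargs are forwarded to the Lightning Trainer.
    nf = NeuralForecast(
        models=[TFT(h=12, input_size=24, max_steps=500,
                    accelerator="gpu", devices=4)],
        freq="M",
    )
    nf.fit(df=AirPassengersDF)

    # Drop to one GPU and disable further training, then let
    # cross_validation produce predictions from the already-trained weights.
    nf.models[0].trainer_kwargs.update(dict(max_steps=0, devices=1))
    cv_df = nf.cross_validation(df=AirPassengersDF, n_windows=3, step_size=12)
```

With `max_steps=0`, the fit that `cross_validation` runs internally should be effectively a no-op, so the weights learned during the multi-GPU `fit` are what get used for prediction.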
a
The thing is that I'd like to use the `cross_validation` method to build a custom hyperparameter search, in order to take multiple validation folds into account instead of just one (as in `AutoTFT`). That would mean doing the training (I'd also like to refit the model for each validation fold) and the inference at the same time.
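For what it's worth, a hypothetical sketch of the kind of loop described here (the search space, `train_df`, and the MAE scoring are made up for illustration):

```python
import numpy as np
from neuralforecast import NeuralForecast
from neuralforecast.models import TFT

# train_df: long-format frame with unique_id / ds / y columns (assumed given).
candidate_configs = [  # hypothetical search space
    dict(input_size=24, learning_rate=1e-3),
    dict(input_size=48, learning_rate=3e-4),
]

best_cfg, best_err = None, float("inf")
for cfg in candidate_configs:
    nf = NeuralForecast(models=[TFT(h=12, max_steps=500, **cfg)], freq="M")
    # refit=True retrains the model for every validation window.
    cv_df = nf.cross_validation(df=train_df, n_windows=3, step_size=12,
                                refit=True)
    err = np.mean(np.abs(cv_df["y"] - cv_df["TFT"]))  # MAE averaged over folds
    if err < best_err:
        best_cfg, best_err = cfg, err
```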
j
I see we have this https://github.com/Nixtla/neuralforecast/blob/b85b07da66b95366d1766d696995b05f3b6c7392/neuralforecast/common/_base_windows.py#L691-L696 which sets the predict step to use 1 GPU. Can you try providing `accelerator='gpu'` in your config?
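For instance, in the model constructor (the other arguments are just placeholders):

```python
from neuralforecast.models import TFT

# Passing the accelerator explicitly, so the single-device trainer that
# neuralforecast builds for the predict step also targets the GPU.
model = TFT(h=12, input_size=24, max_steps=500,
            accelerator="gpu", devices=4)
```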
a
In the end, after reflecting on the problem, I'm gonna go with 1 validation fold for the HP tuning part. Thus, `AutoTFT` will do the job. Many thanks for your help, José!