# neural-forecast
f
I asked this in the Ray Slack too, but it's more relevant here since I'm using neuralforecast: Has anyone experienced issues when using multiple GPUs? When I switch from 1 GPU to more than 1, I get a TuneError complaining about previous errors. The only other error I see is a ValueError that points to `trial_runner.py` at `self._on_training_result(trial, result[_ExecutorEvent.KEY_FUTURE_RESULT])`. I should add that this is running in a Jupyter notebook. If I run it as a .py script instead, it hangs indefinitely and gives no errors to use for troubleshooting.
I opened an issue on Ray's GitHub in case anyone is interested: http://github.com/ray-project/ray/issues/32760
c
We also had some issues when using Tune with multiple GPUs in notebooks. I don't think they support "interactive environments". We fixed some bugs when training on multiple GPUs, and it should work now when run as a script.
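For reference, a minimal standalone-script sketch of the setup being discussed. It assumes the neuralforecast Auto models expose `gpus` and `num_samples` arguments that are forwarded to Ray Tune; check these names against your installed version before relying on them:

```python
# Minimal standalone-script sketch (run as a .py file, not from a notebook),
# assuming the Auto API accepts `gpus` / `num_samples` and passes them to
# Ray Tune; argument names and defaults may differ in your version.
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS
from neuralforecast.utils import AirPassengersDF


def main():
    df = AirPassengersDF  # toy dataset with columns: unique_id, ds, y

    model = AutoNHITS(
        h=12,           # forecast horizon
        num_samples=4,  # number of Ray Tune trials
        gpus=2,         # switch between 1 and >1 to reproduce the issue
    )

    nf = NeuralForecast(models=[model], freq='M')
    nf.fit(df=df)
    print(nf.predict().head())


# Guarding the entry point matters when Ray / multiprocessing spawns worker
# processes that re-import this script.
if __name__ == '__main__':
    main()
```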
f
@Cristian (Nixtla) the script didn't work for me either. I converted the same notebook to a .py file, and while it didn't give me an error, it got stuck indefinitely. I tried several different EC2 instances with 2 or 4 GPUs, but the result was the same. I'll wait to see if the Ray team has any ideas.
c
Sounds good. I will also look into what the issue is.
❤️ 1