
Farzad E

02/22/2023, 9:08 PM
I asked this in the Ray Slack too, but it's more relevant here since I'm using neuralforecast: has anyone experienced issues when using multiple GPUs? When I switch from 1 GPU to more than 1, I get a TuneError complaining about previous errors. The only other error I see is a ValueError pointing at trial_runner.py, in self._on_training_result(trial, result[_ExecutorEvent.KEY_FUTURE_RESULT]). I should add that this is running in a Jupyter notebook. If I run the same code as a .py script, it hangs indefinitely but gives no errors to use for troubleshooting.
I opened an issue on Ray's GitHub in case anyone is interested: http://github.com/ray-project/ray/issues/32760
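
[Editor's note: for context, a minimal sketch of the kind of multi-GPU Auto-model run being described. This is an illustration, not the code from this thread; the model choice (AutoNHITS), the AirPassengersDF toy dataset, and the config values are assumptions. The gpus argument is the part that changes between the 1-GPU and multi-GPU case.]

# Hypothetical reproduction sketch; all names and values below are illustrative
# assumptions, not taken from the thread.
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS
from neuralforecast.utils import AirPassengersDF

model = AutoNHITS(
    h=12,            # forecast horizon
    num_samples=10,  # number of Ray Tune trials
    gpus=2,          # switching this from 1 to 2+ is where the TuneError shows up
)
nf = NeuralForecast(models=[model], freq='M')
nf.fit(df=AirPassengersDF, val_size=24)  # validation window for the Tune search
print(nf.predict())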

Cristian (Nixtla)

02/23/2023, 4:09 PM
We also had some issues when using Tune with multiple GPUs in notebooks. I think they don't allow for "interactive environments". We fixed some bugs when training on multiple GPUs, and it should work now when run from scripts.
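
[Editor's note: a hedged sketch of the script form being suggested, assuming the same illustrative setup as above. The usual difference from a notebook cell is guarding the entry point so Ray's worker processes can import the file safely; none of this is from the thread's actual code.]

# Hedged sketch: the same assumed multi-GPU search, run from a .py file.
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS
from neuralforecast.utils import AirPassengersDF

if __name__ == '__main__':
    model = AutoNHITS(h=12, num_samples=10, gpus=2)
    nf = NeuralForecast(models=[model], freq='M')
    nf.fit(df=AirPassengersDF, val_size=24)
    nf.predict().to_csv('forecasts.csv')  # persist results since there is no notebook output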

Farzad E

02/23/2023, 4:17 PM
@Cristian (Nixtla) a script didn't work for me either. I converted the same notebook to a .py file, and while it didn't give me an error, it got stuck indefinitely. I tried several different EC2 instances with 2 or 4 GPUs, but the result was the same. I'll wait to see if the Ray team has any ideas.

Cristian (Nixtla)

02/23/2023, 4:19 PM
Sounds good. I will also explore what the issue is.
❤️ 1