# neural-forecast
Who would be a good person to help with some GPU bottlenecks we're running into? We're testing some scaling laws with respect to parameter count, dataset size, number of time series, etc., and we seem to be getting only 2-5% utilization on the GPU. We think it may be related to the data loaders, but that stuff is a bit deep under the hood of the NeuralForecast core class. Would love to pick the brain of someone who has managed to train TCN, NHiTS, or TimesNet with >10M parameters and >1B data points: what infra did you use, how many GPUs, etc.?
Hi @Chris Gervais! We are working on improving the library to scale training across multiple GPUs to improve training times (data parallel). We have usually observed much higher GPU usage, so I wonder if something else might be happening.
hmm okay I’ll do a bit more digging and see if I can add more details
have you noticed any bottlenecks in the data loaders when you're using very large datasets?
The current limitation is that the model must fit on a single GPU; we do not have the capability right now to parallelize the model itself. We have trained NHITS with almost 100M parameters on one (large) GPU without any issue.
Yes, the dataset needs to fit in memory. And then each training batch, with its windows, must fit in GPU memory.
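A quick way to check whether the input pipeline (rather than the GPU) is the bottleneck, independent of NeuralForecast internals: time how long each batch takes to arrive versus how long the forward/backward step takes. This is a minimal sketch with a synthetic dataset and a toy model standing in for the real windowed time-series data.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for a windowed time-series dataset (hypothetical sizes).
data = torch.randn(50_000, 64)
targets = torch.randn(50_000, 1)
loader = DataLoader(TensorDataset(data, targets), batch_size=1024, num_workers=0)

model = torch.nn.Linear(64, 1)  # toy model; swap in the real one
fetch_time, step_time = 0.0, 0.0
it = iter(loader)
for _ in range(20):
    t0 = time.perf_counter()
    x, y = next(it)                 # time spent waiting on the data loader
    t1 = time.perf_counter()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                 # time spent in the actual training step
    t2 = time.perf_counter()
    fetch_time += t1 - t0
    step_time += t2 - t1

# If fetch_time dominates step_time, the data loader (not compute) is the
# bottleneck, which would be consistent with low GPU utilization.
print(f"fetch: {fetch_time:.4f}s  step: {step_time:.4f}s")
```

On a real setup you would also move the batches to the GPU and synchronize before reading the clock, but the ratio of the two timings is the signal to look at.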
okay thanks. do you mind me asking what the memory footprint of the 100M model is? we're about a tenth of that, but also using the NHITS backbone
I think we'd expect the worker to crash if we were out of memory, so we're probably fine there
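For a rough sense of scale, a back-of-envelope estimate (not a NeuralForecast measurement): fp32 weights are 4 bytes per parameter, and Adam's two moment buffers roughly triple that; activations for each batch of windows come on top.

```python
# Back-of-envelope GPU memory for model weights (fp32 = 4 bytes/param).
def param_memory_gb(n_params: int, bytes_per_param: int = 4) -> float:
    return n_params * bytes_per_param / 1024**3

weights = param_memory_gb(100_000_000)  # ~0.37 GB for the weights alone
with_adam = weights * 3                 # + Adam's two per-parameter buffers
print(f"weights: {weights:.2f} GB, with Adam state: {with_adam:.2f} GB")
```

So a 100M-parameter model is nowhere near filling a large GPU by itself; most of the footprint during training comes from the batched windows and activations.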