# neural-forecast
Isaac
I have a neuralforecast script that runs fine until prediction time, where I just get a `Killed` message. Any idea what could be causing it? Is it moving too much data to RAM?
@Marco any ideas? I've been trying to get this script working for a few weeks to no avail.
Kin Gtz. Olivares
I think we have a tough memory leak in the inference code @Cristian (Nixtla) @Isaac @Marco. My intuition is that we are using PyTorch Lightning's Trainer class in an unintended way, by using it inside the models through their fit and predict methods. My belief is that if we switch Model.fit to call trainer.fit, and Model.predict to call trainer.predict, we might be able to solve this. The overall problem is that we might have a crazy recursion: Trainer(Model(Trainer))
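A minimal sketch of the delegation pattern described above, not neuralforecast's actual code (the class and argument names here are illustrative): the model never keeps a Trainer as an attribute; fit and predict build a short-lived Trainer and hand the model to trainer.fit / trainer.predict, so no Trainer(Model(Trainer)) cycle keeps references alive after inference.
```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyWindowsModel(pl.LightningModule):
    """Toy model: fit/predict create a short-lived Trainer instead of storing one."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self(x), y)

    def predict_step(self, batch, batch_idx):
        x, _ = batch
        return self(x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

    # Delegation: no Trainer is stored on the model, so nothing like
    # Trainer(Model(Trainer)) survives past fit/predict.
    def fit(self, loader):
        trainer = pl.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
        trainer.fit(self, train_dataloaders=loader)

    def predict(self, loader):
        trainer = pl.Trainer(logger=False, enable_checkpointing=False)
        return trainer.predict(self, dataloaders=loader)


# Usage with random data
xs, ys = torch.randn(64, 8), torch.randn(64, 1)
loader = DataLoader(TensorDataset(xs, ys), batch_size=16)
model = TinyWindowsModel()
model.fit(loader)
preds = model.predict(loader)  # list of per-batch tensors
```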
Cristian (Nixtla)
@José Morales
Thanks for your insights @Kin Gtz. Olivares. What you are mentioning should only apply to the multi-GPU case. Is that your case, Isaac? If you are using a single GPU/CPU, the more likely cause is running out of memory. Isaac, have you tried reducing `inference_windows_batch_size`? For the multi-GPU case, we will release a new feature next week for optimized distributed training with Spark.
k
You can also reduce the `validation_batch_size` @Isaac. That way you keep your memory usage constrained.
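A minimal sketch of both suggestions, assuming a windows-based model such as NHITS: both batch-size knobs are passed at model construction. Exact argument names depend on the neuralforecast version; the chat says `validation_batch_size`, while recent releases expose it as `valid_batch_size`.
```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

models = [
    NHITS(
        h=24,
        input_size=48,
        max_steps=100,
        inference_windows_batch_size=256,  # fewer windows per forward pass at predict time
        valid_batch_size=128,              # smaller batches during validation
    )
]
nf = NeuralForecast(models=models, freq="H")
# nf.fit(df=train_df)        # train_df with columns: unique_id, ds, y
# forecasts = nf.predict()
```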
Another thing @Cristian (Nixtla), we might need to add
```python
fcsts = [fcst.detach().cpu() for fcst in fcsts]  # detach each batch output from the graph, move to CPU
fcsts = torch.vstack(fcsts).numpy().flatten()
```
https://github.com/Nixtla/neuralforecast/blob/main/neuralforecast/common/_base_windows.py#L720 The detach operation blocks gradient tracking, and it has fixed a memory leak for me in the past.
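A standalone toy example (generic PyTorch, not neuralforecast's code) of the pattern above: per-batch outputs accumulated without detach each keep a reference to their autograd graph; detaching and moving them to CPU before stacking lets that memory be released.
```python
import torch

model = torch.nn.Linear(10, 1)
batches = [torch.randn(32, 10) for _ in range(100)]

fcsts = []
for batch in batches:
    out = model(batch)                # output carries a grad_fn, i.e. a reference to the graph
    fcsts.append(out.detach().cpu())  # detach + move to CPU so the graph can be freed

fcsts = torch.vstack(fcsts).numpy().flatten()
print(fcsts.shape)  # (3200,)
```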