# neural-forecast
s
This message was deleted.
j
You can try downgrading the protobuf package as the error message suggests
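Something like this in a notebook cell (the exact version pin is whatever the error message recommended; 3.20.x is just the usual suggestion):
# Assumed version pin; use the version the protobuf error message actually recommends.
%pip install "protobuf<=3.20.3"
# Restart the Python process so the downgraded package is picked up.
dbutils.library.restartPython()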
s
Thanks, I did that. Now I have a different error:
j
Did you install it for the whole cluster? Seems like you now have two library paths
/databricks
and
/local_disk
s
Irrespective of whether I am installing it on the cluster or in the notebook, I'm getting the same error. Is there something else I can do?
j
Are you using multiple GPUs? If you're not, you can try setting
devices=[0]
in the model constructor
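In the model constructor it's a keyword argument forwarded to the PyTorch Lightning trainer; roughly like this (h and input_size are placeholder values):
from neuralforecast.models import NHITS

# devices/accelerator are passed through to the underlying Lightning Trainer
model = NHITS(h=12, input_size=24, accelerator="gpu", devices=[0])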
s
Here is the error when I install it on the whole cluster -
j
Also, the original error is related to logging. Are you using those logs? If you're not, you can set logger=False in the model constructor and that should fix it
s
accelerator and devices are parameters for NHITS, I suppose, but I'm using AutoNHITS. I added the parameter gpus=1, but I still get the same error. How can I set logger=False? Can you share the code? Thanks.
import optuna
from neuralforecast.auto import AutoNHITS
from neuralforecast.losses.pytorch import HuberMQLoss, MQLoss

models = AutoNHITS(h=horizon,
                   config=nhits_config,
                   loss=HuberMQLoss(quantiles=quantiles),    # robust Huber loss
                   valid_loss=MQLoss(quantiles=quantiles),   # validation signal
                   search_alg=optuna.samplers.TPESampler(),
                   backend='optuna',
                   num_samples=100,
                   gpus=1,
                   )
j
Oh, for auto models you have to set that in the config, so something like:
def nhits_config(trial):
    return {
        "logger": False,
        ...
    }
s
Got it, thanks. I'm not getting the protobuf error when I add logger=False, but I continue to get this error: ProcessExitedException: process 1 terminated with signal SIGSEGV.
j
Can you also add
devices=[0]
in the config? I think it's trying to do multi-gpu training and failing (the
gpus
argument of the auto model is for ray, so not used in this case)
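So the config function would look something like this (the rest of your search space stays as it is):
def nhits_config(trial):
    return {
        "logger": False,   # avoids the protobuf/logging error
        "devices": [0],    # pin training to a single GPU to avoid the SIGSEGV
        # ... your existing search-space entries ...
    }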
s
Yes, it works. Thank you so much! For 2 samples it took around 15 minutes to run on a GPU cluster. Is there a way I can improve the performance?
j
It's currently only using the driver and one GPU. If you want to parallelize the trials, you should use ray instead of optuna. Does that work for you? ray sometimes has issues
s
Sure, I can try with ray. Even if I fix its version, will it still have issues? Is there a code example that I can use to implement ray instead of optuna? Thank you for all your help!
j
You can follow this tutorial
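Switching the backend looks roughly like this (the search-space values are placeholders; with backend='ray' the config is a dict of ray.tune sample spaces instead of a function of an optuna trial):
from ray import tune
from neuralforecast.auto import AutoNHITS
from neuralforecast.losses.pytorch import HuberMQLoss, MQLoss

# Placeholder search space; adapt it to the ranges you were tuning with optuna.
ray_config = {
    "input_size": tune.choice([2 * horizon, 3 * horizon]),
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "logger": False,
}

models = AutoNHITS(h=horizon,
                   config=ray_config,
                   loss=HuberMQLoss(quantiles=quantiles),
                   valid_loss=MQLoss(quantiles=quantiles),
                   backend='ray',
                   num_samples=100,
                   gpus=1)  # with ray, gpus is the per-trial GPU request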
s
Thank you. I think I was using ray earlier and, because it had issues, had moved to optuna. I'll see if I can move back to ray for better performance. Thanks.
j
There's some setup you need in order to run ray on databricks. Can you try following this guide: https://docs.databricks.com/en/machine-learning/ray-integration.html?
I think it's just the setup_ray_cluster part
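Roughly like this (worker and GPU counts are placeholders, and the exact argument names can vary with your Ray version, so follow the guide for your runtime):
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

# Start a Ray cluster on the Spark workers before fitting the auto model
setup_ray_cluster(num_worker_nodes=2, num_gpus_per_node=1)

# ... fit the AutoNHITS model here ...

shutdown_ray_cluster()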