# neural-forecast
s
This message was deleted.
j
You can try downgrading the protobuf package as the error message suggests
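Something like this in a notebook cell (the exact version pin is whatever the error message recommended; 3.20.x is just the usual suggestion):
# Assumed version pin; use the version the protobuf error message actually recommends.
%pip install "protobuf<=3.20.3"
# Restart the Python process so the downgraded package is picked up.
dbutils.library.restartPython()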
s
Thanks, I did that. Now I have a different error:
j
Did you install it for the whole cluster? Seems like you now have two library paths
/databricks
and
/local_disk
s
Irrespective of whether I am installing it on the cluster or in the notebook, I'm getting the same error. Is there something else I can do?
j
Are you using multiple GPUs? If you're not, you can try setting
devices=[0]
in the model constructor
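In the model constructor it's a keyword argument forwarded to the PyTorch Lightning trainer; roughly like this (h and input_size are placeholder values):
from neuralforecast.models import NHITS

# devices/accelerator are passed through to the underlying Lightning Trainer
model = NHITS(h=12, input_size=24, accelerator="gpu", devices=[0])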
s
Here is the error when I install it on the whole cluster -
j
Also, the original error is related to logging. Are you using those logs? If you're not, you can set logger=False in the model constructor and that should fix it
s
accelerator and devices are parameters for NHITS, I suppose, but I'm using AutoNHITS. I added the parameter gpus=1, but I still get the same error. How can I set logger=False? Can you share the code? Thanks.
import optuna
from neuralforecast.auto import AutoNHITS
from neuralforecast.losses.pytorch import HuberMQLoss, MQLoss

models = AutoNHITS(h=horizon,
                   config=nhits_config,
                   loss=HuberMQLoss(quantiles=quantiles),    # robust Huber loss
                   valid_loss=MQLoss(quantiles=quantiles),   # validation signal
                   search_alg=optuna.samplers.TPESampler(),
                   backend='optuna',
                   num_samples=100,
                   gpus=1,
                   )
j
Oh, for auto models you have to set that in the config, so something like:
def nhits_config(trial):
    return {
        "logger": False,
        ...
    }
s
Got it, thanks. I'm not getting the protobuf error when I add logger=False, but I continue to get this error: ProcessExitedException: process 1 terminated with signal SIGSEGV.
j
Can you also add
devices=[0]
in the config? I think it's trying to do multi-gpu training and failing (the
gpus
argument of the auto model is for ray, so not used in this case)
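So the config function would look something like this (the rest of your search space stays as it is):
def nhits_config(trial):
    return {
        "logger": False,   # avoids the protobuf/logging error
        "devices": [0],    # pin training to a single GPU to avoid the SIGSEGV
        # ... your existing search-space entries ...
    }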
s
Yes, it works. Thank you so much! For 2 samples it took around 15 minutes to run on a GPU cluster. Is there a way I can improve the performance?
j
It's currently only using the driver and one GPU. If you want to parallelize the trials, you should use ray instead of optuna. Does that work for you? ray sometimes has issues
s
Sure, I can try with ray. Even if I fix its version, will it still have issues? Is there a code example that I can use to implement ray instead of optuna? Thank you for all your help!
j
You can follow this tutorial
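Switching the backend looks roughly like this (the search-space values are placeholders; with backend='ray' the config is a dict of ray.tune sample spaces instead of a function of an optuna trial):
from ray import tune
from neuralforecast.auto import AutoNHITS
from neuralforecast.losses.pytorch import HuberMQLoss, MQLoss

# Placeholder search space; adapt it to the ranges you were tuning with optuna.
ray_config = {
    "input_size": tune.choice([2 * horizon, 3 * horizon]),
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "logger": False,
}

models = AutoNHITS(h=horizon,
                   config=ray_config,
                   loss=HuberMQLoss(quantiles=quantiles),
                   valid_loss=MQLoss(quantiles=quantiles),
                   backend='ray',
                   num_samples=100,
                   gpus=1)  # with ray, gpus is the per-trial GPU request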
s
Thank you. I think I was using ray earlier and, because it had issues, had moved to optuna. I'll see if I can move back to ray for better performance. Thanks.
j
There's some setup you need in order to run ray on databricks. Can you try following this guide: https://docs.databricks.com/en/machine-learning/ray-integration.html?
I think it's just the setup_ray_cluster part
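Roughly like this (worker and GPU counts are placeholders, and the exact argument names can vary with your Ray version, so follow the guide for your runtime):
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

# Start a Ray cluster on the Spark workers before fitting the auto model
setup_ray_cluster(num_worker_nodes=2, num_gpus_per_node=1)

# ... fit the AutoNHITS model here ...

shutdown_ray_cluster()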