# mlforecast
d
Hi channel, I'm currently facing two errors (sometimes when I run the cell I get one, other times I get the other). The first picture is my instantiation of the MLForecast object plus the model. The second picture is my most common error. The third picture is another one that sometimes comes instead of the second. NOTE: these errors only happen when I give the parameter max_horizon = any int (> 2); if I don't give it, or if I give only 2, it runs without errors. Has anyone faced the same issue? PS: I tried running the one-model-per-step tutorial, and it gives exactly the same error.
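A minimal sketch of the setup being described, assuming a LightGBM regressor and placeholder lags and frequency (the actual values are in the screenshots, which aren't reproduced here):

```python
import lightgbm as lgb
import pandas as pd
from mlforecast import MLForecast

# hypothetical training data with the default unique_id/ds/y columns
train = pd.read_csv("train.csv")

fcst = MLForecast(
    models=[lgb.LGBMRegressor()],
    freq="D",         # assumed frequency
    lags=[1, 2, 3],   # placeholder lags
)
# the errors reportedly appear only when max_horizon > 2
fcst.fit(train, max_horizon=6, dropna=True)
```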
j
What do you get if you run
train[id_col].value_counts().min()
?
d
1
j
And
train[id_col].value_counts().max()
?
d
60033
which are mostly zeros... since the data is very sparse
j
I think the problem may be the max_horizon. It needs to train a model to predict 6 steps ahead but you have series with only one sample. Do you get the same error if you remove it?
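A rough illustration of this point (not mlforecast's internals): with max_horizon=h, direct forecasting trains one model per step ahead, and the k-step model's target is the series shifted by k, so very short series leave some models with nothing to train on:

```python
import pandas as pd

y = pd.Series([10.0, 11.0, 12.0], name="y")  # toy series with only 3 samples

for k in range(1, 7):  # max_horizon = 6 -> one model per step ahead
    target_k = y.shift(-k)            # target for the k-step-ahead model
    usable = target_k.notna().sum()   # rows left to train that model on
    print(f"horizon {k}: {usable} training rows")
# horizons 3..6 end up with 0 rows, so those models can't be trained
```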
d
no... only when I apply max_horizon
sorry, wait... I replied to you wrong
train['unique_id'].value_counts().min() = 113 train['unique_id'].value_counts().max() = 113
the multivariate series all have the same number of steps
j
And what's the max lag that you're using? Since you're setting
dropna=True
it's possible that it's dropping several samples in each series
d
LAGS = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]
j
Yeah, so you'd end up with 2 samples in each series and you need at least 6. Can you try setting
dropna=False
or reducing the maximum lag?
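Quick arithmetic with the numbers from this thread: dropna=True discards the first max(lags) rows of each series, because those lag features are NaN.

```python
n_samples = 113   # rows per unique_id
max_lag = 111     # largest value in LAGS
max_horizon = 6

remaining = n_samples - max_lag
print(remaining)                  # 2 usable rows per series
print(remaining >= max_horizon)   # False -> the 6-step model has no data
```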
d
LAGS = [ 1, 2, 3, 4, 5, 6] dropna=False
j
Hmm. Did you define a Dataset object in your notebook?
d
Not sure what you mean? I always feed it pandas DataFrames
j
I mean something like:
Dataset = something
or
class Dataset:
d
no... simply did train = pd.read_csv(...)
j
Can you restart your kernel? It seems something corrupted LightGBM's Dataset
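A quick sanity check for this hypothesis (the shadowed-Dataset cause is an assumption here, not a confirmed diagnosis):

```python
import lightgbm as lgb

print(lgb.Dataset)  # should print <class 'lightgbm.basic.Dataset'>
# if a cell ever rebound or monkeypatched lgb.Dataset, restarting the
# kernel restores the real class
```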
d
ok... but just FYI I'm working on 4 different notebooks on 4 different machines: Colab, SageMaker, a VM, and my local machine.
still the same
except on Colab... on Colab it runs all the way to the end
But it makes sense that the lags were causing the issue, since I only have 113 data points...
j
We'll add a better error message for that case
d
Thank you... and thank you for the help 🙌
could you correct me if I'm wrong in my understanding: lags.max() + max_horizon should be <= df.ds.nunique()
j
Yes, assuming all of your series start and end at the same timestamp, otherwise you should check the sizes
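A sketch of that check for series of unequal lengths, using the thread's column names and a hypothetical file path:

```python
import pandas as pd

train = pd.read_csv("train.csv")  # hypothetical, as earlier in the thread
LAGS = [1, 2, 3, 4, 5, 6]
MAX_HORIZON = 6

# each series needs max(LAGS) rows for the lag features plus MAX_HORIZON
# rows for the direct multi-step targets
sizes = train.groupby("unique_id")["ds"].nunique()
short = sizes[sizes < max(LAGS) + MAX_HORIZON]
print("all series long enough:", short.empty)
print(short)  # offending series, if any
```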
d
yes yes.. assuming the dates are equal for all unique_ids