# mlforecast
d
Hi channel, I'm currently facing two errors (sometimes when I run the cell I get one, other times I get the other). The first picture is my instantiation of the MLForecast object plus the model. The second picture is my most common error. The third picture is another one that sometimes comes instead of the second. NOTE: these errors only happen when I give the parameter max_horizon = any int (> 2); if I don't give it, or if I give only 2, it runs without errors. Has anyone faced the same issue? PS: I tried running the one-model-per-step tutorial, and it gives exactly the same error.
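A minimal sketch of the setup being described, assuming a LightGBM regressor and placeholder lags and frequency (the actual values are in the screenshots, which aren't reproduced here):

```python
import lightgbm as lgb
import pandas as pd
from mlforecast import MLForecast

# hypothetical training data with the default unique_id/ds/y columns
train = pd.read_csv("train.csv")

fcst = MLForecast(
    models=[lgb.LGBMRegressor()],
    freq="D",         # assumed frequency
    lags=[1, 2, 3],   # placeholder lags
)
# the errors reportedly appear only when max_horizon > 2
fcst.fit(train, max_horizon=6, dropna=True)
```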
j
What do you get if you run
train[id_col].value_counts().min()
?
d
1
j
And
train[id_col].value_counts().max()
?
d
60033
which are mostly zeros... since the data is very sparse
j
I think the problem may be the max_horizon. It needs to train a model to predict 6 steps ahead but you have series with only one sample. Do you get the same error if you remove it?
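A rough illustration of this point (not mlforecast's internals): with max_horizon=h, direct forecasting trains one model per step ahead, and the k-step model's target is the series shifted by k, so very short series leave some models with nothing to train on:

```python
import pandas as pd

y = pd.Series([10.0, 11.0, 12.0], name="y")  # toy series with only 3 samples

for k in range(1, 7):  # max_horizon = 6 -> one model per step ahead
    target_k = y.shift(-k)            # target for the k-step-ahead model
    usable = target_k.notna().sum()   # rows left to train that model on
    print(f"horizon {k}: {usable} training rows")
# horizons 3..6 end up with 0 rows, so those models can't be trained
```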
d
no... only when I apply max_horizon
sorry, wait... I replied to you wrong
train['unique_id'].value_counts().min() = 113 train['unique_id'].value_counts().max() = 113
the multivariate series all have the same number of steps
j
And what's the max lag that you're using? Since you're setting
dropna=True
it's possible that it's dropping several samples in each series
d
LAGS = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]
j
Yeah, so you'd end up with 2 samples in each series and you need at least 6. Can you try setting
dropna=False
or reducing the maximum lag?
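Quick arithmetic with the numbers from this thread: dropna=True discards the first max(lags) rows of each series, because those lag features are NaN.

```python
n_samples = 113   # rows per unique_id
max_lag = 111     # largest value in LAGS
max_horizon = 6

remaining = n_samples - max_lag
print(remaining)                  # 2 usable rows per series
print(remaining >= max_horizon)   # False -> the 6-step model has no data
```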
d
LAGS = [ 1, 2, 3, 4, 5, 6] dropna=False
j
Hmm. Did you define a Dataset object in your notebook?
d
Not sure what you mean? I always feed it pandas DataFrames
j
I mean something like:
Dataset = something
or
class Dataset:
d
no... simply did train = pd.read_csv(...)
j
Can you restart your kernel? It seems something corrupted LightGBM's Dataset
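A quick sanity check for this hypothesis (the shadowed-Dataset cause is an assumption here, not a confirmed diagnosis):

```python
import lightgbm as lgb

print(lgb.Dataset)  # should print <class 'lightgbm.basic.Dataset'>
# if a cell ever rebound or monkeypatched lgb.Dataset, restarting the
# kernel restores the real class
```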
d
ok... but just FYI I'm working on 4 different notebooks on 4 different machines: Colab, SageMaker, a VM, and my local machine.
still the same
except on Colab... on Colab it runs all the way to the end
But it makes sense that the lags were causing the issue, since I only have 113 data points...
j
We'll add a better error message for that case
d
Thank you... and thank you for the help 🙌
could you correct me if I'm wrong in my understanding: lags.max() + max_horizon should be <= df.ds.nunique()
j
Yes, assuming all of your series start and end at the same timestamp, otherwise you should check the sizes
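A sketch of that check for series of unequal lengths, using the thread's column names and a hypothetical file path:

```python
import pandas as pd

train = pd.read_csv("train.csv")  # hypothetical, as earlier in the thread
LAGS = [1, 2, 3, 4, 5, 6]
MAX_HORIZON = 6

# each series needs max(LAGS) rows for the lag features plus MAX_HORIZON
# rows for the direct multi-step targets
sizes = train.groupby("unique_id")["ds"].nunique()
short = sizes[sizes < max(LAGS) + MAX_HORIZON]
print("all series long enough:", short.empty)
print(short)  # offending series, if any
```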
d
yes yes.. assuming the dates are equal for all unique_ids