# neural-forecast
d
Did something bad happen to cross-validation in 1.7.0? I'm getting KeyError: 'unique_id' errors, while the same code works well with 1.6.4.
j
Hey. Can you provide the full stacktrace?
d
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'unique_id'
j
I mean one that includes neuralforecast code
d
That's the only one I'm getting when running the same code.
Same environment, but with 1.6.4 it works well.
A simple nf.fit works fine; the problem only appears with nf.cross_validation.
Here is the exact code:
cv_df = nf.cross_validation(df, n_windows=3, step_size=1)
j
Are you providing the id as the index in df?
d
yes
Output of df.tail():
1009      X 2024-03-21  178.18
1010      X 2024-03-22  166.30
1011      X 2024-03-25  175.24
1012      X 2024-03-26  184.25
j
Can you set it as a column?
d
Wait, you mean do I have a custom index? Then no.
I'm just reading the data from a CSV file, without any index.
j
What are the names of your columns?
d
df = pd.read_csv(temp_dataset_path)
df = df.reset_index(drop=True)
df['ds'] = pd.to_datetime(df['ds'])
unique_id, ds, y
The same code was working for months, no problem 🙂 This started to happen with the 1.7.0 release.
But I'm still in love with neuralforecast :)))
With the same file (same df), nf.fit works fine.
j
If you can provide a small reproducible example I can help further; otherwise it's very hard to guess what might be going on. We test cross validation several times in our CI.
d
Oh, there isn't much to provide tbh, it's literally very basic and simple usage: I have a CSV file with three columns, load it via pandas, and trigger nf.cross_validation.
It really looks strange to me, that's why I asked here.
Here is the output of head file.csv:
unique_id,ds,y
X,2020-03-18,23.37
X,2020-03-19,30.13
X,2020-03-20,31.8
X,2020-03-23,27.37
X,2020-03-24,34.25
X,2020-03-25,37.13
X,2020-03-26,37.33
X,2020-03-27,32.94
X,2020-03-30,32.75
j
The following works fine for me:
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[NHITS(h=12, input_size=24, max_steps=1)], freq='M')
nf.cross_validation(df=AirPassengersDF, n_windows=3, step_size=1)
That's a small example. If you can provide one like that where there's an error I can help further
d
The only difference is that I'm using Auto models,
AutoMLP and AutoNHITS.
The rest is pretty much the same.
j
If it's pretty much the same can you please just create the example with your modifications? I don't know which arguments you're using
d
nf = NeuralForecast(
    models=[
        AutoMLP(h=nf_horizon, config=None, loss=HuberMQLoss(), num_samples=1, backend='optuna', verbose=False),
        AutoNBEATS(h=nf_horizon, config=None, loss=HuberMQLoss(), num_samples=1, backend='optuna', verbose=False),
    ],
    freq=nf_freq,
    local_scaler_type="robust",
)
It was NBEATS, my bad.
It's like something is wrong with the index… This output somehow feels off:
CV DF:                    ds     cutoff  ...  AutoNBEATS-hi-90       y
unique_id                        ...
X      2024-03-21 2024-03-20  ...        183.480927  178.18
X      2024-03-22 2024-03-20  ...        185.534363  166.30
like the column order or something, but again, nf.fit works just fine with the exact same df
j
Your example runs fine for me as well. Is the error actually coming from neuralforecast or further down the line in your code?
d
The error comes when nf.cross_validation is executed.
I’m literally just rolling back to 1.6.4 and everything works
j
Yeah that doesn't help. I ran this:
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoMLP, AutoNBEATS
from neuralforecast.utils import AirPassengersDF

def config(trial):
    return {
        'input_size': 24,
        'max_steps': 1,
    }

nf = NeuralForecast(
    models=[
        AutoNBEATS(h=12, config=config, backend='optuna', num_samples=1),
        AutoMLP(h=12, config=config, backend='optuna', num_samples=1)
    ],
    freq='M',
)
print(nf.cross_validation(df=AirPassengersDF, n_windows=3, step_size=1))
without any issues. If you can't provide an example like that where there's an error or a full stacktrace (including neuralforecast code) I can't help any further
d
Ugh, I understand… Really something strange. I really appreciate your time and help. I'll go debug more and let you know. Thank you very much!
j
There's an important change in 1.7.0 with respect to the index (we're deprecating it from the input and output), so if you set
os.environ['NIXTLA_ID_AS_COL'] = '1'
you'll get the id as a column instead of the index in the predict and cross_validation outputs, which might help.
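For example, something like this (an untested sketch, reusing the NHITS example from above):
import os
os.environ['NIXTLA_ID_AS_COL'] = '1'  # ask neuralforecast to return the id as a column

from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[NHITS(h=12, input_size=24, max_steps=1)], freq='M')
cv_df = nf.cross_validation(df=AirPassengersDF, n_windows=3, step_size=1)
print(cv_df.columns)  # 'unique_id' should now show up as a regular column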
d
Oooooh waaait, where is that written???
This really seems like the issue
j
You should get future warnings if you're not setting it
The 1.6.4 behavior should be maintained, but it's now deprecated, so it'll stop working in a couple of versions
Oh, I'm able to replicate the issue by passing the id as the index, but you said you weren't doing that.
d
It seems there is some weird stuff here.
In the code I sent, I'm not passing the id as the index, because I'm resetting the index after loading the file.
But for some reason it does not work.
j
If I change my example to
df=AirPassengersDF.set_index('unique_id')
I get the key error. So you should be able to fix it by doing
df = df.reset_index()
We'll fix it on our side as well, but in the meantime that should work for you.
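Putting the repro and the fix together, roughly (a sketch based on the NHITS example from earlier):
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[NHITS(h=12, input_size=24, max_steps=1)], freq='M')

bad_df = AirPassengersDF.set_index('unique_id')
# nf.cross_validation(df=bad_df, n_windows=3, step_size=1)  # raises KeyError: 'unique_id' on 1.7.0

good_df = bad_df.reset_index()  # move the id back to a regular column
cv_df = nf.cross_validation(df=good_df, n_windows=3, step_size=1)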
d
I am doing that :))))))))
That's the problem :)))
Here is the exact code:
df = pd.read_csv(temp_dataset_path)
df = df.reset_index(drop=True)
j
Maybe you saved the df with the index, and when you drop it, it goes away. So maybe just
df = df.reset_index()
?
d
Will try rn
Yeah, so the global change to the index was the issue. That fixed cross_validation, but now I'm getting the same problem with this code:
evaluation_df = evaluate(cv_df.loc[:, cv_df.columns != 'cutoff'], metrics=[mae, mse, mape])
evaluation_df['best_model'] = evaluation_df.drop(columns=['metric', 'unique_id']).idxmin(axis=1)
Pretty much going through the example step by step :)))
j
I guess you have to reset the index of cv_df as well
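Something along these lines (a sketch; I'm assuming the metrics and evaluate come from utilsforecast, as in the tutorial):
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse, mape

cv_df = cv_df.reset_index()  # make unique_id a regular column again
evaluation_df = evaluate(cv_df.loc[:, cv_df.columns != 'cutoff'], metrics=[mae, mse, mape])
evaluation_df['best_model'] = evaluation_df.drop(columns=['metric', 'unique_id']).idxmin(axis=1)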
d
Maaaan, we are gonna break nf documentation here together :)))
Yup, that works.
You are my best guy in Nixtla officially 🙂
Thank you once again for your time and attention
j
Yeah, sorry for the trouble. We have an issue to update that tutorial, and we'll also fix the CV for the id-as-index case.
d
Don't call it trouble. NF is literally the holy grail of forecasting.
Once again thank you very much for your time and attention. It means a lot.
s
Wow, I just came over here for this. I was getting the same KeyError: 'unique_id'. So reset the index, right?
j
Yes, just make sure the id is a column. We'll fix this in the next release, which should be next week at the latest
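If you want a quick guard before calling cross_validation, something like this sketch should do it:
if 'unique_id' not in df.columns:
    df = df.reset_index()  # the id was sitting in the index; turn it back into a column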