# neural-forecast
d
Did something bad happen to cross-validation in 1.7.0? I'm getting KeyError: 'unique_id' errors, while the same code works well with 1.6.4.
j
Hey. Can you provide the full stacktrace?
d
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
return self._engine.get_loc(casted_key)
File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'unique_id'
j
I mean one that includes neuralforecast code
d
That's the only one I'm getting when running the same code.
Same environment, but with 1.6.4 it works well.
A simple nf.fit works fine; the problem only appears with nf.cross_validation.
Here is the exact code:
cv_df = nf.cross_validation(df, n_windows=3, step_size=1)
j
Are you providing the id as the index in df?
d
yes
Output of df.tail():
1009      X 2024-03-21  178.18
1010      X 2024-03-22  166.30
1011      X 2024-03-25  175.24
1012      X 2024-03-26  184.25
j
Can you set it as a column?
d
Wait, you mean do I have a custom index? Then no.
I'm just reading the data from a CSV file, without any index.
j
What are the names of your columns?
d
df = pd.read_csv(temp_dataset_path)
df = df.reset_index(drop=True)
df['ds'] = pd.to_datetime(df['ds'])
unique_id, ds, y
The same code was working for months, no problem 🙂 This started to happen with the 1.7.0 release.
But I'm still in love with neuralforecast :)))
With the same file (same df), nf.fit works fine.
j
If you can provide a small reproducible example I can help further; otherwise it's very hard to guess what might be going on. We test cross validation several times in our CI.
d
Oh, there isn't much to provide tbh, it's literally very basic and simple usage: I have a CSV file with three columns, load it via pandas, and trigger nf.cross_validation.
It really looks strange to me, that's why I asked here.
Here is the output of head file.csv:
unique_id,ds,y
X,2020-03-18,23.37
X,2020-03-19,30.13
X,2020-03-20,31.8
X,2020-03-23,27.37
X,2020-03-24,34.25
X,2020-03-25,37.13
X,2020-03-26,37.33
X,2020-03-27,32.94
X,2020-03-30,32.75
j
The following works fine for me:
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[NHITS(h=12, input_size=24, max_steps=1)], freq='M')
nf.cross_validation(df=AirPassengersDF, n_windows=3, step_size=1)
That's a small example. If you can provide one like that where there's an error I can help further
d
The only difference is that I'm using Auto models,
AutoMLP and AutoNHITS.
The rest is pretty much the same.
j
If it's pretty much the same can you please just create the example with your modifications? I don't know which arguments you're using
d
nf = NeuralForecast(
    models=[
        AutoMLP(h=nf_horizon, config=None, loss=HuberMQLoss(), num_samples=1, backend='optuna', verbose=False),
        AutoNBEATS(h=nf_horizon, config=None, loss=HuberMQLoss(), num_samples=1, backend='optuna', verbose=False),
    ],
    freq=nf_freq,
    local_scaler_type="robust",
)
It was NBEATS, my bad.
It's like something is wrong with the index… This output somehow feels off:
CV DF:                    ds     cutoff  ...  AutoNBEATS-hi-90       y
unique_id                        ...
X      2024-03-21 2024-03-20  ...        183.480927  178.18
X      2024-03-22 2024-03-20  ...        185.534363  166.30
like the column order or something, but again, nf.fit works just fine with the exact same df
j
Your example runs fine for me as well. Is the error actually coming from neuralforecast or further down the line in your code?
d
The error comes when nf.cross_validation is executed.
I’m literally just rolling back to 1.6.4 and everything works
j
Yeah that doesn't help. I ran this:
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoMLP, AutoNBEATS
from neuralforecast.utils import AirPassengersDF

def config(trial):
    return {
        'input_size': 24,
        'max_steps': 1,
    }

nf = NeuralForecast(
    models=[
        AutoNBEATS(h=12, config=config, backend='optuna', num_samples=1),
        AutoMLP(h=12, config=config, backend='optuna', num_samples=1)
    ],
    freq='M',
)
print(nf.cross_validation(df=AirPassengersDF, n_windows=3, step_size=1))
without any issues. If you can't provide an example like that where there's an error or a full stacktrace (including neuralforecast code) I can't help any further
d
Ugh, I understand… Really something strange. I really appreciate your time and help. I'll go debug more and let you know. Thank you very much!
j
There's an important change in 1.7.0 with respect to the index (we're deprecating it from the input and output), so if you set
os.environ['NIXTLA_ID_AS_COL'] = '1'
you'll get the id as a column instead of the index in the predict and cross_validation outputs, which might help.
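For example, something like this (an untested sketch, reusing the NHITS example from above):
import os
os.environ['NIXTLA_ID_AS_COL'] = '1'  # ask neuralforecast to return the id as a column

from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[NHITS(h=12, input_size=24, max_steps=1)], freq='M')
cv_df = nf.cross_validation(df=AirPassengersDF, n_windows=3, step_size=1)
print(cv_df.columns)  # 'unique_id' should now show up as a regular column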
d
Oooooh waaait, where is that written???
This really seems like the issue
j
You should get future warnings if you're not setting it
The 1.6.4 behavior should be maintained, but it's now deprecated, so it'll stop working in a couple of versions
Oh, I'm able to replicate the issue by passing the id as the index, but you said you weren't doing that.
d
It seems there is some weird stuff here.
In the code I sent, I'm not passing the id as the index, because I'm resetting the index after loading the file.
But for some reason it does not work.
j
If I change my example to
df=AirPassengersDF.set_index('unique_id')
I get the key error. So you should be able to fix it by doing
df = df.reset_index()
We'll fix it on our side as well, but in the meantime that should work for you.
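Putting the repro and the fix together, roughly (a sketch based on the NHITS example from earlier):
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[NHITS(h=12, input_size=24, max_steps=1)], freq='M')

bad_df = AirPassengersDF.set_index('unique_id')
# nf.cross_validation(df=bad_df, n_windows=3, step_size=1)  # raises KeyError: 'unique_id' on 1.7.0

good_df = bad_df.reset_index()  # move the id back to a regular column
cv_df = nf.cross_validation(df=good_df, n_windows=3, step_size=1)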
d
I am doing that :))))))))
That's the problem :)))
Here is the exact code:
df = pd.read_csv(temp_dataset_path)
df = df.reset_index(drop=True)
j
Maybe you saved the df with the index, and when you drop it, it goes away. So maybe just
df = df.reset_index()
?
d
Will try rn
Yeah, so the global change to the index was the issue. That fixed cross_validation, but now I'm getting the same problem with this code:
evaluation_df = evaluate(cv_df.loc[:, cv_df.columns != 'cutoff'], metrics=[mae, mse, mape])
evaluation_df['best_model'] = evaluation_df.drop(columns=['metric', 'unique_id']).idxmin(axis=1)
Pretty much going through the example step by step :)))
j
I guess you have to reset the index of cv_df as well
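Something along these lines (a sketch; I'm assuming the metrics and evaluate come from utilsforecast, as in the tutorial):
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse, mape

cv_df = cv_df.reset_index()  # make unique_id a regular column again
evaluation_df = evaluate(cv_df.loc[:, cv_df.columns != 'cutoff'], metrics=[mae, mse, mape])
evaluation_df['best_model'] = evaluation_df.drop(columns=['metric', 'unique_id']).idxmin(axis=1)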
d
Maaaan, we are gonna break nf documentation here together :)))
Yup, that works.
You are my best guy in Nixtla officially 🙂
Thank you once again for your time and attention
j
Yeah, sorry for the trouble. We have an issue to update that tutorial, and we'll also fix the CV for the id-as-index case.
d
Don't call it trouble. NF is literally the holy grail of forecasting.
Once again thank you very much for your time and attention. It means a lot.
s
Wow, I just came over here for this. I was getting the same KeyError: 'unique_id'. So reset the index, right?
j
Yes, just make sure the id is a column. We'll fix this in the next release, which should be next week at the latest
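If you want a quick guard before calling cross_validation, something like this sketch should do it:
if 'unique_id' not in df.columns:
    df = df.reset_index()  # the id was sitting in the index; turn it back into a column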