# general
b
I'm interested in assessing forecast prediction intervals. Based on my reading of Hyndman's FPP3, I think CRPS is probably the right metric for my objective. I found this UtilsForecast page and I'm trying to figure out how to use scaled_crps. I initially thought I could drop it, like other loss functions, into the following function to evaluate the output of the cross_validation function:
import pandas as pd
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import scaled_crps


def evaluate_cross_validation_metric(df, metric):
    # Every column other than the id, timestamp, cutoff and target is treated as a model.
    models = df.drop(columns=['unique_id', 'ds', 'cutoff', 'y']).columns.tolist()
    evals = []
    # Calculate the loss for every unique_id within each cutoff.
    for cutoff in df['cutoff'].unique():
        eval_ = evaluate(df[df['cutoff'] == cutoff], metrics=[metric], models=models)
        evals.append(eval_)
    evals = pd.concat(evals)
    # Average the error metrics over all cutoffs for every combination of model and unique_id.
    evals = evals.groupby('unique_id').mean(numeric_only=True)
    evals['best_metric'] = evals.min(axis=1)
    return evals

evaluation_crps = evaluate_cross_validation_metric(All_crossvalidation_df4, scaled_crps).reset_index()
However, it fails. I also tried:
models=['ARIMA','ETS']

scaled_crps(All_crossvalidation_df4, models=models, target_col='y', id_col='unique_id', quantiles=[0.05,0.95])
But for that I get the error:
ValueError: cannot reindex on an axis with duplicate labels
Any suggestions?
j
Hey. Do you have the id as the index?
b
Hey @José Morales, when I set unique_id as the index, I get the error:
KeyError: 'unique_id'
j
The second approach is the correct one. Can you include a larger stack trace?
b
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File C:\ProgramData\miniconda3\envs\py310env\lib\site-packages\pandas\core\indexes\base.py:3790, in Index.get_loc(self, key)
   3789 try:
-> 3790     return self._engine.get_loc(casted_key)
   3791 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'unique_id'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[104], line 5
      1 All_crossvalidation_df5 = All_crossvalidation_df4.copy().set_index('unique_id')
      3 models=['ARIMA','ETS']
----> 5 scaled_crps(All_crossvalidation_df5, models=models, target_col='y', id_col='unique_id', quantiles=[0.05,0.95])

File C:\ProgramData\miniconda3\envs\py310env\lib\site-packages\utilsforecast\losses.py:638, in scaled_crps(df, models, quantiles, id_col, target_col)
    636 eps = np.finfo(float).eps
    637 quantiles = np.asarray(quantiles)
--> 638 loss = mqloss(df, models, quantiles, id_col, target_col)
    639 if isinstance(loss, pd.DataFrame):
    640     loss = loss.set_index(id_col)

File C:\ProgramData\miniconda3\envs\py310env\lib\site-packages\utilsforecast\losses.py:474, in mqloss(df, models, quantiles, id_col, target_col)
    472 result = type(df)({model: loss})
    473 if isinstance(result, pd.DataFrame):
--> 474     result = result.groupby(df[id_col], observed=True).mean()
    475 else:
    476     try:

File C:\ProgramData\miniconda3\envs\py310env\lib\site-packages\pandas\core\frame.py:3896, in DataFrame.__getitem__(self, key)
   3894 if self.columns.nlevels > 1:
   3895     return self._getitem_multilevel(key)
-> 3896 indexer = self.columns.get_loc(key)
   3897 if is_integer(indexer):
   3898     indexer = [indexer]

File C:\ProgramData\miniconda3\envs\py310env\lib\site-packages\pandas\core\indexes\base.py:3797, in Index.get_loc(self, key)
   3792     if isinstance(casted_key, slice) or (
   3793         isinstance(casted_key, abc.Iterable)
   3794         and any(isinstance(x, slice) for x in casted_key)
   3795     ):
   3796         raise InvalidIndexError(key)
-> 3797     raise KeyError(key) from err
   3798 except TypeError:
   3799     # If we have a listlike key, _check_indexing_error will raise
   3800     #  InvalidIndexError. Otherwise we fall through and re-raise
   3801     #  the TypeError.
   3802     self._check_indexing_error(key)

KeyError: 'unique_id'
j
Try passing the id as a column rather than as the index.
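A minimal sketch of why the id has to stay a column, using a made-up toy frame rather than the real cross-validation output. The traceback above shows the loss doing `df[id_col]`, which is a column lookup, so an id that lives in the index raises `KeyError`; `reset_index()` moves it back.

```python
import pandas as pd

# Toy stand-in for a forecast frame (values are invented for illustration).
cv_df = pd.DataFrame({"unique_id": ["A", "A", "B"], "y": [1.0, 2.0, 3.0]})

# After set_index, 'unique_id' is no longer a column, so df['unique_id'] would raise KeyError.
indexed = cv_df.set_index("unique_id")
id_is_column_before = "unique_id" in indexed.columns

# reset_index() turns the index back into a regular column.
restored = indexed.reset_index()
id_is_column_after = "unique_id" in restored.columns
```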
b
I've tried:
All_crossvalidation_df5 = All_crossvalidation_df4.copy().set_index('unique_id')

models=['ARIMA','ETS']

scaled_crps(All_crossvalidation_df5, models=models, target_col='y', id_col='unique_id', quantiles=[0.05,0.95])
and also changed the last line to:
scaled_crps(All_crossvalidation_df5, models=models, target_col='y', quantiles=[0.05,0.95])
But, either way I get:
KeyError: 'unique_id'
What am I missing?
j
Ahh, I think it's because it's a CV result: each unique_id appears once per fold, hence the error about duplicate labels on the axis. How do you want to aggregate it? By fold first and then across folds?
b
Yes, I think that's the right way to do it.
j
Here's an example of how to get that kind of aggregate: https://nixtla.mintlify.app/mlforecast/docs/how-to-guides/cross_validation.html#evaluate-results. We don't support multiple aggregation columns at the moment, so you can create a single id column that combines the unique ID and the fold, compute the metric on that, and then take the mean across folds.
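A rough sketch of that fold-then-series aggregation, under some loud assumptions: the data is a made-up toy frame, the interval columns are assumed to be named `ARIMA-lo-90` and `ARIMA-hi-90`, and the loss is a hand-rolled quantile (pinball) loss standing in for the library metric, not utilsforecast's own implementation. The helper names `pinball` and `fold_then_series_mean` are hypothetical.

```python
import numpy as np
import pandas as pd


def pinball(y, y_hat, q):
    """Quantile (pinball) loss for a single quantile level q."""
    diff = y - y_hat
    return np.maximum(q * diff, (q - 1) * diff)


def fold_then_series_mean(cv_df, quantile_cols):
    """Average a multi-quantile loss per (series, fold), then across folds.

    quantile_cols maps a quantile level to the column holding that forecast
    (hypothetical layout, chosen for this illustration).
    """
    df = cv_df.copy()
    # One id per (series, fold), so ids are no longer repeated across folds.
    df["id_cutoff"] = df["unique_id"].astype(str) + "_" + df["cutoff"].astype(str)
    # Row-wise loss, averaged over the quantile levels.
    df["loss"] = np.mean(
        [pinball(df["y"], df[col], q) for q, col in quantile_cols.items()], axis=0
    )
    # Step 1: mean within each fold. Step 2: mean of the fold means per series.
    per_fold = df.groupby(["unique_id", "id_cutoff"], as_index=False)["loss"].mean()
    return per_fold.groupby("unique_id")["loss"].mean()


# Toy cross-validation output: 2 series x 2 cutoffs x 2 horizons (invented values).
cv_df = pd.DataFrame({
    "unique_id": ["A"] * 4 + ["B"] * 4,
    "cutoff": pd.to_datetime(["2024-01-31", "2024-01-31", "2024-02-29", "2024-02-29"] * 2),
    "y": [10.0, 12.0, 11.0, 20.0, 5.0, 5.0, 5.0, 5.0],
    "ARIMA-lo-90": [8.0, 9.0, 9.0, 10.0, 4.0, 4.0, 4.0, 4.0],
    "ARIMA-hi-90": [13.0, 14.0, 12.0, 15.0, 6.0, 6.0, 6.0, 6.0],
})

result = fold_then_series_mean(cv_df, {0.05: "ARIMA-lo-90", 0.95: "ARIMA-hi-90"})
```

With the real library the same idea would presumably mean passing the combined column as `id_col` to the loss and then averaging per `unique_id`. Note that averaging fold means only matches a pooled mean when every fold has the same number of rows.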
b
Thank you @José Morales. I had used some other tutorials from the Nixtla pages. I haven't had time to dig into this deeply just yet, but I will give it a try. Much appreciated.
j
Please let us know if you need further help