Nixtla Community #general

Ramon Botella Nieto

08/04/2023, 6:23 AM

Hello everyone, I really like the methods and utilities of the statsforecast and hierarchicalforecast packages. I have a question about a specific utility that I find especially useful: `hierarchicalforecast.utils.aggregate`. I have a hierarchy tree quite similar to the one in the Australian tourism data example:

spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Purpose'], 
    ['Country', 'State', 'Region'], 
    ['Country', 'State', 'Purpose'], 
    ['Country', 'State', 'Region', 'Purpose']
]

In my case, I do not want to make predictions for the bottom level, `['Country', 'State', 'Region', 'Purpose']`. This level is of no interest to me, and including it in the Y_df dataframe adds a large number of series that I do not need to fit and that slow the process down. However, when I remove that last element from the list and pass the list to the `aggregate` function together with my dataframe, it raises this error:

ValueError: Categorical categories must be unique
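To give a feel for how much the bottom level adds, here is a rough sketch that counts the distinct series each level of the full spec would contribute. It uses a tiny made-up frame (the rows below are hypothetical, not the real tourism data) and plain pandas only; it does not call `aggregate`.

```python
import pandas as pd

# Toy stand-in for the tourism data (hypothetical rows, not the real dataset).
df = pd.DataFrame({
    'Country': ['Australia'] * 6,
    'State':   ['NSW', 'NSW', 'NSW', 'NSW', 'VIC', 'VIC'],
    'Region':  ['Sydney', 'Sydney', 'Hunter', 'Hunter', 'Melbourne', 'Melbourne'],
    'Purpose': ['Business', 'Holiday', 'Business', 'Holiday', 'Business', 'Holiday'],
})

spec = [
    ['Country'],
    ['Country', 'State'],
    ['Country', 'Purpose'],
    ['Country', 'State', 'Region'],
    ['Country', 'State', 'Purpose'],
    ['Country', 'State', 'Region', 'Purpose'],
]

# Each level contributes one series per distinct combination of its columns.
series_per_level = {
    '/'.join(cols): len(df[cols].drop_duplicates()) for cols in spec
}

for level, n_series in series_per_level.items():
    print(f'{level}: {n_series}')
```

Even in this toy frame the last level is the largest one, and on the real data the gap is much wider.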

The error can be reproduced with this snippet (the only change is that I deleted the last hierarchy list from `spec`):

import numpy as np
import pandas as pd
from hierarchicalforecast.utils import aggregate

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df.head()

spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Purpose'], 
    ['Country', 'State', 'Region'], 
    ['Country', 'State', 'Purpose'], 
]

Y_df, S_df, tags = aggregate(Y_df, spec)
Y_df = Y_df.reset_index()

Of course, I can always generate the Y_df with the bottom hierarchy and filter it out afterwards, but I suspect that I'll also need to filter the S_df matrix and the tags dictionary for the hierarchical methods to work later. Thank you in advance for your help.
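A minimal sketch of the filtering I have in mind, written against toy stand-ins for the three objects that `aggregate` returns. The ids, values, and tag names below are made up for illustration, and I am assuming Y_df has a `unique_id` column (as after `reset_index()`), S_df is indexed by series id, and tags maps each level name to the ids in that level.

```python
import pandas as pd

# Toy stand-ins for the output of `aggregate` (hypothetical ids and values).
bottom_ids = ['A/x/p', 'A/x/q', 'A/y/p']
all_ids = ['A', 'A/x', 'A/y'] + bottom_ids

Y_df = pd.DataFrame({
    'unique_id': all_ids,
    'ds': pd.Timestamp('2020-01-01'),
    'y': [6.0, 3.0, 3.0, 1.0, 2.0, 3.0],
})

# Summing matrix: one row per series, one column per bottom-level series.
S_df = pd.DataFrame(
    [[1, 1, 1],
     [1, 1, 0],
     [0, 0, 1],
     [1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]],
    index=all_ids, columns=bottom_ids, dtype=float,
)

tags = {
    'Top': ['A'],
    'Top/Mid': ['A/x', 'A/y'],
    'Top/Mid/Bottom': bottom_ids,
}

# Drop the bottom level from all three objects.
bottom_key = 'Top/Mid/Bottom'
drop_ids = set(tags[bottom_key])

Y_small = Y_df[~Y_df['unique_id'].isin(drop_ids)]
S_small = S_df.loc[~S_df.index.isin(drop_ids)]
tags_small = {k: v for k, v in tags.items() if k != bottom_key}
```

Note that the columns of `S_small` still refer to the dropped bottom-level series. I have not verified whether the reconciliation methods accept a summing matrix without its bottom-level rows, which is the part I am unsure about.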
