#general

Ramon Botella Nieto

08/04/2023, 6:23 AM
Hello everyone, I really like the methods and utilities in the statsforecast and hierarchicalforecast packages. I have a question about a specific utility that I'm finding especially useful: hierarchicalforecast.utils.aggregate. I have a hierarchy tree quite similar to the one in the Australian tourism data example:
spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Purpose'], 
    ['Country', 'State', 'Region'], 
    ['Country', 'State', 'Purpose'], 
    ['Country', 'State', 'Region', 'Purpose']
]
In my case, I do not want to make predictions for the bottom level, ['Country', 'State', 'Region', 'Purpose']. This level is of no interest to me, and including it in the Y_df dataframe adds a large number of series to fit that I do not need and that slow the process down. However, when I remove that last element from the list and pass the list to the aggregate function together with my dataframe, it raises this error:
ValueError: Categorical categories must be unique
The error can be reproduced with this snippet, which is the tourism example with only the last hierarchy list deleted from spec:
import numpy as np
import pandas as pd
from hierarchicalforecast.utils import aggregate

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df.head()

spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'Purpose'], 
    ['Country', 'State', 'Region'], 
    ['Country', 'State', 'Purpose'], 
]

Y_df, S_df, tags = aggregate(Y_df, spec)
Y_df = Y_df.reset_index()
Of course, I can always generate the Y_df with the bottom hierarchy and filter it out afterwards, but I suspect that I'll also need to filter the S_df matrix and the tags dictionary for the hierarchical methods to work later. Thank you in advance for your help.
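
To make the filtering workaround concrete, here is a minimal sketch on hand-built toy data rather than the real output of aggregate. The helper name drop_level and the toy frames are hypothetical. The sketch assumes that Y_df has a unique_id column (as it does after reset_index), that S_df is indexed by unique_id with one column per bottom series, and that tags maps each level name to its series ids.

```python
import pandas as pd

# Toy stand-ins for the three objects returned by aggregate (hypothetical data).
bottom = ['Australia/NSW/Sydney', 'Australia/NSW/Hunter', 'Australia/VIC/Melbourne']
tags = {
    'Country': ['Australia'],
    'Country/State': ['Australia/NSW', 'Australia/VIC'],
    'Country/State/Region': bottom,
}
all_ids = [uid for ids in tags.values() for uid in ids]

# Summing matrix: one row per series, one column per bottom series.
S_df = pd.DataFrame(
    [[1, 1, 1],   # Australia
     [1, 1, 0],   # Australia/NSW
     [0, 0, 1],   # Australia/VIC
     [1, 0, 0],   # bottom series (identity rows)
     [0, 1, 0],
     [0, 0, 1]],
    index=pd.Index(all_ids, name='unique_id'),
    columns=bottom,
)

# One observation per series, in the long format produced after reset_index().
Y_df = pd.DataFrame({
    'unique_id': all_ids,
    'ds': pd.Timestamp('2023-01-01'),
    'y': [6.0, 3.0, 3.0, 1.0, 2.0, 3.0],
})


def drop_level(Y_df, S_df, tags, level):
    """Remove every series that belongs to `level` from Y_df, S_df and tags."""
    dropped = set(tags[level])
    Y_out = Y_df[~Y_df['unique_id'].isin(dropped)].reset_index(drop=True)
    S_out = S_df.loc[~S_df.index.isin(dropped)]
    tags_out = {name: ids for name, ids in tags.items() if name != level}
    return Y_out, S_out, tags_out


Y_small, S_small, tags_small = drop_level(Y_df, S_df, tags, 'Country/State/Region')
print(Y_small)
print(S_small)
print(list(tags_small))
```

Note that S_small keeps the bottom-level columns and only loses the bottom-level rows. Whether the reconciliation methods accept a summing matrix filtered this way is exactly the open question here, not something this sketch confirms.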