Ramon Botella Nieto
08/04/2023, 6:23 AM
hierarchicalforecast.utils.aggregate. I have a hierarchy tree quite similar to the one in the Australian tourism data example:
spec = [
['Country'],
['Country', 'State'],
['Country', 'Purpose'],
['Country', 'State', 'Region'],
['Country', 'State', 'Purpose'],
['Country', 'State', 'Region', 'Purpose']
]
In my case, I do not want to make predictions for the bottom layer, ['Country', 'State', 'Region', 'Purpose']. This bottom level is of no interest to me, and adding it to the Y_df dataframe adds a great number of series to fit that I do not need and that will slow down the process. However, when I remove that last element from the list and pass it to the aggregate method together with my dataframe, it raises this error: ValueError: Categorical categories must be unique
The error can be reproduced with this snippet (where I only deleted the last hierarchy list from spec in this example):
import numpy as np
import pandas as pd
from hierarchicalforecast.utils import aggregate
Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df.head()
spec = [
['Country'],
['Country', 'State'],
['Country', 'Purpose'],
['Country', 'State', 'Region'],
['Country', 'State', 'Purpose'],
]
Y_df, S_df, tags = aggregate(Y_df, spec)
Y_df = Y_df.reset_index()
Of course, I can always generate the Y_df with the bottom hierarchy and filter it out afterwards, but I suspect that I'll also need to filter the S_df matrix and the tags dictionary for the hierarchical methods to work later.
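For reference, the filtering workaround I have in mind would look roughly like the sketch below. It uses small toy stand-ins rather than the real output of aggregate, and it assumes that Y_df has a unique_id column after reset_index, that S_df has one row per series and one column per bottom-level series, and that tags maps each level name (joined with '/') to an array of series ids. The names Y_small, S_small and tags_small are only illustrative. I have not verified that the reconciliation methods accept an S_df without its bottom-level rows.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the three outputs of aggregate(); structure is assumed.
bottom_level = 'Country/State/Region/Purpose'
tags = {
    'Country': np.array(['Australia']),
    'Country/State': np.array(['Australia/NSW']),
    bottom_level: np.array([
        'Australia/NSW/Sydney/Business',
        'Australia/NSW/Sydney/Holiday',
    ]),
}
all_ids = np.concatenate(list(tags.values()))

# Summing matrix: rows are all series, columns are the bottom-level series.
S_df = pd.DataFrame(
    [[1, 1], [1, 1], [1, 0], [0, 1]],
    index=all_ids,
    columns=tags[bottom_level],
)

# Long-format series, two timestamps per series.
Y_df = pd.DataFrame({
    'unique_id': np.repeat(all_ids, 2),
    'ds': pd.to_datetime(['2020-01-01', '2020-04-01'] * len(all_ids)),
    'y': [10.0, 12.0, 10.0, 12.0, 4.0, 5.0, 6.0, 7.0],
})

# Drop every bottom-level series from the three objects.
bottom_ids = set(tags[bottom_level])
Y_small = Y_df[~Y_df['unique_id'].isin(bottom_ids)]
S_small = S_df.loc[~S_df.index.isin(bottom_ids)]
tags_small = {level: ids for level, ids in tags.items() if level != bottom_level}
```

Note that S_small still has the bottom-level series as its columns, which is part of why I am unsure this is enough for the hierarchical methods.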
Thank you in advance for your help.