# general
g
Hi, I am using the aggregate function to generate the hierarchical data (actuals). However, I don't need to aggregate my forecasts, since I already generate forecasts at each level. The notebook examples all use the aggregate function. Does anyone have an example of how my data should look in order to use HierarchicalData.load instead? When I run this, it triple counts everything, since I have forecasts at all 3 levels:
Y_fitted_df, S_fitted, tags_fitted = aggregate(Y_fitted_df, hiers)
When I run this, I get an error:
Y_fitted_df, S_fitted, tags_fitted = HierarchicalData.load(Y_fitted2_df)
TypeError: load() missing 1 required positional argument: 'group'
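To make the triple counting concrete, here is a minimal sketch in plain Python. The ids and values are hypothetical, and it does not call the library; it only assumes that summing a frame which already contains every level adds each quantity once per level.

```python
# Hypothetical 3-level hierarchy: total -> region -> store.
# Each level already holds the correct aggregate of the level below.
values = {
    "total": 30.0,
    "total/north": 10.0,
    "total/south": 20.0,
    "total/north/a": 4.0,
    "total/north/b": 6.0,
    "total/south/c": 20.0,
}

# Summing every row as if it were a bottom-level series counts each
# quantity once per level, i.e. three times here.
naive_total = sum(values.values())

# Summing only the deepest ids gives the true total.
max_depth = max(key.count("/") for key in values)
bottom_total = sum(v for key, v in values.items() if key.count("/") == max_depth)

print(naive_total, bottom_total)
```

With three levels, the naive sum comes out at three times the bottom-level sum, which matches the symptom described above.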
m
Hi @Ginger Holt; the `HierarchicalData.load` function is meant for benchmarking or research purposes. The idea is to have a function that makes it easier to load well-known datasets like Tourism or Labour. If you already have forecasts at all levels, you just need the `S` matrix and the `tags` to use any reconciliation method. The `aggregate` function assumes you have only the bottom-level series. Currently, we don't have an explicit function to generate the `S` matrix or `tags` for the case where the user already has forecasts for every level. However, you can do the following "hack" and it should work fine:
1. Create a `Y_fitted_df_bottom` with just the bottom-level ids, i.e., filter your current `Y_fitted_df` so it has only the ids from the lowest level.
2. Use the `aggregate` function to generate your `S` matrix and `tags`. (You can discard the resulting df.) Something like:
Y_fitted_df_trash, S_fitted, tags_fitted = aggregate(Y_fitted_df_bottom, hiers)
3. Use your original `Y_fitted_df` and the newly generated `S` and `tags` to call the reconciliation method of your choosing.
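For intuition about what the summing matrix and tags contain, here is a library-free sketch in plain Python. It is not how `aggregate` is implemented; the helper name `build_summing_matrix`, the `level_N` tag names, and the '/'-separated ids are assumptions made for illustration, and it assumes every bottom-level id has the same depth.

```python
def build_summing_matrix(bottom_ids, sep="/"):
    """Return (row_ids, S, tags) for bottom ids such as 'SAU/a'.

    Illustrative helper, not part of any library.
    """
    n_levels = bottom_ids[0].count(sep) + 1
    row_ids, tags = [], {}
    for depth in range(1, n_levels + 1):
        # Every distinct prefix of this depth is one node of that level.
        level_ids = []
        for bottom_id in bottom_ids:
            prefix = sep.join(bottom_id.split(sep)[:depth])
            if prefix not in level_ids:
                level_ids.append(prefix)
        tags[f"level_{depth}"] = level_ids
        row_ids.extend(level_ids)
    # S[i][j] is 1 when bottom series j equals node i or descends from it.
    S = [
        [1 if (b == r or b.startswith(r + sep)) else 0 for b in bottom_ids]
        for r in row_ids
    ]
    return row_ids, S, tags


bottom_ids = ["SAU/a", "SAU/b", "SAU/c"]
row_ids, S, tags = build_summing_matrix(bottom_ids)
for row_id, row in zip(row_ids, S):
    print(row_id, row)
```

Each row of `S` says which bottom-level series add up to that node, which is why only the bottom-level ids are needed to build it.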
k
Hi @Ginger Holt. This is an example where I did the hack to recover the `S_df` from a complete hierarchy; you would need to adapt this example to your needs:
import numpy as np
import pandas as pd
from hierarchicalforecast.utils import aggregate

# Reading data
Y_hat_df = pd.read_csv('Y_hat_df.csv')
Y_fitted_df = pd.read_csv('Y_fitted_df.csv')

# Find the bottom-level ids: the ones with the most '/' separators
unique_ids = Y_hat_df.unique_id.unique()
level_count = np.array([u_id.count('/') for u_id in unique_ids])
is_bottom = (level_count == level_count.max())
# Drop the first 4 characters (assumes ids look like 'SAU/...')
bottom_level = np.array([u_id[4:] for u_id in unique_ids])
bottom_df = pd.DataFrame(dict(unique_id=unique_ids,
                              is_bottom=is_bottom,
                              total_level=['SAU']*len(unique_ids),
                              bottom_level=bottom_level))
bottom_df = bottom_df[bottom_df.is_bottom]
# Dummy ds and y values: only the hierarchy structure matters here
bottom_df['ds'] = 1
bottom_df['y'] = 1
bottom_df = bottom_df[['unique_id', 'total_level', 'bottom_level', 'ds', 'y']]

# Rebuild S and tags from the bottom level; the aggregated frame is discarded
hierarchy_levels = [['total_level'], ['total_level', 'bottom_level']]
_, S_df, tags = aggregate(df=bottom_df, spec=hierarchy_levels)
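Before reconciling, it may be worth checking that the ids in your all-level forecasts line up with the rows of the recovered summing matrix. The helper below is a hypothetical plain-Python sketch (the name `compare_ids` is not from the library); it takes two lists of ids, however you extract them from your frames.

```python
def compare_ids(forecast_ids, s_row_ids):
    """Return (missing, extra): ids in S but not forecast, and the reverse."""
    forecast_set, s_set = set(forecast_ids), set(s_row_ids)
    missing = sorted(s_set - forecast_set)
    extra = sorted(forecast_set - s_set)
    return missing, extra


# Hypothetical ids for illustration.
missing, extra = compare_ids(
    forecast_ids=["SAU", "SAU/a"],
    s_row_ids=["SAU", "SAU/a", "SAU/b"],
)
print(missing, extra)
```

If either list is non-empty, the id naming in the forecasts and in the rebuilt hierarchy probably differ and need to be aligned first.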
This is a nice idea for a functionality of the library.