This message was deleted Nixtla Community #general


# general

Slackbot

09/13/2022, 9:47 PM

This message was deleted.

👀 1

fede (nixtla) (they/them)

09/14/2022, 9:01 PM

Hi @Jonathan Farland! A good example can be found here: https://nixtla.github.io/hierarchicalforecast/examples/AustralianDomesticTourism.html. If you have a set of time series of the lowest level and want to construct the `S` matrix and the dataset with all hierarchies, you can use the `aggregate` function (`from hierarchicalforecast.utils import aggregate`). The function takes the time series of the lowest level and the hierarchical structure. Please let me know if that example works for your use case. :)
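To make the idea of a summing matrix concrete, here is a minimal pandas-only sketch. It is not the library's `aggregate` implementation; the helper name `build_summing_matrix`, the `store`/`dept` columns, and the `total` row label are assumptions chosen for illustration. Each row of `S` is one series in the hierarchy, each column is one bottom-level series, and an entry is 1 when that bottom-level series contributes to the row.

```python
import numpy as np
import pandas as pd


def build_summing_matrix(bottom: pd.DataFrame, levels: list) -> pd.DataFrame:
    """Build S (all series x bottom series) from bottom-level identifiers.

    `bottom` holds one row per bottom-level series and one column per
    hierarchy level; `levels` lists those columns from coarsest to finest.
    """
    bottom = bottom.drop_duplicates(subset=levels).reset_index(drop=True)
    bottom_ids = bottom[levels].astype(str).agg("/".join, axis=1).tolist()

    # First block: the grand total sums every bottom-level series.
    blocks = [
        pd.DataFrame(
            np.ones((1, len(bottom_ids)), dtype=int),
            index=["total"],
            columns=bottom_ids,
        )
    ]
    # One block per depth: store, then store/dept (the bottom level itself).
    for depth in range(1, len(levels) + 1):
        keys = bottom[levels[:depth]].astype(str).agg("/".join, axis=1)
        block = pd.get_dummies(keys).T.astype(int)
        block.columns = bottom_ids
        blocks.append(block)
    return pd.concat(blocks)


# Hypothetical toy hierarchy: two stores, three store/dept combinations.
bottom = pd.DataFrame({"store": ["A", "A", "B"], "dept": ["1", "2", "1"]})
S = build_summing_matrix(bottom, levels=["store", "dept"])
print(S)
```

With this toy input, `S` has one total row, two store rows, and three bottom-level rows, so its shape is (number of all series) x (number of bottom-level series).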

Jonathan Farland

09/14/2022, 9:03 PM

Nice! That looks like exactly what I was thinking of, I'll give it a shot when I have a second and will let you know if there are any issues. Thanks so much!

🙌 1

Jonathan Farland

10/05/2022, 8:25 PM

Hi @fede (nixtla) (they/them) Thanks again for pointing me to that example previously - it looks like it got taken down? I'm running into a perplexing issue while using a toy example. I have a data set with 672 time series and I can successfully follow the docs and reproduce the forecasts, but when I try to reconcile them, I appear to have a mismatch of dimensions somewhere and I am not sure how to track it down past the sanity checks I've already done. Here's the stacktrace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/01/4mysj8cx1bjg_w8rbw097jwr0000gp/T/ipykernel_22347/4254128446.py in <module>
      7 hrec = HierarchicalReconciliation(reconcilers=reconcilers)
      8 #Y_rec_df = hrec.reconcile(Y_hat_df, Y_df_train, S, tags)
----> 9 Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_df_train, S=S, tags=tags)

~/opt/anaconda3/envs/ts_recon/lib/python3.9/site-packages/hierarchicalforecast/core.py in reconcile(self, Y_hat_df, S, tags, Y_df, level, bootstrap)
    148                 kwargs = {key: common_vals[key] for key in kwargs}
    149                 fcsts_model = reconcile_fn(y_hat=y_hat_model, **kwargs)
--> 150                 fcsts[f'{model_name}/{reconcile_fn_name}'] = fcsts_model['mean'].flatten()
    151                 if (pi and has_level and level is not None) or (bootstrap and level is not None):
    152                     for lv in level:

~/opt/anaconda3/envs/ts_recon/lib/python3.9/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3653         else:
   3654             # set column
-> 3655             self._set_item(key, value)
   3656 
   3657     def _setitem_slice(self, key: slice, value):

~/opt/anaconda3/envs/ts_recon/lib/python3.9/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3830         ensure homogeneity.
   3831         """
-> 3832         value = self._sanitize_column(value)
   3833 
   3834         if (

~/opt/anaconda3/envs/ts_recon/lib/python3.9/site-packages/pandas/core/frame.py in _sanitize_column(self, value)
   4533 
   4534         if is_list_like(value):
-> 4535             com.require_length_match(value, self.index)
   4536         return sanitize_array(value, self.index, copy=True, allow_2d=True)
   4537 

~/opt/anaconda3/envs/ts_recon/lib/python3.9/site-packages/pandas/core/common.py in require_length_match(data, index)
    555     """
    556     if len(data) != len(index):
--> 557         raise ValueError(
    558             "Length of values "
    559             f"({len(data)}) "

ValueError: Length of values (4704) does not match length of index (2688)

Jonathan Farland

10/05/2022, 8:25 PM

Dimensions of training data:

Jonathan Farland

10/05/2022, 8:26 PM

Dimensions of forecast data:

Jonathan Farland

10/05/2022, 8:26 PM

More checks and calls to reconcile:

Jonathan Farland

10/05/2022, 8:32 PM

@fede (nixtla) (they/them) if no obvious issue jumps right out at you here, I can try to make a reproducible github issue if that's your suggestion

fede (nixtla) (they/them)

10/05/2022, 9:15 PM

Hey @Jonathan Farland! Thanks for sharing your code and tests. It seems that the problem is related to the `Y_df_train` data. I can see that it contains 672 base time series, but seeing the shape of the summing matrix `S`, it seems that it is constructed for a different number of base time series. To create `S`, did you use the `aggregate` function? Perhaps we are missing something.
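One way to track down this kind of mismatch is to compare the series named in the rows of `S` with the series actually present in the training data. The sketch below assumes that both `S` and `Y_df` are pandas DataFrames indexed by the series identifier; the helper name `compare_ids` and the toy data are hypothetical.

```python
import pandas as pd


def compare_ids(S: pd.DataFrame, Y_df: pd.DataFrame) -> dict:
    """Report which series identifiers appear in S but not in Y_df, and vice versa."""
    s_ids = pd.Index(S.index).unique()
    y_ids = pd.Index(Y_df.index).unique()
    return {
        "rows_in_S": len(s_ids),
        "series_in_Y_df": len(y_ids),
        "in_S_not_in_Y_df": sorted(s_ids.difference(y_ids)),
        "in_Y_df_not_in_S": sorted(y_ids.difference(s_ids)),
    }


# Hypothetical toy data: S describes three series, Y_df only contains two.
S_toy = pd.DataFrame(
    [[1, 1], [1, 0], [0, 1]],
    index=["total", "a", "b"],
    columns=["a", "b"],
)
Y_toy = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2012-01-06", "2012-01-13"] * 2),
        "y": [3.0, 4.0, 1.0, 2.0],
    },
    index=pd.Index(["total", "total", "a", "a"], name="unique_id"),
)
report = compare_ids(S_toy, Y_toy)
print(report)
```

Any identifier listed under `in_S_not_in_Y_df` is a series that the summing matrix expects but the data does not provide, which is a likely source of a length mismatch during reconciliation.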

Jonathan Farland

10/05/2022, 9:20 PM

Thanks! Here's how I created `S`:

Jonathan Farland

10/05/2022, 9:20 PM

`df` looks like:

Jonathan Farland

10/05/2022, 9:21 PM

and the only other step that I think is related is :

fede (nixtla) (they/them)

10/05/2022, 9:38 PM

I see. I’m thinking that maybe some rows or time series are being deleted in that step. Would it be possible for you to share your data to explore the problem in detail? Perhaps you could mask

Jonathan Farland

10/05/2022, 9:42 PM

I can share the data and my code; the data is public and just a sample from the M5 competition. I'll package it up here. I also see now that I actually have 628 base time series (store x dept) and 45 store-level time series, which gets us to 673, and therefore `S` is 673 x 628

Jonathan Farland

10/05/2022, 9:43 PM

but yeah, still end up with 672 rows in `Y_df_train`

Jonathan Farland

10/05/2022, 9:49 PM

Here's my notebook and toy data set. Really appreciate you taking a look, I will also continue to look for anything silly I am doing here

nixtla-M5.tar.gz

fede (nixtla) (they/them)

10/05/2022, 11:08 PM

Nice! Thank you @Jonathan Farland! I found the problem. The series `store36/dept6` has only two observations. Also, there are other time series with missing values. I solved the problem by imputing the missing values with zero; since we are dealing with demand data, it makes sense. I’m imputing the missing values for each time series from the first date of the series until the last date of the whole dataset. For example, the first observation of `store36/dept6` is `2012-06-08`, and the last observation of the training set is `2012-10-26`; thus, that series will range from `2012-06-08` until `2012-10-26`. The second image shows this.
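The padding described above can be sketched in pandas as follows. This is not the code from the shared notebook; the helper name `pad_to_global_end`, the column names `unique_id`, `ds`, and `y`, and the weekly `W-FRI` frequency are assumptions (both example dates fall on a Friday). It also assumes each series has at most one row per date.

```python
import pandas as pd


def pad_to_global_end(df, freq, id_col="unique_id", time_col="ds", target_col="y"):
    """Zero-fill each series from its own first date to the dataset's last date."""
    global_end = df[time_col].max()
    padded = []
    for uid, group in df.groupby(id_col):
        # Full calendar for this series: its first date up to the global end.
        full_range = pd.date_range(group[time_col].min(), global_end, freq=freq)
        series = (
            group.set_index(time_col)[target_col]
            .reindex(full_range, fill_value=0)  # dates missing from the data
            .fillna(0)                          # dates present but with NaN
        )
        padded.append(
            pd.DataFrame(
                {id_col: uid, time_col: full_range, target_col: series.to_numpy()}
            )
        )
    return pd.concat(padded, ignore_index=True)


# Hypothetical toy data: one nearly complete series and one with two observations.
dates = pd.date_range("2012-06-01", "2012-10-26", freq="W-FRI")
toy = pd.concat(
    [
        pd.DataFrame({"unique_id": "store1/dept1", "ds": dates, "y": 10.0}).drop(index=3),
        pd.DataFrame(
            {
                "unique_id": "store36/dept6",
                "ds": pd.to_datetime(["2012-06-08", "2012-06-15"]),
                "y": [5.0, 7.0],
            }
        ),
    ],
    ignore_index=True,
)
padded = pad_to_global_end(toy, freq="W-FRI")
counts = padded.groupby("unique_id").size()
print(counts)
```

Starting each series at its own first date, rather than at the global start, avoids inventing zero demand for periods before a store/department combination existed.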

fede (nixtla) (they/them)

10/05/2022, 11:09 PM

I have also added (bootstrapped) prediction intervals to the final forecasts, which I think are a nice feature

❤️ 1

fede (nixtla) (they/them)

10/05/2022, 11:10 PM

Here’s the nb

nixtla-M5.ipynb

fede (nixtla) (they/them)

10/05/2022, 11:11 PM

Please let me know if you have any questions 🙂

Jonathan Farland

10/05/2022, 11:12 PM

Wow! Nice! So basically you just padded every time series to conform to the global begin and end dates, right?

Jonathan Farland

10/05/2022, 11:13 PM

Padded by imputation

Jonathan Farland

10/05/2022, 11:16 PM

Oh I re-read your note - you start from whenever the series begins, but to the global end

fede (nixtla) (they/them)

10/05/2022, 11:20 PM

you start from whenever the series begins, but to the global end

Yes, exactly

Jonathan Farland

10/05/2022, 11:22 PM

Ok I’ll take a look myself and get back to you. Thanks for taking a look - maybe we can add some warnings about this issue in the future
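A warning of that kind could look roughly like the sketch below. It is not part of the library; the helper name `warn_on_irregular_series`, the `min_obs` threshold, and the column names are hypothetical. It flags any series that has too few observations or that has gaps between its first date and the last date of the dataset.

```python
import warnings

import pandas as pd


def warn_on_irregular_series(df, freq, min_obs=3, id_col="unique_id", time_col="ds"):
    """Warn about series that are very short or have missing dates; return their ids."""
    global_end = df[time_col].max()
    flagged = []
    for uid, group in df.groupby(id_col):
        # Number of dates the series should have from its start to the global end.
        expected = len(pd.date_range(group[time_col].min(), global_end, freq=freq))
        observed = group[time_col].nunique()
        if observed < min_obs or observed < expected:
            flagged.append(uid)
            warnings.warn(
                f"Series {uid!r} has {observed} observations, expected {expected}."
            )
    return flagged


# Hypothetical toy data: one complete series and one with only two observations.
dates = pd.date_range("2012-06-01", "2012-10-26", freq="W-FRI")
toy = pd.concat(
    [
        pd.DataFrame({"unique_id": "store1/dept1", "ds": dates, "y": 10.0}),
        pd.DataFrame(
            {
                "unique_id": "store36/dept6",
                "ds": pd.to_datetime(["2012-06-08", "2012-06-15"]),
                "y": [5.0, 7.0],
            }
        ),
    ],
    ignore_index=True,
)
flagged = warn_on_irregular_series(toy, freq="W-FRI")
print(flagged)
```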

fede (nixtla) (they/them)

10/05/2022, 11:30 PM

Sure! We are also working on a library to do time series preprocessing efficiently and to detect inconsistencies in the data. Would that library be valuable for your use cases? Besides missing values, is there any other functionality that you would like to have? Your input will be very useful for us to prioritize development

🙌 1

Jonathan Farland

10/06/2022, 6:44 PM

Nothing comes to mind right now in terms of features, but I will say that maybe our two companies can collaborate a little here. Now that you've unblocked me (thanks again), I need a little bit of time to build out what I am thinking but will be happy to share with you here, and then we can take it from there?

fede (nixtla) (they/them)

10/06/2022, 9:18 PM

Yes, sure! 🎉
