# general
s
Hey there - I’ve been toying around with the different forecasting packages Nixtla has made available, and I’m hopeful that we can start using them in production to forecast sales of our items. I have a question that I’m hoping to get some insight on: how do you control for inconsistencies in the amount of data available within a hierarchy? Over the course of four years, we have launched and discontinued both items and categories of items. In a category, we might have 30 items today, but only 20 two years ago. Similarly, we have a category that was only available for about six months before it was discontinued. I’ve padded the data with 0's so that every item has an equal number of records, but I expect this isn’t an ideal solution. Any tips?
f
Don't do that! Don't pad with zeros like that. There is a huge difference between 'zero' and 'nonexistent' data! When you do zero padding like that, you are essentially extrapolating into the past. This is very different than intermittent data where sometimes you might be able to use zero padding. When you do this, you introduce lots of zeros into the past and will end up averaging your good data with so many zeros from products that didn't even exist back then! My recommendation is to try to forecast at a higher level of granularity where you have more data. For instance, think of clustering some products together to get more available data for forecasting.
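To make the point about averaging in zeros concrete, here is a minimal sketch. The sales numbers and the 24-month window are made up for illustration; the only thing it shows is how padding a short history with zeros drags a simple level estimate toward zero.

```python
from statistics import mean

# Hypothetical monthly sales for an item launched 6 months ago (made-up numbers).
observed = [12, 15, 11, 14, 13, 16]

# Zero-padding to 24 months pretends the item existed and sold nothing
# for the 18 months before launch.
padded = [0] * 18 + observed

mean_observed = mean(observed)  # level estimated from the real history only
mean_padded = mean(padded)      # level pulled toward zero by the padding

print(f"observed only: {mean_observed}, zero-padded: {mean_padded}")
```

Any model that estimates a level or average from the padded series would see the same distortion, which is why leaving pre-launch periods absent is usually safer than filling them with zeros.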
m
Hi Stephen, I have had to deal with this problem before when generating a forecast for a large retailer (pet supplies). In general, it is a hard problem, but what has worked for me is a hierarchical approach based on forecasted proportions. You can find a description of this method in the following paper by Hyndman et al. (see section 3.3): https://robjhyndman.com/papers/hiertourism.pdf With this approach, you don't need to remove discontinued items or categories. Just make sure their forecast is zero before doing the splashing. We currently don't have this approach in Nixtla's HierarchicalForecast package, but you can generate a forecast for a middle and a bottom level using any of Nixtla's models and then do the splashing described in the paper above. Happy to help if you want to discuss this in more detail.
🤯 1
👍 1
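The "splashing" step described above could be sketched roughly as follows for a single category and a single forecast period. This is only an illustration under stated assumptions: the function name `splash`, the item names, and the numbers are all hypothetical, and it is not a Nixtla API. Each item receives a share of the category forecast equal to its own base forecast divided by the sum of the item base forecasts in that category, so an item whose base forecast is forced to zero receives nothing.

```python
def splash(parent_forecast, child_base_forecasts):
    """Distribute a parent forecast to its children in proportion
    to the children's own base forecasts (hypothetical helper)."""
    total = sum(child_base_forecasts.values())
    if total == 0:
        # No child has a non-zero base forecast, so no proportions exist.
        # Returning zeros is an assumption; a real pipeline would need a rule here.
        return {name: 0.0 for name in child_base_forecasts}
    return {
        name: parent_forecast * base / total
        for name, base in child_base_forecasts.items()
    }


# Made-up numbers: one category (middle level) forecast and three item
# (bottom level) base forecasts for the same period.
category_forecast = 100.0
item_base = {
    "item_a": 30.0,
    "item_b": 50.0,
    "item_c": 0.0,  # discontinued item: forecast set to zero before splashing
}

item_final = splash(category_forecast, item_base)
print(item_final)
```

The item forecasts produced this way add up to the category forecast, and the discontinued item stays at zero without having to be removed from the history.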
s
My recommendation is to try to forecast at a higher level of granularity where you have more data. For instance, think of clustering some products together to get more available data for forecasting.
While I expect this could correct the flaw in the assumption, it wouldn’t meet the requirements of the project, since we need to forecast individual items, not a category of items.
With this approach, you don’t need to remove discontinued items or categories. Just make sure their forecast is zero before doing the splashing.
When you use the word “splashing”, are you referring to distributing a forecast down a hierarchy based on proportion? I couldn’t find that phrase in the paper and just wanted to make sure I understand. Repeating it back so I’m clear. In this scenario, records wouldn’t be padded and forecasts would only be produced for an item when data is available. However, once they are converted to a distribution, they would be padded to facilitate the “splashing”. Meaning, there would always be a bottom level proportion for each base level category, however items or categories which either (a) have not yet launched or (b) have been discontinued would be given a fixed 0% for the distribution.
f
@Stephen Witkowski Forecasting is not magic as Max said once. If you have items that aren't launched or have been around for only a month or two, there is just not enough data to understand their behavior. You can try different methods but under the hood they all have to make the inference by looking at other items that have more data. I know communicating these types of scientific facts with the business is often challenging and the requirements might be unreasonable. Wish you good luck with that. Let us know if you find some method that works.
m
Hi Stephen, Yes, by "splashing" I meant the process of distributing the forecast down. And yes, in this scenario, the discontinued items or categories are used as part of the history, but end up with a zero forecast.