Hi I have a question regarding the attainment of conformal p Nixtla Community #mlforecast

Johannes Emme

06/29/2024, 2:46 PM

Hi, I have a question about how conformal prediction intervals are obtained. It's a bit tricky to explain in writing, but I'll do my best. In short, the issue I'm experiencing is that prediction intervals are currently built per horizon step and do not account for where in the hour/day/week the error occurs.

Let me show what I did to run into this. I am working with hourly time series data. I trained a model using `conformal_distribution` with 10 windows and a conformal horizon (h) of 24*4 (96). Plot 1 shows the resulting `cs_df` and the true target plotted against each other. From this plot, it can be seen that my model is okay at predicting the weekends but has clear difficulties with Mondays. However, when I used the model for predictions (see plot 2), the uncertainty for the weekends was very large and the Mondays had small uncertainty. (I forgot the legend in plot 2: black = true, blue = mean prediction, purple = 10th and 90th percentiles.)

What I have come to realize is that the problem arises from a misalignment between the conformal horizon and the horizon I am predicting over. With a conformal horizon of 96, the errors collected for a specific timestep do not belong to the same time slot. For instance, the first error in the first window corresponds to Monday 00:00, while in the next window the first hour is Friday 00:00, then Tuesday 00:00, and so on. Hence, when I predict the consumption during Saturday, the quantiles are based on several different days and hours and not on "Saturday hour errors."

To overcome this, I set the conformal horizon to 24*7 (168) so that my conformal windows start on the same day as my prediction. Then I get the result in plots 3 and 4, where the uncertainty is low for the weekends and high for the Mondays. However, I do not believe this is a sustainable solution. Unfortunately, I don't have a great alternative either. For now, I have simply rewritten the `_add_conformal_distribution_intervals` function for my case by:

1. Requiring that `n_windows*h >= 168` so that all hours in the week are represented.
2. Joining `cs_df` and `fcst_df` on `day_of_week` and `hour`.
3. Subtracting and adding the mean to get a distribution around each hour, and then calculating the quantiles.

I am very curious to hear your thoughts on this.

Best regards, Johannes
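The three steps above can be sketched roughly as follows. This is a minimal single-series illustration, not the rewritten `_add_conformal_distribution_intervals` itself. It assumes that `cs_df` holds one absolute error per timestamp in a column named after the model, that `fcst_df` holds the mean forecast in a column of the same name, and that the timestamp column is called `ds`. The helper names `add_slot_keys` and `slot_intervals`, the model name `lgbm`, and the toy data are all made up for the example.

```python
import numpy as np
import pandas as pd

SLOT = ["day_of_week", "hour"]


def add_slot_keys(df, time_col="ds"):
    """Attach the calendar slot (weekday 0=Monday, hour 0-23) of each timestamp."""
    out = df.copy()
    out["day_of_week"] = out[time_col].dt.dayofweek
    out["hour"] = out[time_col].dt.hour
    return out


def slot_intervals(cs_df, fcst_df, model, level=80):
    """Hypothetical helper: intervals from scores that share the forecast's weekday and hour."""
    cs = add_slot_keys(cs_df)
    fc = add_slot_keys(fcst_df)

    # Step 1: every (weekday, hour) slot needs at least one calibration score.
    if len(cs[SLOT].drop_duplicates()) < 7 * 24:
        raise ValueError("calibration scores must cover all 168 weekly slots")

    # Step 2: join the scores to the forecasts on the calendar slot.
    scores = cs[SLOT + [model]].rename(columns={model: "score"})
    merged = fc.merge(scores, on=SLOT, how="left")

    # Step 3: forecast minus/plus each score gives a distribution per forecast row.
    merged["below"] = merged[model] - merged["score"]
    merged["above"] = merged[model] + merged["score"]
    cand = merged.melt(id_vars="ds", value_vars=["below", "above"], value_name="cand")

    # Quantiles of that distribution become the interval bounds.
    alpha = (100 - level) / 200
    q = cand.groupby("ds")["cand"].quantile([alpha, 1 - alpha]).unstack()
    q.columns = [f"{model}-lo-{level}", f"{model}-hi-{level}"]
    return fcst_df.merge(q.reset_index(), on="ds", how="left")


# Toy data (assumed): two calibration weeks where Mondays have large errors.
hour = pd.Timedelta(hours=1)
cs_ds = pd.date_range("2024-06-03", periods=2 * 168, freq=hour)  # starts on a Monday
cs_df = pd.DataFrame({"ds": cs_ds, "lgbm": np.where(cs_ds.dayofweek == 0, 10.0, 1.0)})
fcst_df = pd.DataFrame(
    {"ds": pd.date_range("2024-06-17", periods=168, freq=hour), "lgbm": 100.0}
)

out = slot_intervals(cs_df, fcst_df, model="lgbm", level=80)
width = out["lgbm-hi-80"] - out["lgbm-lo-80"]
print(width.groupby(out["ds"].dt.dayofweek).mean())  # mean interval width per weekday
```

The coverage check tests the intent of step 1 directly (all 168 weekly slots present) rather than the product `n_windows*h`. With several series, the join and the grouping would also need the series identifier as a key.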

José Morales

07/01/2024, 6:06 PM

Hey. Your approach makes sense to me. With the recursive approach, the errors are mostly correlated with how far ahead you're predicting, but in your case, since the main drivers are the day and hour, I think it's OK to compute the scores based on that.
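To make the difference between "how far ahead" and "day and hour" concrete, the sketch below lists the weekday on which horizon step 1 falls in consecutive calibration windows. It assumes the windows are back-to-back blocks of `h` hours and starts from an arbitrary Monday. `window_start_days` is a hypothetical helper written for this illustration, not an mlforecast function.

```python
from datetime import datetime, timedelta

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def window_start_days(first_start, h, n_windows):
    """Weekday of the first hour of each window, for windows of h consecutive hours."""
    starts = (first_start + timedelta(hours=h * k) for k in range(n_windows))
    return [DAYS[start.weekday()] for start in starts]


monday = datetime(2024, 6, 3)  # an arbitrary Monday at 00:00

# h = 96: each window starts four days after the previous one,
# so the same horizon step lands on a different weekday every time.
print(window_start_days(monday, h=96, n_windows=10))

# h = 168: every window starts on the same weekday,
# so horizon step and calendar slot coincide.
print(window_start_days(monday, h=168, n_windows=10))
```

With `h=96` the same horizon step mixes errors from all seven weekdays, which matches the Monday, Friday, Tuesday sequence described in the question. With `h=168` the step index and the calendar slot line up, but only as long as the forecast starts on that same weekday.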
