# general
k
In the MSTL doc, all the percentile confidence intervals are the same as the expected value. I used it with my own data as well; it's all the same. Is there a bug?
👍 1
m
Hi @Ken Lee, thanks for bringing this issue to our attention. Can you help us by opening an issue on statsforecast? I've used MSTL recently and got proper prediction intervals, so I'm curious about your use case that is showing the same problem as the tutorial.
👍 1
k
But you can see that in this document all the predictions are the same values, right?
Copy code
MSTL	MSTL-lo-95	MSTL-lo-80	MSTL-hi-80	MSTL-hi-95
m
yes, no question about that. I'll check that example ASAP. But it's interesting that you are experiencing the same issue
k
the graphs as well.
so... from your experience, it should work and the doc is wrong, is what I'm gathering.
m
With the AirPassengers dataset, for example, I'm getting the correct intervals. Here's a reproducible example:
Copy code
import os

from statsforecast import StatsForecast
from statsforecast.models import MSTL
from statsforecast.utils import AirPassengersDF as ap

os.environ['NIXTLA_ID_AS_COL'] = '1'

# monthly data with a yearly seasonality
sf = StatsForecast(
    models=[MSTL(season_length=[12])],
    freq='MS',
)

# 12-month forecast with 80% and 95% prediction intervals
fc = sf.forecast(df=ap, h=12, level=[80, 95])

StatsForecast.plot(ap, fc, level=[80, 95])
yeah, as I said, I'm not sure what is going on with that doc, but the intervals shouldn't look like that
k
I confirmed your example does work with the right intervals...
m
there's indeed an error in the tutorial: the season_length is set to an absurd number (8766). With the correct value (24 * 7), the prediction intervals look OK. It should be in main by Monday 🙂
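As an aside on those numbers (my own reading, not stated in the thread): 8766 is the number of hours in an average Julian year (365.25 × 24), so the tutorial likely attempted a yearly seasonality on hourly data, while the corrected 24 × 7 is the weekly one:

```python
# 8766 looks like the hours in an average (Julian) year -- plausibly an
# attempted yearly seasonality on hourly data
hours_per_year = 365.25 * 24   # 8766.0

# the corrected value from the thread: one week of hourly data
weekly = 24 * 7                # 168
```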
k
I concluded the same: somehow, the longer your seasonal period (relative to the training data), the smaller the confidence intervals. This is counterintuitive, though, given what we understand about conformal predictions; maybe it's indicative of a deeper bug.
👍 1
Copy code
import os

import numpy as np
import pandas as pd  # needed for the plot at the end

from statsforecast import StatsForecast
from statsforecast.models import MSTL
from statsforecast.utils import AirPassengersDF as ap

os.environ['NIXTLA_ID_AS_COL'] = '1'

evaluation = []
grid = np.arange(10, ap.shape[0], 10)

# refit with increasingly long season lengths and record the
# width of the upper 95% interval above the point forecast
for g in grid:
    sf = StatsForecast(
        models=[MSTL(season_length=[g])],
        freq='MS',
    )
    fc = sf.forecast(df=ap, h=12, level=[80, 95], id_col="unique_id")
    evaluation.append((g, np.mean(fc["MSTL-hi-95"] - fc["MSTL"])))

pd.DataFrame(evaluation, columns=["season_length", "distance"]).plot.scatter(
    x="season_length", y="distance",
    title="season length vs. confidence interval distance",
)
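To illustrate the mechanism being hypothesized here, a toy sketch of my own (using a naive per-phase seasonal mean, not MSTL's actual STL fit): when the season length approaches the sample size, most seasonal "phases" contain a single observation, so the seasonal component absorbs almost everything and the residuals, which drive the interval width, collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120
y = rng.normal(size=n)  # pure noise: no real seasonality at all

def residual_std(y, period):
    """Residual spread after subtracting a naive per-phase seasonal mean."""
    phases = np.arange(len(y)) % period
    seasonal = np.array([y[phases == p].mean() for p in phases])
    return (y - seasonal).std()

# a season length near n leaves ~1 observation per phase, so the
# "seasonal" component overfits and the residuals nearly vanish
short = residual_std(y, 12)    # plausible period
long_ = residual_std(y, 110)   # period close to the sample size
```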
m
Tbh, I had never considered this issue. I would argue that the seasonal length shouldn't vary that much, since at the end of the day it's tied to whatever phenomenon the series is representing. And it's always worth checking out the code of a model, although maybe some more examples are needed to point out where the error could be, if there is any.
k
I think it's not hard to stumble into this problem. Assume you know there is a 24-hour seasonality, then a weekly seasonality, then an end-of-month seasonality (assume we're talking about some ride-share business; it could be anything, really). Then you would have the seasonality represented in hours = [24, 24 * 7, 24 * 7 * 30]. This is not too wild, right? Now... you train a model, but for some time series you only have 30 days' worth of data... then you hit this diminishing-CI bug, where your 24 * 7 * 30 season completely makes your CI go away. The user has no way to know why this is not working; the model worked on time series with a year's worth of data, but all of a sudden the CI shrinks for a shorter time frame... 🧩
👍 1
v
I think a plain-vanilla implementation of conformal prediction might not be optimal for seasonality; perhaps Nixtla can consider a more suitable implementation using some ideas from here, for example: https://arxiv.org/abs/2406.16766
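For reference, a minimal sketch of the "plain vanilla" split-conformal interval being discussed (my own illustration, not statsforecast's code): the width is just a quantile of held-out absolute residuals, so anything that artificially shrinks those residuals, such as an overfit seasonal component, directly shrinks the interval.

```python
import numpy as np

def conformal_interval(cal_residuals, point_forecast, alpha=0.05):
    """Symmetric split-conformal interval: width is the (1 - alpha)
    quantile of absolute residuals on a held-out calibration set."""
    q = np.quantile(np.abs(cal_residuals), 1 - alpha)
    return point_forecast - q, point_forecast + q

lo, hi = conformal_interval(np.array([-1.0, 0.5, 2.0, -0.5]), 10.0, alpha=0.25)
```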