Slackbot
12/13/2023, 11:30 AM
José Morales
12/13/2023, 5:02 PM
Antoine SCHWARTZ -CROIX-
12/13/2023, 5:16 PM
mlf = MLForecast(
    models=[model],
    freq="W",
    lags=list(range(1, 53)),
    lag_transforms={
        1: [(rolling_mean, 51), (rolling_std, 51)],
        26: [(rolling_mean, 26), (rolling_std, 26)],
        40: [(rolling_mean, 12), (rolling_std, 12)],
        44: [(rolling_mean, 8), (rolling_std, 8)],
        48: [(rolling_mean, 4), (rolling_std, 4)],
    },
)
But it seems to work if I leave at least 2 * 52 + 1 points per series after preprocessing, by setting lags=list(range(1, 52)) (excluding lag 52).
José Morales
12/13/2023, 5:18 PM
José Morales
12/13/2023, 5:22 PM
from mlforecast import MLForecast
from utilsforecast.data import generate_series
freq = 'W'
series = generate_series(1, min_length=1_000, max_length=2_000, freq=freq)
fcst = MLForecast(models=[], freq=freq, lags=[52])
prep = fcst.preprocess(series, dropna=False)
min_samples = prep.isnull().sum().max() + 1
min_samples
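The arithmetic behind that check can be sketched in plain Python, without mlforecast. This is only an illustration under two assumptions: a lag of k leaves k leading nulls, and a rolling statistic stays null until it has a full window of non-null lagged values. The helper names (`lag`, `rolling_mean`, `rows_dropped`) are hypothetical and are not the library's functions; the lag-to-window mapping is copied from the configuration posted earlier in the thread, and 156 is the shortest series length reported in the thread.

```python
from statistics import fmean


def lag(values, k):
    """Shift a series forward by k steps, padding the start with None."""
    return [None] * k + values[:-k] if k else list(values)


def rolling_mean(values, window):
    """Rolling mean that stays None until a full window of non-null values exists."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(fmean(chunk))
    return out


def null_count(values):
    return sum(v is None for v in values)


def rows_dropped(series, max_lag, lag_windows):
    """Largest number of leading nulls across all lag and rolling features."""
    from_lags = max(null_count(lag(series, k)) for k in range(1, max_lag + 1))
    from_windows = max(
        null_count(rolling_mean(lag(series, k), w)) for k, w in lag_windows.items()
    )
    return max(from_lags, from_windows)


series = [float(i) for i in range(156)]  # toy values; only the length matters
lag_windows = {1: 51, 26: 26, 40: 12, 44: 8, 48: 4}  # lag -> rolling window size

with_lag_52 = rows_dropped(series, 52, lag_windows)
without_lag_52 = rows_dropped(series, 51, lag_windows)
print(len(series) - with_lag_52, len(series) - without_lag_52)  # 104 105
```

Under these assumptions every rolling feature in the configuration needs lag + window - 1 = 51 leading rows, so lag 52 is the only feature that costs a 52nd row. Dropping it leaves 105 = 2 * 52 + 1 rows on a 156-point series.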
Antoine SCHWARTZ -CROIX-
12/13/2023, 5:25 PM
df.groupby("unique_id")["ds"].count().min()
==> 156
prep.groupby("unique_id")["ds"].count().min()
==> 104
prep.isnull().sum().max() + 1
==> 1
Antoine SCHWARTZ -CROIX-
12/13/2023, 5:27 PM
prediction_intervals process, which seems to need more than `n_windows` * `h` observations
José Morales
12/13/2023, 5:33 PM
Antoine SCHWARTZ -CROIX-
12/13/2023, 5:50 PM
Antoine SCHWARTZ -CROIX-
12/13/2023, 5:53 PM
José Morales
12/13/2023, 6:01 PM
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
model = make_pipeline(
    SimpleImputer(strategy='constant', fill_value=0),
    LinearRegression(),  # or any other model you're using
)
fcst = MLForecast(models=[model], ...)
That would fill the nulls with zeros before passing them to the model, which is easier to implement, I think.
José Morales
12/13/2023, 6:03 PM
Antoine SCHWARTZ -CROIX-
12/13/2023, 6:10 PM
José Morales
12/13/2023, 6:11 PM
José Morales
12/13/2023, 6:13 PM
class MyPredictionIntervals:
    def __init__(self, h, n_windows, method='conformal_distribution'):
        self.h = h
        self.n_windows = n_windows
        self.method = method
and then provide that to fit:
fcst.fit(..., prediction_intervals=MyPredictionIntervals(n_windows=1, h=52))
I believe that'd work. Just don't tell anyone I'm suggesting this haha
Antoine SCHWARTZ -CROIX-
12/13/2023, 6:15 PM
José Morales
12/13/2023, 6:17 PM
Antoine SCHWARTZ -CROIX-
12/14/2023, 8:31 AM
Antoine SCHWARTZ -CROIX-
12/15/2023, 9:37 AM
prediction_intervals = MyPredictionIntervals(h=52, n_windows=1) and I get the same error, "Input X contains NaN", with the LinearRegression model. I also tried setting h=1 and got the same result. I can’t figure out what’s going on in this CV...
Antoine SCHWARTZ -CROIX-
12/18/2023, 4:12 PM
José Morales
12/18/2023, 4:21 PM
Antoine SCHWARTZ -CROIX-
12/18/2023, 4:22 PM
José Morales
12/18/2023, 4:29 PM
Antoine SCHWARTZ -CROIX-
12/18/2023, 4:46 PM
generate_series
...
My dataset can contain series with quite sparse values and large peaks. From what I've seen, 6 series out of 32,000 are involved.
José Morales
12/18/2023, 4:46 PM
Antoine SCHWARTZ -CROIX-
12/18/2023, 4:47 PM
José Morales
12/18/2023, 4:52 PM
José Morales
12/18/2023, 4:55 PM
%debug magic in IPython you could check which columns contain the NaNs, that'd help a lot. Just running something like X.isnull().sum() should give the number of NaNs per column.
Antoine SCHWARTZ -CROIX-
12/18/2023, 4:55 PM
rolling_mean & rolling_std
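The per-column check suggested above (`X.isnull().sum()` on the feature matrix) can be sketched without pandas as follows. This is a minimal illustration, not output from the thread: the `nan_counts` helper is hypothetical, and the column names and values in `X` are made up to mimic rolling features that contain NaNs.

```python
import math


def nan_counts(columns):
    """Map each column name to its number of missing (None or NaN) entries."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    return {name: sum(is_missing(v) for v in values) for name, values in columns.items()}


# Hypothetical feature matrix: names and values are illustrative only.
X = {
    "lag1": [1.0, 2.0, 3.0, 4.0],
    "rolling_mean_lag1_window_size51": [0.5, float("nan"), 0.7, 0.8],
    "rolling_std_lag1_window_size51": [0.1, float("nan"), float("nan"), 0.2],
}

counts = nan_counts(X)
# Keep only the columns that actually contain missing values.
bad_columns = {name: n for name, n in counts.items() if n > 0}
print(bad_columns)
```

Filtering down to the non-zero counts makes it quick to see whether the NaNs are confined to the rolling features or also appear in the plain lags.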
José Morales
12/18/2023, 4:58 PM
Antoine SCHWARTZ -CROIX-
12/18/2023, 4:59 PM
José Morales
12/18/2023, 5:00 PM
Antoine SCHWARTZ -CROIX-
12/18/2023, 5:01 PM