# mlforecast
j
Can you try running a CV with that configuration on a single model? e.g.
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression
mlf = MLForecast(models=[LinearRegression()], freq='MS', lags=[1, 2, 3])
mlf.cross_validation(df, h=18, n_windows=3)
I suspect you have gaps in your series, so that error is raised when CV is performed to compute the prediction intervals.
n
When I train the following model without prediction_intervals in the fit() method, it trains fine, so the problem appears when the CV is performed. If I run time series cross-validation directly, I also get an error:
Cross validation result produced less results than expected. Please verify that the frequency set on the MLForecast constructor matches your series' and that there aren't any missing periods.
j
Can you try running the following:
from utilsforecast.preprocessing import fill_gaps

filled = fill_gaps(df, freq='MS', start='per_serie', end='per_serie')
print(df.shape[0], filled.shape[0])
If those two numbers are different then you have gaps in your series
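To see which periods are missing, not just whether the row counts differ, something like the following could help. It is a minimal pandas-only sketch: the toy data and the `missing_periods` helper are hypothetical, and it assumes the usual `unique_id` / `ds` / `y` column names.

```python
import pandas as pd

# Hypothetical toy data: series "a" is missing 2020-03-01, series "b" is complete.
df = pd.DataFrame({
    "unique_id": ["a", "a", "a", "b", "b", "b"],
    "ds": pd.to_datetime([
        "2020-01-01", "2020-02-01", "2020-04-01",
        "2020-01-01", "2020-02-01", "2020-03-01",
    ]),
    "y": [1.0, 2.0, 4.0, 10.0, 20.0, 30.0],
})


def missing_periods(df, freq="MS"):
    """Map each series id to the timestamps absent between its first and last date."""
    out = {}
    for uid, g in df.groupby("unique_id"):
        # Every timestamp the series should have at this frequency.
        expected = pd.date_range(g["ds"].min(), g["ds"].max(), freq=freq)
        missing = expected.difference(pd.DatetimeIndex(g["ds"]))
        if len(missing) > 0:
            out[uid] = missing
    return out


gaps = missing_periods(df)
print(gaps)  # only series "a" is reported, with 2020-03-01 missing
```

Any series that shows up in `gaps` would trigger the "less results than expected" error during CV.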
n
If I use fill_gaps(df, freq='MS') to expose the null values and then impute them separately, the model works well. The idea is to apply make_pipeline with a SimpleImputer step and see if it does the job correctly.
j
The important thing is that you have consecutive dates. You can leave the null values if you want to and let the SimpleImputer handle them, but the imputer won't add the missing dates like the fill_gaps function does
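The difference between imputing nulls and adding missing dates can be illustrated with plain pandas. This is only a sketch on hypothetical toy data: `fillna` stands in for what an imputer does, and `reindex` stands in for what fill_gaps does conceptually.

```python
import pandas as pd

# Hypothetical single series: 2020-02-01 has a null target, 2020-03-01 has no row at all.
s = pd.DataFrame({
    "unique_id": ["a", "a", "a"],
    "ds": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01"]),
    "y": [1.0, None, 4.0],
})

# Imputing only touches existing rows: the null is filled,
# but the row for 2020-03-01 is still absent.
imputed = s.assign(y=s["y"].fillna(s["y"].mean()))

# Reindexing onto the full monthly range adds the missing row (with a null target).
full_range = pd.date_range(s["ds"].min(), s["ds"].max(), freq="MS")
filled = s.set_index("ds").reindex(full_range).rename_axis("ds").reset_index()
filled["unique_id"] = filled["unique_id"].ffill()

print(len(imputed), len(filled))
```

After the reindex the dates are consecutive, and the remaining nulls in `y` can then be filled by whatever strategy you prefer.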
n
I ran the test, adding the missing dates with
filled = fill_gaps(df, freq='MS', start='per_serie', end='per_serie')
and when I try to inspect the processing with
mlf.preprocess(df)
I get the error:
ValueError: y column contains null values.
MLForecast (like statsforecast) does not allow null values in the target before the fit() method runs, so the data raises an error as soon as it enters the model.
j
Why don't you want to fill them before?
n
As I mentioned before, the idea is to test make_pipeline with MLForecast and a stacking model, but if my df has null values when the model is built, it fails. For example, if I do this:
linear_preprocessor = make_pipeline(StandardScaler(), SimpleImputer(strategy="mean", add_indicator=True))
# StackingRegressor expects a list of (name, estimator) tuples
estimators = [("rf", make_pipeline(linear_preprocessor, RandomForestRegressor(random_state=42)))]
stacking_regressor = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())
By the time I reach this point, which is the last step in building the model, there should no longer be any null data.
mlf = MLForecast(
    models=stacking_regressor,
    freq='MS',
    lags=[1, 2, 3],
    lag_transforms={1: [expanding_mean], 7: [(rolling_mean, 7)]},
    # target_transforms=[Differences([1]), LocalStandardScaler()],
    date_features=["year", "month", "day"],
    num_threads=2,
)
If the null data has not been resolved by this point, then when I call
mlf.fit(df, fitted)
it raises the error:
ValueError: y column contains null values.
j
Why do you want the imputer to do it instead of doing it yourself? The imputer will get it wrong because it computes the mean over the full column, across all series, when you should probably use the mean per id.
Also, since these are time series, you should probably use the expanding mean instead.
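As a rough illustration of that difference, the sketch below compares a global-mean fill with a per-id expanding mean in plain pandas. The data is a hypothetical toy example, and it assumes the rows are already sorted by date within each id.

```python
import pandas as pd

# Hypothetical toy data: two series on very different scales, each with one null.
df = pd.DataFrame({
    "unique_id": ["a"] * 4 + ["b"] * 4,
    "ds": list(pd.date_range("2020-01-01", periods=4, freq="MS")) * 2,
    "y": [1.0, 3.0, None, 5.0, 100.0, 300.0, None, 500.0],
})

# Global mean (what a column-wise imputer would use): mixes both series
# and also uses values from after the gap.
global_fill = df["y"].fillna(df["y"].mean())

# Per-id expanding mean: for each row, the mean of the earlier non-null
# values of the same series only (shift(1) excludes the current row).
past_mean = df.groupby("unique_id")["y"].transform(
    lambda s: s.expanding().mean().shift(1)
)
by_id_fill = df["y"].fillna(past_mean)

print(pd.DataFrame({"global": global_fill, "by_id": by_id_fill}))
```

Here the global fill puts 151.5 into both series, while the per-id expanding mean gives 2.0 for "a" and 200.0 for "b". One caveat: a null at the very start of a series has no past values, so it would stay null and need separate handling.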