This message was deleted Nixtla Community #mlforecast

# mlforecast

Slackbot

02/09/2024, 3:23 PM

This message was deleted.

José Morales

02/09/2024, 4:23 PM

Can you try running a CV with that configuration on a single model? e.g.

mlf = MLForecast(models=[LinearRegression()], freq='MS', lags=[1, 2, 3])
mlf.cross_validation(df, h=18, n_windows=3)

I suspect you have gaps in your series, so the error is raised when CV is performed to compute the prediction intervals.

Naren Castellon

02/09/2024, 4:46 PM

If I remove prediction_intervals from the fit() call, the model trains fine, so the problem occurs when the CV is performed. If I run time series cross-validation directly, I also get an error:

Cross validation result produced less results than expected. Please verify that the frequency set on the MLForecast constructor matches your series' and that there aren't any missing periods.

José Morales

02/09/2024, 4:47 PM

Can you try running the following:

from utilsforecast.preprocessing import fill_gaps

filled = fill_gaps(df, freq='MS', start='per_serie', end='per_serie')
print(df.shape[0], filled.shape[0])

If those two numbers are different, then you have gaps in your series.

Naren Castellon

02/09/2024, 4:55 PM

If I use `fill_gaps(df, freq='MS')` to expose the null values and then impute them separately, the model works well. The idea, though, is to use `make_pipeline` with `SimpleImputer` and see whether it does the job correctly.

José Morales

02/09/2024, 4:56 PM

The important thing is that you have consecutive dates. You can leave the null values in if you want and let the SimpleImputer handle them, but the imputer won't add the missing dates the way the fill_gaps function does.
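To illustrate the distinction between null values and missing dates, here is a minimal, dependency-free sketch that lists the month starts with no row at all in each series. It is not how fill_gaps is implemented; the helper names `month_starts` and `missing_months` and the sample rows are hypothetical, and it assumes a month-start ('MS') frequency.

```python
from datetime import date


def month_starts(first, last):
    """Yield every month-start date from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def missing_months(rows):
    """Map each series id to the month starts absent between its first and last date."""
    seen = {}
    for uid, ds in rows:
        seen.setdefault(uid, set()).add(ds)
    return {
        uid: [d for d in month_starts(min(dates), max(dates)) if d not in dates]
        for uid, dates in seen.items()
    }


# Hypothetical data: series "a" has no row at all for 2023-03-01.
rows = [
    ("a", date(2023, 1, 1)), ("a", date(2023, 2, 1)), ("a", date(2023, 4, 1)),
    ("b", date(2023, 1, 1)), ("b", date(2023, 2, 1)),
]
gaps = missing_months(rows)
print(gaps)
```

A column-wise imputer never sees the missing March row for series "a", because there is no row to fill. Only a step that inserts the missing timestamps can make the dates consecutive.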

Naren Castellon

02/09/2024, 6:09 PM

I ran the test, adding the missing dates with

filled = fill_gaps(df, freq='MS', start='per_serie', end='per_serie')

and when I try to inspect the processed data with `mlf.preprocess(df)` I get the error:

ValueError: y column contains null values.

MLForecast and StatsForecast do not allow null values in the target, even before the fit() method is called, so passing this data to the model raises an error.

José Morales

02/09/2024, 6:12 PM

Why don't you want to fill them before?

Naren Castellon

02/09/2024, 6:21 PM

As I mentioned before, the idea is to test make_pipeline with MLForecast and the stacking model, but if my df has null values when the model is built, it does not work. For example, if I do this:

linear_preprocessor = make_pipeline(StandardScaler(), SimpleImputer(strategy="mean", add_indicator=True))
# StackingRegressor expects a list of (name, estimator) tuples
estimators = [("rf", make_pipeline(linear_preprocessor, RandomForestRegressor(random_state=42)))]
stacking_regressor = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())

By this point, which is the last step in building the model, there should no longer be any null data.

mlf = MLForecast(
    models=stacking_regressor,
    freq='MS',
    lags=[1, 2, 3],
    lag_transforms={1: [expanding_mean], 7: [(rolling_mean, 7)]},
    # target_transforms=[Differences([1]), LocalStandardScaler()],
    date_features=["year", "month", "day"],
    num_threads=2,
)

If the null data has not been handled by this point, then when I call `mlf.fit(df, fitted)` I get the error:

ValueError: y column contains null values.

José Morales

02/09/2024, 6:26 PM

Why do you want the imputer to do it instead of doing it yourself? The imputer will get it wrong because it uses the full column to compute the mean, and you should probably use the mean by id.

José Morales

02/09/2024, 6:26 PM

Also, since these are time series, you should probably use the expanding mean instead.
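As a rough illustration of that suggestion, the dependency-free sketch below fills each missing target with the mean of the earlier observed values of the same series, and computes the full-column mean for comparison. The function name `impute_expanding_mean` and the sample rows are hypothetical; it assumes rows are sorted by date within each id and that missing targets are marked with None. In pandas the same idea would typically be expressed with a groupby on the id column.

```python
def impute_expanding_mean(rows):
    """Fill missing targets with the mean of earlier observed values of the same id.

    rows: list of (unique_id, ds, y) tuples sorted by ds within each id,
    with y set to None where the target is missing.
    """
    observed = {}  # unique_id -> (sum, count) of observed values seen so far
    filled = []
    for uid, ds, y in rows:
        total, count = observed.get(uid, (0.0, 0))
        if y is not None:
            observed[uid] = (total + y, count + 1)
        elif count > 0:
            y = total / count  # uses only past values, so no look-ahead
        # A missing value with no earlier observation stays None.
        filled.append((uid, ds, y))
    return filled


# Hypothetical data: two series on very different scales.
rows = [
    ("a", "2023-01-01", 10.0),
    ("a", "2023-02-01", 20.0),
    ("a", "2023-03-01", None),
    ("a", "2023-04-01", 40.0),
    ("b", "2023-01-01", 100.0),
    ("b", "2023-02-01", None),
    ("b", "2023-03-01", 300.0),
]
filled = impute_expanding_mean(rows)

# Full-column mean, which is what a column-wise mean imputer would use.
values = [y for _, _, y in rows if y is not None]
global_mean = sum(values) / len(values)
print(filled[2], filled[5], global_mean)
```

With this data the gap in series "a" is filled with 15.0 and the gap in series "b" with 100.0, while the full-column mean is 94.0, a value that fits neither series and also draws on observations that come after the gap.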
