Nixtla Community #mlforecast

# mlforecast

Slackbot

10/18/2023, 2:54 PM

This message was deleted.

José Morales

10/18/2023, 4:27 PM

That's indeed caused by the transformations. If you have a series with 10 samples and want to use lag15, you'll get a NaN. If you're fine with that, you can define a pipeline instead, like:

from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rf = make_pipeline(SimpleImputer(), RandomForestRegressor())  # use this as your model instead
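As a rough check of the pipeline idea, here is a minimal sketch on made-up data. The toy arrays, the seed, and the "third column is a long lag" framing are assumptions for illustration, not part of the original question. It only shows that the imputer fills the missing feature (with the column mean, SimpleImputer's default) before the forest sees it:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Toy data (assumption): 50 rows, 3 features, target driven by the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=50)

# Pretend the third feature is a long lag that is missing in the first rows
X[:5, 2] = np.nan

model = make_pipeline(
    SimpleImputer(),  # default strategy: replace NaN with the column mean
    RandomForestRegressor(n_estimators=10, random_state=0),
)
model.fit(X, y)

# A new row with a missing feature is imputed first, then predicted
X_new = np.array([[0.1, -0.2, np.nan]])
pred = model.predict(X_new)

# Value the imputer learned for the third column
imputed_value = model.named_steps["simpleimputer"].statistics_[2]
```

Keep in mind that mean imputation is a modelling choice: the forest cannot tell an imputed lag from a real one.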

Or you can use a before_predict_callback that fills NaNs:

def fill_na(df):
    return df.fillna(0)  # or some other value

fcst.predict(h, before_predict_callback=fill_na)  # fcst: your fitted MLForecast instance
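To see where the NaN comes from and what the callback does with it, here is a small sketch in plain pandas. Using `shift` as a stand-in for how lags are built is an assumption for illustration (it is not mlforecast's internal code), and the 10-sample series is made up:

```python
import numpy as np
import pandas as pd

# A toy series with only 10 samples
y = pd.Series(np.arange(10, dtype=float))

# Lag features built with shift: lag15 reaches further back than the series goes
features = pd.DataFrame({"lag1": y.shift(1), "lag15": y.shift(15)})

def fill_na(df):
    return df.fillna(0)  # or some other value

n_null_lag1 = int(features["lag1"].isna().sum())    # only the first row is missing
n_null_lag15 = int(features["lag15"].isna().sum())  # every row is missing
filled = fill_na(features)                          # what the model would receive
```

After `fill_na`, the lag15 column is all zeros, so the model runs but that feature carries no information for this series.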

Francisco

10/18/2023, 4:28 PM

but all my series have 45 samples

Francisco

10/18/2023, 4:31 PM

why don't XGBRegressor and LGBMRegressor throw an error? I also ran .preprocess and it didn't show any NAs, so where are the NAs?

José Morales

10/18/2023, 4:41 PM

But you're using the rolling mean of the lag 12 over a window of size 12, which needs 25 samples. Since you're using h=6, step_size=12 and n_windows=2, the first training set removes 6 + 12 = 18 samples. So if you have 45, the first training set has 45 - 18 = 27. If you're using the first difference, that drops another one, so you end up with 26. Are you sure all your series have 45? From the errors, it seems like some end up with fewer samples.
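The arithmetic above can be written as a small helper. This is only a sketch of the counting described in the message: `first_train_size` is a hypothetical name, not an mlforecast function, and the threshold of 25 is the figure quoted in the thread.

```python
def first_train_size(n_samples, h, step_size, n_windows, n_diffs=0):
    """Samples left in the earliest cross-validation training set.

    The earliest window holds out h samples plus step_size for every later
    window; each differencing step drops one more sample.
    """
    held_out = h + step_size * (n_windows - 1)
    return n_samples - held_out - n_diffs

REQUIRED = 25  # samples needed by the rolling mean of lag 12, as quoted in the thread

for n in (45, 44, 43):
    size = first_train_size(n, h=6, step_size=12, n_windows=2, n_diffs=1)
    print(n, size, size >= REQUIRED)
```

Under these assumptions a series with 45 samples leaves 26 for the first window, so a series needs only two fewer observations to fall below the quoted threshold.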

José Morales

10/18/2023, 4:41 PM

The other models don't throw errors because they can handle null values in the features, so they're still getting NaNs but they produce their predictions with them

Francisco

10/18/2023, 4:43 PM

hmm I see, that does make a lot of sense

Francisco

10/18/2023, 4:44 PM

but what about the warning that says it found NAs in lag 1, lag 2, etc.?

Francisco

10/18/2023, 4:44 PM

Found null values in lag1, lag2, lag3, lag6, lag12, expanding_mean_lag1,

Francisco

10/18/2023, 4:44 PM

what could be causing those errors?

Francisco

10/18/2023, 4:45 PM

yes, indeed all of my series have exactly 45 observations each

José Morales

10/18/2023, 4:55 PM

Can you provide a reproducible example? I just tried this and it runs fine:

import numpy as np
from mlforecast import MLForecast
from mlforecast.target_transforms import GlobalSklearnTransformer, LocalStandardScaler, Differences
from mlforecast.utils import generate_series
from numba import njit
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

@njit
def rolling_mean_12(x):
    return rolling_mean(x, window_size=12)

sk_log1p = FunctionTransformer(np.log1p, np.expm1)
series = generate_series(2, min_length=45, max_length=45)

lags = [1,2,3,6,12]
lag_transforms = {
    1: [expanding_mean, rolling_mean_12],
    2: [expanding_mean, rolling_mean_12],
    3: [expanding_mean, rolling_mean_12],
    6: [expanding_mean, rolling_mean_12],
    12: [expanding_mean, rolling_mean_12]
}
# Build the forecasting pipeline
ml = MLForecast(
    models=[LinearRegression()],
    freq='D',  # daily frequency
    lags=lags,
    lag_transforms=lag_transforms,  # lag transforms defined above
    # The order of the target transforms matters: log, then standardize, then first difference
    target_transforms=[GlobalSklearnTransformer(sk_log1p), LocalStandardScaler(), Differences([1])],
    num_threads=1,
    date_features=['month'],
)
ml.cross_validation(df=series, h=6, step_size=12, n_windows=2)

Francisco

10/18/2023, 5:49 PM

yes
