# mlforecast
f
The issue seems to arise when generating the lags and lag transforms: that's where the missing values are created, but I still haven't been able to fix the error
j
That's indeed caused by the transformations. If you have a series with 10 samples and want to use lag15, you'll get NaNs. If you're fine with that, you can define a pipeline instead, like:
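from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# SimpleImputer fills NaNs (with the column mean by default) before the forest sees them
rf = make_pipeline(SimpleImputer(), RandomForestRegressor())  # use this as your model instead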
Or you can use a before-predict callback that fills the NaNs:
def fill_na(df):
    return df.fillna(0)  # or some other value

# fcst stands for your fitted MLForecast instance (the name is a placeholder)
fcst.predict(h, before_predict_callback=fill_na)
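To see why a short series ends up with NaN features, here is a minimal sketch in plain Python. The `lag` helper is hypothetical and only mimics what a lag feature does; it is not mlforecast's implementation.

```python
import math

def lag(values, k):
    """Shift values k steps later, padding the start with NaN (hypothetical helper)."""
    n = len(values)
    pad = min(k, n)  # a lag longer than the series leaves nothing but padding
    return [math.nan] * pad + list(values[: n - pad])

series = list(range(10))   # a series with 10 samples
lag15 = lag(series, 15)    # lag longer than the series
lag3 = lag(series, 3)      # lag shorter than the series

print(sum(math.isnan(v) for v in lag15))  # 10: every row is NaN
print(sum(math.isnan(v) for v in lag3))   # 3: only the first rows are NaN
```

Rows like these are what the imputer or the callback above would have to fill.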
f
but all my series have 45 samples
why don't XGBRegressor and LGBMRegressor throw an error? I also ran .preprocess and it didn't show any NAs, so where are the NAs?
j
But you're using the rolling mean of lag 12 over a window of size 12, which needs 25 samples. Since you're using h=6, step_size=12 and n_windows=2, the first training set removes 6 + 12 = 18 samples. So if you have 45, the first training set has 45 - 18 = 27. If you're using the first difference, that drops another one, so you end up with 26. Are you sure all your series have 45? From the errors, it seems like some end up with fewer samples
The other models don't throw errors because they can handle null values in the features, so they're still getting NaNs but they produce their predictions anyway
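The sample-count arithmetic above can be sketched as follows. `first_train_size` is a hypothetical helper, not part of mlforecast, and the held-out formula `h + step_size * (n_windows - 1)` is an assumption generalized from the 6 + 12 in this case.

```python
def first_train_size(n_samples, h, step_size, n_windows, n_diffs=0):
    """Samples left in the earliest cross-validation training set (hypothetical helper)."""
    # Assumption: the earliest window holds out h samples plus one step_size
    # for every later window.
    held_out = h + step_size * (n_windows - 1)
    # Each differencing step drops one more sample.
    return n_samples - held_out - n_diffs

print(first_train_size(45, h=6, step_size=12, n_windows=2))             # 27
print(first_train_size(45, h=6, step_size=12, n_windows=2, n_diffs=1))  # 26
```

Any series shorter than 45 would leave correspondingly fewer samples for the lag 12 rolling mean.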
f
hmm I see, that does make a lot of sense
but what about the warning that says it found NAs in lag1, lag2, etc.?
Found null values in lag1, lag2, lag3, lag6, lag12, expanding_mean_lag1,
what could be causing those errors?
yes, indeed all of my series have exactly 45 observations each
j
Can you provide a reproducible example? I just tried this and it runs fine:
import numpy as np
from mlforecast import MLForecast
from mlforecast.target_transforms import GlobalSklearnTransformer, LocalStandardScaler, Differences
from mlforecast.utils import generate_series
from numba import njit
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

@njit
def rolling_mean_12(x):
    return rolling_mean(x, window_size=12)

sk_log1p = FunctionTransformer(np.log1p, np.expm1)
series = generate_series(2, min_length=45, max_length=45)

lags = [1, 2, 3, 6, 12]
lag_transforms = {
    1: [expanding_mean, rolling_mean_12],
    2: [expanding_mean, rolling_mean_12],
    3: [expanding_mean, rolling_mean_12],
    6: [expanding_mean, rolling_mean_12],
    12: [expanding_mean, rolling_mean_12]
}
# Create the forecasting pipeline
ml = MLForecast(
    models=[LinearRegression()],  # ML models
    freq='D',  # daily frequency
    lags=lags,
    lag_transforms=lag_transforms,  # lag transforms defined above
    # The order of the transforms matters (log, standardize, then first difference)
    target_transforms=[GlobalSklearnTransformer(sk_log1p), LocalStandardScaler(), Differences([1])],
    num_threads=1,
    date_features=['month']
)
ml.cross_validation(df=series, h=6, step_size=12, n_windows=2)
f
yes