# mlforecast
j
i recently got the following error when wrapping optuna around nixtla's mlforecast using xgboost. it seems there is an issue with the matrix creation within xgboost and i wonder if anybody else has encountered this or knows what to do, because even if i add a try: before my ts cross-validation loop my code breaks:
File "/usr/local/lib/python3.8/site-packages/mlforecast/core.py", line 762, in predict
    preds = self._predict_recursive(
File "/usr/local/lib/python3.8/site-packages/mlforecast/core.py", line 613, in _predict_recursive
    predictions = model.predict(new_x)
File "/usr/local/lib/python3.8/site-packages/xgboost/sklearn.py", line 897, in predict
    test = DMatrix(
File "/usr/local/lib/python3.8/site-packages/xgboost/core.py", line 506, in inner_f
    return f(**kwargs)
File "/usr/local/lib/python3.8/site-packages/xgboost/core.py", line 616, in __init__
    handle, feature_names, feature_types = dispatch_data_backend(
File "/usr/local/lib/python3.8/site-packages/xgboost/data.py", line 763, in dispatch_data_backend
    return _from_numpy_array(data, missing, threads, feature_names,
File "/usr/local/lib/python3.8/site-packages/xgboost/data.py", line 178, in _from_numpy_array
    _check_call(
File "/usr/local/lib/python3.8/site-packages/xgboost/core.py", line 218, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
my hypothesis is that it is due to gblinear having conflicts with other params i tune. but anyway, any tips are welcome
j
Hey. What's the error that you get? There should be some message from XGBoost in it
j
"/usr/local/lib/python3.8/site-packages/mlforecast/core.py", line 762, in predict
preds = self._predict_recursive(
File "/usr/local/lib/python3.8/site-packages/mlforecast/core.py", line 613, in _predict_recursive
predictions = model.predict(new_x)
File "/usr/local/lib/python3.8/site-packages/xgboost/sklearn.py", line 897, in predict
test = DMatrix(
File "/usr/local/lib/python3.8/site-packages/xgboost/core.py", line 506, in inner_f
return f(**kwargs)
File "/usr/local/lib/python3.8/site-packages/xgboost/core.py", line 616, in __init__
handle, feature_names, feature_types = dispatch_data_backend(
File "/usr/local/lib/python3.8/site-packages/xgboost/data.py", line 763, in dispatch_data_backend
return _from_numpy_array(data, missing, threads, feature_names,
File "/usr/local/lib/python3.8/site-packages/xgboost/data.py", line 178, in _from_numpy_array
_check_call(
File "/usr/local/lib/python3.8/site-packages/xgboost/core.py", line 218, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
j
there must be a string at the top with a description
this line:
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
should show something like:
XGBoostError("hello")
j
i also see this in aws logs:
Stack trace:
[bt] (0) /usr/local/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x1135b9) [0x7f0849a4e5b9]
[bt] (1) /usr/local/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x1340dd) [0x7f0849a6f0dd]
[bt] (2) /usr/local/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x155489) [0x7f0849a90489]
[bt] (3) /usr/local/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(+0x125875) [0x7f0849a60875]
[bt] (4) /usr/local/lib/python3.8/site-packages/xgboost/lib/libxgboost.so(XGDMatrixCreateFromDense+0x24f) [0x7f08499dd0ef]
[bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f09c12458ee]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f09c12452bf]
[bt] (7) /usr/local/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0xd702) [0x7f09be77d702]
[bt] (8) /usr/local/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so(+0x127d5) [0x7f09be7827d5]
and this also, but the aws logs are a bit shitty: `xgboost.core.XGBoostError: [20:34:41] ../src/data/data.cc:981: Check failed: valid: Input data contains inf or nan`
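That check can be reproduced up front. A minimal sketch of scanning a feature matrix for exactly the values that trip it (X here is a stand-in for whatever array ends up in the DMatrix):
import numpy as np

X = np.array([[1.0, 2.0], [np.inf, 0.0], [3.0, np.nan]])
# np.isfinite is False for both inf and nan, i.e. exactly the values
# behind "Input data contains inf or nan"
bad_rows = ~np.isfinite(X).all(axis=1)
print(np.where(bad_rows)[0])  # -> [1 2]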
j
ah yes, that's the xgboost error
j
this has never happened to me running tons of models in my notebook with mlforecast 0.12.0, but in production we for some reason used the 0.12.1 version and i wonder if that is related. i also just ran the tuning twice with mlforecast 0.12.0 and the error did not happen, but those might have been just two lucky runs. the problem is that we really have this (or want to put it) in production and the run can't fail 🙂
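As an aside, if the hard requirement is that the tuning run never dies, Optuna can quarantine failing trials at the study level via the catch= argument of Study.optimize. A minimal, self-contained sketch (the simulated ValueError stands in for the XGBoostError here):
import optuna

def objective(trial: optuna.trial.Trial) -> float:
    x = trial.suggest_float("x", -10, 10)
    if x < 0:
        raise ValueError("simulated failure, e.g. an XGBoostError")
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
# catch= marks trials that raise these exceptions as failed and moves on,
# instead of letting one bad trial kill the whole study
study.optimize(objective, n_trials=20, catch=(Exception,))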
j
it seems that
booster='gblinear'
doesn't support Inf in the input array and raises that exact error
there was a bug in coreforecast that was fixed recently. Are you using target transformations?
j
i am testing differencing
j
Can you check if you have coreforecast>=0.0.8? that can be the source of the error
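A quick way to verify the installed version from Python (importlib.metadata is stdlib on 3.8+; `pip show coreforecast` from a shell works too):
from importlib.metadata import version

# you want >= 0.0.8 here
print(version("coreforecast"))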
j
but i also have a try around my ts-cv loop. i do sometimes test very long window aggregations that would result in inf/nan values for some customers, but the try keeps the loop from breaking. in the case above it still breaks, which is so weird
j
I was able to reproduce the exception with this:
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(1_000, n_features=4)
bst = xgb.XGBRegressor(booster='gblinear').fit(X, y)
X2 = X[:5].copy()
X2[0, 0] = float('inf')  # inject an inf to trigger the DMatrix check
try:
    bst.predict(X2)
except Exception as e:
    print('in except')
    print(e)
which is what is happening within mlforecast, but the try/except catches it correctly
j
def objective(self, trial: optuna.trial.Trial) -> float:
    """
    Objective function for hyperparameter tuning with Optuna.

    Parameters:
    - trial (Trial): A single trial of the hyperparameter tuning process.

    Returns:
    - float: The mean squared error of the validation set, which Optuna will attempt to minimize.
    """
    # validation_length = self.validation_fc_horizon * self.validation_steps
    all_time_index = self.df_train.ds.unique()
    max_number_of_validation_steps = np.floor(
        len(all_time_index) / self.validation_fc_horizon
    )
    validation_steps_final = min(
        self.validation_steps, int(max_number_of_validation_steps) + 1
    )
    print('max number of validation steps:', max_number_of_validation_steps)
    # suggest the feature-engineering hyperparameters
    lags = trial.suggest_int("lags", 2, 15, step=1)
    seasonal_rolling = trial.suggest_int("seasonal_rolling", 2, 50, step=1)
    seasonal_rolling_month = trial.suggest_int("seasonal_rolling_month", 1, 12, step=1)
    rolling_std = trial.suggest_int("rolling_std", 7, 112, step=7)
    rolling_mean_short_term_window = trial.suggest_int(
        "rolling_mean_short_term_window", 7, 28, step=7
    )
    rolling_mean_mid_term_window = trial.suggest_int(
        "rolling_mean_mid_term_window", 28, 70, step=7
    )
    rolling_mean_long_term_window = trial.suggest_int(
        "rolling_mean_long_term_window", 70, 112, step=7
    )
    differencing_order = trial.suggest_int("differencing_order", 0, 1, step=1)
    alpha_weighted_mean = trial.suggest_uniform("alpha_weighted_mean", 0.1, 0.9)
    # apply_boxcox = trial.suggest_categorical(
    #     "apply_boxcox", [True, False]
    # )  # Suggest whether to apply Box-Cox transformation
    params = {
        "verbosity": 0,
        "objective": "reg:squarederror",
        "booster": trial.suggest_categorical(
            "booster", ["gbtree", "gblinear"]
        ),  # 'dart' would be an additional option
        "lambda": trial.suggest_loguniform("lambda", 1e-3, 10.0),
        "alpha": trial.suggest_loguniform("alpha", 1e-3, 10.0),
        "subsample": trial.suggest_uniform("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_uniform("colsample_bytree", 0.5, 1.0),
        "learning_rate": trial.suggest_uniform("learning_rate", 0.001, 0.3),
        "n_estimators": trial.suggest_int("n_estimators", 100, 5000),
        "max_depth": trial.suggest_int("max_depth", 3, 50),
        "min_child_weight": trial.suggest_int("min_child_weight", 2, 30),
        "gamma": trial.suggest_uniform("gamma", 0, 0.8),
        "grow_policy": trial.suggest_categorical(
            "grow_policy", ["depthwise", "lossguide"]
        ),
    }
    # Determine transformations based on trial suggestion
    # if apply_boxcox:
    #     # Box-Cox requires strictly positive data, ensure this or shift data accordingly
    #     target_transforms = [boxcox_global, Differences([differencing_order])]
    # else:
    #     target_transforms = [Differences([differencing_order])]
    target_transforms = [Differences([differencing_order])]
    regressor = XGBRegressor(**params)
    scores = []
    executed_validation_steps = 0
    # adapt the validation fc horizon for short series
    train_max_date = self.df_train.ds.max() - timedelta(
        days=self.validation_fc_horizon
    )
    delta_days = (train_max_date - self.df_train.ds.min()).days
    if delta_days < 250:
        day_correction = 250 - delta_days
        print("change validation fc horizon")
        self.validation_fc_horizon = (
            self.validation_fc_horizon - day_correction
        ) - 1
        print(self.validation_fc_horizon)
    for t in tqdm(
        range(0, validation_steps_final), desc="Validation time-windows loop"
    ):
        try:
            df_validation = self.df_train[
                (
                    self.df_train.ds
                    <= self.df_train.ds.max()
                    - timedelta(days=self.validation_fc_horizon) * t
                )
                & (
                    self.df_train.ds
                    > self.df_train.ds.max()
                    - timedelta(days=self.validation_fc_horizon) * (t + 1)
                )
            ].copy()
            X_validation = df_validation[self.features_for_validation].copy()
            # Define the training set to include data up to the start of the validation window
            df_train_temp = self.df_train[
                self.df_train.ds <= df_validation.ds.min() - timedelta(days=1)
            ].copy()
            # Check if the training period is at least as long as the validation period
            if len(df_train_temp.ds.unique()) < self.validation_fc_horizon:
                print(f"Skipping validation step {t+1} due to short training data.")
                continue  # Skip this validation step
            executed_validation_steps += 1  # Increment counter for executed steps
            model = MLForecast(
                models=regressor,
                freq="D",
                lags=[7 * (i + 1) for i in range(lags)],  # + [363, 364, 365],
                date_features=[
                    "year",
                    "month",
                    "dayofweek",
                    "quarter",
                    "week",
                    "dayofyear",
                    "is_leap_year",
                    "is_year_end",
                    "is_month_end",
                    "is_month_start",
                ],
                # dict keys must be unique, so the transforms for each lag are
                # grouped into one list (the original had duplicate keys flagged
                # with noqa: F601, of which only the last per lag took effect)
                lag_transforms={
                    1: [
                        ExponentiallyWeightedMean(alpha=alpha_weighted_mean),
                        ExpandingStd(),
                        ExpandingMin(),
                        ExpandingMax(),
                    ],
                    7: [
                        RollingStd(window_size=rolling_std),
                        SeasonalRollingMean(7, seasonal_rolling),
                        SeasonalRollingStd(7, seasonal_rolling),
                        RollingMean(rolling_mean_short_term_window),
                    ],
                    28: [RollingMean(rolling_mean_mid_term_window)],
                    30: [
                        SeasonalRollingMean(30, seasonal_rolling_month),
                        SeasonalRollingStd(30, seasonal_rolling_month),
                    ],
                    84: [RollingMean(rolling_mean_long_term_window)],
                },
                target_transforms=target_transforms,
            )
            # fit model
            model.fit(
                df_train_temp, static_features=self.static_features, as_numpy=True
            )
            # print(model.ts.features_order_)
            # predict model
            p = model.predict(h=self.validation_fc_horizon, X_df=X_validation)
            p = p.merge(df_validation, on=["unique_id", "ds"], how="left")
            p = p.fillna(0)
            score = mean_squared_error(p.y, p["XGBRegressor"])
            scores.append(score)
        except Exception as e:
            print(f"An error occurred in validation step {t+1}: {e}")
            continue
    # Compute the average score over all time periods
    average_score = np.mean(scores)
    return average_score  # Optuna aims to minimize this value
not sure this is readable...
i have my try here: it wraps the entire body of the "Validation time-windows loop" above, from building df_validation through appending the score
so i am checking which coreforecast version is being installed. the problem is that in our req.txt we don't specify a version, so i am not sure which one is installed. also we use a hack and run a req.txt on top of our req.txt inside the docker image... don't ask... 🙂
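For what it's worth, pinning both packages in that req.txt would at least make the environment reproducible. A minimal sketch (the exact pins are only the versions mentioned in this thread, not a recommendation):
# requirements.txt (versions taken from this discussion, adjust as needed)
mlforecast==0.12.0
coreforecast>=0.0.8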
j
haha well 0.0.8 is the latest, so maybe it's already pulling that one
j
but cool! thanks a lot for your help so far, i think i can really fix this based on your input
ah, and might using an earlier version help? maybe that's why i did not see the error with mlforecast 0.12.0? i guess coreforecast is installed as a dependency of mlforecast, right?
j
starting from 0.12.0 it became the default "engine", so the bug would've been on that version as well
j
ah, ok 🙂 haha
j
although now that I remember it, it only affected the fitted values, so it shouldn't affect a regular fit+predict
j
but this is all i need to know. then again, it is so weird that sometimes it happens during tuning and sometimes not... 🙂
mhhh
j
it's weird that the try/except doesn't catch the exception though, there may be some wrong indentation or similar in there
j
you criticising my code? 😉 hehe, good point. i can check this again. i know it catches other cases that used to cause failures, and those never break the run anymore. but i will check again, just to be sure
j
haha it looks ok here, I'm just saying maybe there's an unwanted tab or something in the file that's running
j
so it is coreforecast 0.0.8, as you said
i guess i will continue here over the next days. btw, i just ran it 3 times using mlforecast 0.12.0 with no error, but again, this could be just luck
j
Here are the release notes for 0.12.1. I don't think any of that could be impacting your workflow
j
i also thought that, but figured maybe they are importing some newer xgboost stuff or so. i agree though, from the release notes it doesn't look like anything there could have an impact
@José Morales, so i have tested so much now and i still can't figure out why that error comes up. i thought it might be because i use transformations with windows/lags longer than my series, but no, it's nothing like that. also i am using coreforecast 0.0.8, and you said that would fix the error, right? but i still get it (in all my last experiments the try/except caught it, but so many runs become useless due to that error). i am about to save all the good and bad runs with all params and train a classification model on that to understand where the bug is coming from... 🙂
and i also get it with the gbtree booster.
there should not be inf/nan, but there is. i think what i will do next is also return all the transformations and check for inf/nan manually. my hypothesis is that there could be something wrong with the transformations, or there is something happening which i am not aware of, which of course is also possible
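A sketch of that manual check using MLForecast.preprocess, which builds the same feature dataframe that fit() would train on (model, df_train_temp and self.static_features refer to the objects in the tuning code above; note predict() recomputes features recursively, so this only covers the training side):
import numpy as np

# materialize the lag/rolling features without training the models
prep = model.preprocess(df_train_temp, static_features=self.static_features)

# scan every numeric feature column for inf/nan
numeric = prep.drop(columns=["unique_id", "ds"]).select_dtypes("number")
finite = np.isfinite(numeric.to_numpy(dtype="float64", na_value=np.nan))
print(numeric.columns[~finite.all(axis=0)])  # columns containing inf/nan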
so this caused everything. i also have a seasonalrolling from day 7 on, and that works, but the moment i add these two, the inf error occurs (here seasonal_rolling_month can take values between 1 and 3):
30: [SeasonalRollingMean(30, seasonal_rolling_month)],
30: [SeasonalRollingStd(30, seasonal_rolling_month)],
j
For seasonal_rolling_month=1 that transformation requires 60 samples and for 3 it requires 120, so that'll produce a lot of NaNs. You can try setting
min_samples=1
which would make it require only 31 (because of the lag 30)
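A sketch of what that looks like on the lag-30 transforms from the tuning code above (SeasonalRollingMean/Std in mlforecast.lag_transforms accept min_samples as their third argument; the window size here is just an example value):
from mlforecast.lag_transforms import SeasonalRollingMean, SeasonalRollingStd

seasonal_rolling_month = 1  # e.g. the value suggested by the trial
lag_transforms = {
    30: [
        # min_samples=1 emits a value as soon as a single sample is
        # available, instead of NaN until the seasonal window fills up
        SeasonalRollingMean(30, seasonal_rolling_month, min_samples=1),
        SeasonalRollingStd(30, seasonal_rolling_month, min_samples=1),
    ],
}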
j
ah, i did not know! thanks