This message was deleted Nixtla Community #mlforecast


# mlforecast

Slackbot

08/03/2023, 1:27 PM

This message was deleted.

👀 2

José Morales

08/04/2023, 4:08 PM

Hi. The features are created in the order you pass them, so this difference is only due to the order of the features. You can verify it with something like the following:


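import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# fcst and fcst2 are the two forecast objects from the question:
# same features, passed in a different order
df1 = fcst.preprocess(df)
df2 = fcst2.preprocess(df)

# make sure the features are the same except for the order
pd.testing.assert_frame_equal(df1, df2[df1.columns])

X_drop_cols = ['unique_id', 'ds', 'y']
coefs1 = LinearRegression().fit(df1.drop(columns=X_drop_cols), df1['y']).coef_
coefs2 = LinearRegression().fit(df2.drop(columns=X_drop_cols), df2['y']).coef_

# the following fails: even after sorting, the learned coefficients differ slightly
np.testing.assert_allclose(
    np.sort(coefs1),
    np.sort(coefs2),
)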

Please let us know if this helps

Guillaume GALIE

08/04/2023, 4:37 PM

Hello José, thank you, but it's not crystal clear. For LinearRegression I agree it gives the same prediction, so that's OK, but why isn't that the case for xgboost and random forest? My input dataframe is the same and so are the features => only the order of the feature columns is different

José Morales

08/04/2023, 4:48 PM

LinearRegression doesn't give the same prediction either, because it learns slightly different coefficients. For random forest, the features considered for a split are sampled by column position (max_features controls how many; sqrt is the default for the classifier), so with a different column order it probably ends up with different ones. For xgboost it's probably something similar
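
To illustrate the feature-subset point, here is a minimal sketch using only the standard library. It is not how scikit-learn or xgboost implement sampling; the helper `sample_feature_names` and the feature names are hypothetical. It only shows that when a seeded sampler picks columns by position, the same seed selects the same positions but different features once the columns are reordered.

```python
import random


def sample_feature_names(columns, k, seed):
    """Pick k columns by position with a seeded RNG (hypothetical helper)."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(columns)), k))
    return positions, [columns[p] for p in positions]


# Hypothetical feature names; both lists hold the same features.
order_a = ["lag1", "lag7", "rolling_mean_7", "dayofweek", "month"]
order_b = order_a[1:] + order_a[:1]  # same features, rotated by one position

pos_a, names_a = sample_feature_names(order_a, k=2, seed=0)
pos_b, names_b = sample_feature_names(order_b, k=2, seed=0)

# Same seed, so the sampled positions are identical...
print("same positions:", pos_a == pos_b)
# ...but those positions now point at different features.
print("same features:", set(names_a) == set(names_b))
```

If identical results are needed regardless of how the features were declared, one option is to reorder the columns to a fixed order (for example, sorted by name) before fitting.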

Guillaume GALIE

08/07/2023, 7:11 AM

ok thank you
