#mlforecast

Guillaume GALIE

08/03/2023, 1:27 PM
Hello, is there any specific reason why we can't reproduce the same predictions when the only thing we change is the order of the lags list?

José Morales

08/04/2023, 4:08 PM
Hi. The features are created in the order you pass them, so this difference is only due to the order of the features. You can verify it with something like the following:
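import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# fcst and fcst2 are MLForecast objects that differ only in the order of their lags list
df1 = fcst.preprocess(df)
df2 = fcst2.preprocess(df)

# make sure the features are the same except for the order
pd.testing.assert_frame_equal(df1, df2[df1.columns])

X_drop_cols = ['unique_id', 'ds', 'y']
coefs1 = LinearRegression().fit(df1.drop(columns=X_drop_cols), df1['y']).coef_
coefs2 = LinearRegression().fit(df2.drop(columns=X_drop_cols), df2['y']).coef_

# the following fails: the coefficients differ slightly, not only in their order
np.testing.assert_allclose(
    np.sort(coefs1),
    np.sort(coefs2),
)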
Please let us know if this helps.
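A minimal pandas/NumPy sketch of the same check without mlforecast, assuming a single synthetic series and plain least squares in place of LinearRegression; make_lag_features and ols_coefs are hypothetical helpers, not mlforecast API. It builds the same lag columns in two orders, confirms they hold identical data, and compares the fitted coefficients by feature name. The coefficients agree only up to floating-point round-off, so they are not guaranteed to be bitwise identical.

```python
import numpy as np
import pandas as pd


def make_lag_features(y: pd.Series, lags: list) -> pd.DataFrame:
    """Hypothetical helper: one column per lag, created in the order given."""
    feats = pd.DataFrame({f"lag{lag}": y.shift(lag) for lag in lags})
    feats["y"] = y
    return feats.dropna()  # drop the warm-up rows that have missing lags


def ols_coefs(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: least-squares coefficients (with intercept), keyed by column name."""
    feats = df.drop(columns="y")
    X = np.column_stack([np.ones(len(feats)), feats.to_numpy()])
    coef, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
    return pd.Series(coef[1:], index=feats.columns)  # skip the intercept


rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=300).cumsum())  # synthetic random-walk series

df1 = make_lag_features(y, [1, 7, 14])
df2 = make_lag_features(y, [14, 7, 1])

# Same features, only the column order differs
pd.testing.assert_frame_equal(df1, df2[df1.columns])

c1 = ols_coefs(df1)
c2 = ols_coefs(df2)[c1.index]  # align coefficients by feature name
max_diff = (c1 - c2).abs().max()
print(f"largest coefficient difference: {max_diff:.2e}")
```

With highly correlated lag columns the design matrix is poorly conditioned, which can amplify these round-off differences.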

Guillaume GALIE

08/04/2023, 4:37 PM
Hello José, thank you, but it's not crystal clear to me. For LinearRegression I agree it gives the same prediction, so that's OK, but why is that not the case for xgboost and random forest? My input dataframe is the same and so are the features; only the order of the feature columns is different.

José Morales

08/04/2023, 4:48 PM
LinearRegression doesn't give the same prediction either, because it learns slightly different coefficients. For random forest, the features considered at each split are drawn at random (with max_features='sqrt' only a random subset is evaluated, and even with all features they are visited in a random order), so with a different column order it probably ends up picking different ones. For xgboost it's probably something similar.
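A minimal sketch of the random forest case, assuming scikit-learn's RandomForestRegressor on synthetic data (not mlforecast itself) with max_features="sqrt" set explicitly; fit_predict is a hypothetical helper. Refitting with the same seed and the same column order reproduces the predictions exactly, while reversing the column order changes which features are tried at each split, so the predictions change.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 6))
y_train = X_train[:, 0] + 0.5 * X_train[:, 3] + rng.normal(scale=0.1, size=200)
X_test = rng.normal(size=(50, 6))
perm = np.arange(X_train.shape[1])[::-1]  # same six columns, reversed order


def fit_predict(X_tr, X_te):
    # With 6 features, max_features="sqrt" gives 2 candidate columns per split.
    # The seeded RNG picks them by column position, so reordering the columns
    # changes which features are actually tried.
    model = RandomForestRegressor(n_estimators=50, max_features="sqrt", random_state=0)
    return model.fit(X_tr, y_train).predict(X_te)


p_original = fit_predict(X_train, X_test)
p_repeat = fit_predict(X_train, X_test)                       # same order, same seed
p_reordered = fit_predict(X_train[:, perm], X_test[:, perm])  # same data, reordered columns

print(np.array_equal(p_original, p_repeat))    # True
print(np.abs(p_original - p_reordered).max())  # max gap between the two forests' predictions
```

In other words, a fixed random_state alone is not enough to reproduce a forest; the feature column order has to match as well.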

Guillaume GALIE

08/07/2023, 7:11 AM
OK, thank you.