# mlforecast
g
Hello, is there any specific reason why we can't reproduce the same predictions if we only change the order of the lag list?
j
Hi. The features are created in the order you pass them, so this difference is due only to the order of the feature columns. You can verify it with something like the following:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df1 = fcst.preprocess(df)
df2 = fcst2.preprocess(df)

# make sure the features are the same except for the order
pd.testing.assert_frame_equal(df1, df2[df1.columns])

X_drop_cols = ['unique_id', 'ds', 'y']
coefs1 = LinearRegression().fit(df1.drop(columns=X_drop_cols), df1['y']).coef_
coefs2 = LinearRegression().fit(df2.drop(columns=X_drop_cols), df2['y']).coef_

# the following fails: even after sorting, the coefficients aren't exactly equal
np.testing.assert_allclose(
    np.sort(coefs1),
    np.sort(coefs2),
)
Please let us know if this helps.
g
Hello José, thank you, but it's not crystal clear. For LinearRegression I agree it gives the same prediction, so that's ok, but why is that not the case for xgboost and random forest? My input dataframe is the same and so are the features => only the order of the feature columns is different.
j
LinearRegression doesn't give the same prediction either, because it learns slightly different coefficients. For random forest the default for max_features is sqrt, so when computing the subset of features to consider at each split it probably ends up with different ones. For xgboost it's probably something similar.
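A minimal sketch of both effects, using synthetic data instead of mlforecast's preprocessed frames (the data and model settings here are illustrative assumptions, not from the thread):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, 0] * 2 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)
X_rev = X[:, ::-1]  # same data, feature columns reversed

# LinearRegression: the coefficients match up to reordering, but only
# approximately, since the solver's floating-point path changes with order.
lr1 = LinearRegression().fit(X, y)
lr2 = LinearRegression().fit(X_rev, y)
print(np.allclose(lr1.coef_, lr2.coef_[::-1]))  # close, not necessarily bit-identical

# Random forest with max_features="sqrt": each split draws a random subset
# of feature *indices*, so reordering the columns changes which features the
# trees consider, and the fitted trees (and predictions) end up different.
rf1 = RandomForestRegressor(max_features="sqrt", random_state=0).fit(X, y)
rf2 = RandomForestRegressor(max_features="sqrt", random_state=0).fit(X_rev, y)
print(np.abs(rf1.predict(X) - rf2.predict(X_rev)).max())  # typically > 0
```

Note that the two forests see identical data and use the same seed; the only difference is the column order, which is enough to change the per-split feature subsets.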
g
ok thank you