# general
r
Question: Scikit-learn's RandomForestRegressor and some others support continuous multi-output predictions, essentially allowing one to predict something of shape (n_samples, n_outputs). Could this be used to produce multi-horizon forecasts instead of the current recursive and multi-model (one model per horizon) strategies? @Nixtla Team
j
Is this question about a specific library?
r
Yes. Say we want to use scikit-learn's RandomForestRegressor, for example, in the model:
```
from sklearn.ensemble import RandomForestRegressor
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean

fcst = MLForecast(models=[RandomForestRegressor()], freq='D', lags=[7, 14],
                  lag_transforms={1: [ExpandingMean()], 7: [RollingMean(window_size=28)]},
                  date_features=['dayofweek'], num_threads=1)

# fit the model
fcst.fit(series)
```
And then use the fitted model to generate predictions:
```
predictions = fcst.predict(14)
predictions
```
RandomForestRegressor offers a way to output an array instead of a single 'y' value for each observation. I was wondering if that could be used instead of the current two alternatives (recursive and multiple models) to generate forecasts. That way, a single model could generate outputs for the whole forecast horizon and there would be no compounding of errors. Let me know if that makes sense.
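A minimal sketch of that native multi-output path, with made-up data (the shapes and variables here are illustrative, not MLForecast's actual features):
```
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # n_samples x n_features
Y = rng.normal(size=(100, 14))  # n_samples x n_outputs, e.g. one column per horizon

# a single forest fitted on a 2D target, no wrapper needed
model = RandomForestRegressor(random_state=0).fit(X, Y)
print(model.predict(X[:3]).shape)  # (3, 14): all horizons from one model
```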
j
scikit-learn's MultiOutputRegressor internally fits one estimator per target, which is what we already do with the max_horizon setting, so there would be no difference.
Here's the relevant part in their docs:
"This strategy consists of fitting one regressor per target."
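For reference, a small sketch of what that wrapper does, on made-up data; the point is the one-estimator-per-target behavior:
```
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = rng.normal(size=(100, 14))

wrapped = MultiOutputRegressor(Ridge()).fit(X, Y)
print(len(wrapped.estimators_))  # 14: one fitted Ridge clone per target column
```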
r
Thanks for sharing this. My understanding is that RandomForestRegressor and some other estimators natively support multi-output regression. That's not the same as the MultiOutputRegressor wrapper, which indeed fits a separate model per output. There is, in fact, a comparison between the two approaches: one RandomForestRegressor predicting multiple outputs natively, and another wrapped inside multioutput.MultiOutputRegressor(). The two lead to different results. Link here: https://scikit-learn.org/stable/auto_examples/ensemble/plot_random_forest_regression_multioutput.html
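A quick way to see the difference on synthetic data (a sketch along the lines of the linked example):
```
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = rng.normal(size=(200, 3))

native = RandomForestRegressor(random_state=0).fit(X, Y)
wrapped = MultiOutputRegressor(RandomForestRegressor(random_state=0)).fit(X, Y)

# native trees split on all targets jointly, so the predictions differ
print(np.allclose(native.predict(X), wrapped.predict(X)))  # expect False
```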
j
That's a strange example to use, since random forest has inherent randomness, and it looks like the multi-output and native approaches differ only because of that. The only reason to prefer the native implementation seems to be speed. Running that example in JupyterLite with Ridge instead, I get the same results with both approaches.
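The same check with Ridge (a sketch): Ridge solves each target independently, so both paths should agree to numerical precision:
```
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = rng.normal(size=(200, 3))

native = Ridge().fit(X, Y)
wrapped = MultiOutputRegressor(Ridge()).fit(X, Y)
print(np.allclose(native.predict(X), wrapped.predict(X)))  # True: no structure shared across targets
```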
r
There is also this paper that goes deeper into multi-output regression trees: https://arxiv.org/pdf/2201.05340
But I will investigate it further.
j
Can you try it and see if it improves your forecasting error? You can use preprocess to generate the targets for the multi-horizon case, manually train the model with that, use an approach like the following to generate the features for the next step, and compare against the approach using max_horizon:
```
with fcst.ts._maybe_subset(None), fcst.ts._backup():
    fcst.ts._predict_setup()
    next_feats = fcst.ts._get_features_for_next_step()
```
I'm OK with implementing it; I just want to see whether it really produces different (better) results.
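A rough sketch of that experiment, assuming the series DataFrame from earlier and an mlforecast version whose preprocess accepts max_horizon and return_X_y (the fcst.ts._* calls are the private helpers quoted above and may change between releases):
```
from sklearn.ensemble import RandomForestRegressor
from mlforecast import MLForecast

h = 14
fcst = MLForecast(models=[], freq='D', lags=[7, 14], date_features=['dayofweek'])

# 1. multi-horizon targets: y has shape (n_samples, h)
X, y = fcst.preprocess(series, max_horizon=h, return_X_y=True)

# 2. one native multi-output model instead of h separate ones
model = RandomForestRegressor(random_state=0).fit(X, y)

# 3. features for the next step, via the internal helpers above
with fcst.ts._maybe_subset(None), fcst.ts._backup():
    fcst.ts._predict_setup()
    next_feats = fcst.ts._get_features_for_next_step()

# 4. a single predict call covers the whole horizon; compare its error
#    against fit(series, max_horizon=h) followed by predict(h)
multi_preds = model.predict(next_feats)
```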
r
will try it out