Hey. The conformity scores are computed using cross-validation (CV), so they would be wrong if they were computed on different subsets of the data. Is the preprocess step taking too long?
Evan Miller
02/14/2024, 10:01 PM
No, I mean I want to go through the entire process (predicting, generating probabilistic forecasts, etc.) on the same data set using different samples, à la the winning solution in M5 (which made forecasts by store, store-cat, and store-dept and then averaged those). I would like to do the preprocessing only once and then use the same processed data set on those divisions.
José Morales
02/14/2024, 10:06 PM
I think you're better off just calling fit + predict on each subsample, otherwise the scores for the intervals may not be accurate
👍 1
Evan Miller
02/15/2024, 7:44 PM
I realized that the other reason to do preprocessing first is that I want to sample the data to reduce the number of rows and speed up training time. If I sample before preprocessing, then the rows will be missing and features will not be calculated correctly. If I preprocess first, then I can sample and just run fit_models and all the features will be calculated correctly. Is there a way to tell MLForecast.fit or the regressor to sample a certain max or percentage of the row count?
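To illustrate the ordering issue with a toy example: this is plain pandas rather than MLForecast, and add_lag1 is a hypothetical helper standing in for the preprocessing step. Rows are dropped deterministically here for clarity; in practice something like DataFrame.sample would do the sampling.

```python
import pandas as pd

# One toy series with five daily observations.
df = pd.DataFrame({
    "unique_id": ["a"] * 5,
    "ds": pd.date_range("2024-01-01", periods=5, freq="D"),
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
})


def add_lag1(frame):
    """Hypothetical stand-in for preprocessing: add a per-series lag-1 feature."""
    out = frame.sort_values(["unique_id", "ds"]).copy()
    out["lag1"] = out.groupby("unique_id")["y"].shift(1)
    return out


# Keep every other row (index labels 0, 2, 4).
kept_idx = df.index[::2]

# Sampling first: the lag is taken over the surviving rows only,
# so it points at the wrong previous observation.
sample_then_features = add_lag1(df.loc[kept_idx])

# Preprocessing first: the lag is taken over the full series,
# then the same rows are kept.
features_then_sample = add_lag1(df).loc[kept_idx]
```

In the first frame the lag of the row with y=3 is 1 (the previous surviving row), while in the second it is 2 (the actual previous day).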
José Morales
02/15/2024, 8:00 PM
I think you could use a pipeline for that, where you define an estimator that samples the data in fit_transform and just passes it through in transform
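A rough sketch of that idea, with one caveat: a plain scikit-learn Pipeline only passes X between steps, so a transformer that drops rows in fit_transform would leave y at its original length. The variation below wraps the regressor instead, subsampling X and y together in fit and passing everything through in predict. SubsampledRegressor, frac, and random_state are hypothetical names for illustration, not part of mlforecast or scikit-learn, and this has not been tested with MLForecast itself.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression


class SubsampledRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: fit the inner estimator on a random fraction of rows."""

    def __init__(self, estimator, frac=0.5, random_state=0):
        self.estimator = estimator
        self.frac = frac
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        n_rows = len(X)
        n_keep = max(1, int(n_rows * self.frac))
        # Draw row positions without replacement; the same positions index X and y.
        idx = np.sort(rng.choice(n_rows, size=n_keep, replace=False))
        X_sub = X.iloc[idx] if hasattr(X, "iloc") else np.asarray(X)[idx]
        y_sub = y.iloc[idx] if hasattr(y, "iloc") else np.asarray(y)[idx]
        self.n_rows_used_ = n_keep
        self.estimator_ = clone(self.estimator).fit(X_sub, y_sub)
        return self

    def predict(self, X):
        # No sampling at prediction time: every row gets a prediction.
        return self.estimator_.predict(X)


# Toy check on synthetic data: train on 10% of the rows, predict on all of them.
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0
model = SubsampledRegressor(LinearRegression(), frac=0.1).fit(X, y)
preds = model.predict(X)
```

Because it follows the usual fit/predict and get_params conventions, a wrapper like this can be cloned like any other scikit-learn regressor. Note that the interval scores would then come from models trained on the subsample.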