# general
w
hi, what’s the best way to save the MLForecast model? I can save the lightgbm under the hood but would lose the other attributes. I want to save the model artifacts and write an inference pipeline for serving.
j
Hi. We currently rely on pickle (more info here); please let us know if this doesn't suit your needs.
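For example, a minimal sketch of the round trip (assuming fcst is an already-fitted MLForecast object; the horizon value is just illustrative):
import pickle

# serialize the fitted forecast object, features and all
with open('fcst.pkl', 'wb') as f:
    pickle.dump(fcst, f)

# at serving time, load it back and predict as usual
with open('fcst.pkl', 'rb') as f:
    fcst = pickle.load(f)
preds = fcst.predict(7)  # forecast the next 7 steps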
w
thanks! this looks fine. Do you have any experience deploying such models in a SageMaker endpoint?
j
Sorry, I haven't done it in a long time. Maybe @fede (nixtla) (they/them) has more recent experience.
w
thanks! @fede (nixtla) (they/them) let me know if you have done that before.
wow, just realized this pickle file is more than 5GB
is it possible to just save the lightgbm (~10MB) and later initialize the MLForecast object with the lightgbm model that’s been fitted?
j
I think the features_ attribute may be too heavy. I'm planning on removing it soon, but in the meantime deleting it before saving may help. Can you try that to see if it reduces the size? e.g.
del fcst.features_
and then pickling
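As a sketch (in some versions the attribute lives at fcst.ts.features_ instead):
import pickle

del fcst.features_  # drop the cached training features before serializing
with open('fcst.pkl', 'wb') as f:
    pickle.dump(fcst, f)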
👀 1
w
@João Matias thanks! this is super helpful. With del fcst.ts.features_, the model size reduces from ~5GB to 250MB. I’ll check if loading still works fine. Is there any way to reduce its size further?
j
Nice! There are a couple more attributes whose sizes could be reduced. Which lag features are you using?
w
lag_features=['lag1', 'lag2', 'lag3', 'lag4', 'lag5', 'lag6', 'lag7', 'lag8', 'lag9', 'lag10', 'lag11', 'lag12', 'lag13', 'lag14', 'lag15', 'lag16', 'lag17', 'lag18', 'lag19', 'lag20', 'lag21', 'lag22', 'lag23', 'lag24', 'lag25', 'lag26', 'lag27', 'lag28', 'lag29', 'lag30', 'lag31', 'lag32', 'lag33', 'lag34', 'lag35', 'lag36', 'lag37', 'lag38', 'lag39', 'lag40', 'lag41', 'lag42', 'lag43', 'lag44', 'lag45', 'lag46', 'lag47', 'lag48', 'lag49', 'lag50', 'lag51', 'lag52', 'lag53', 'lag54', 'lag55', 'lag56', 'lag57', 'lag58', 'lag59', 'lag60']
I also have a bunch of date_features with custom functions
j
In that case you can keep only the last 60 values of each series (right now we store all the history). I think something like:
fcst.ts.ga = fcst.ts.ga.take_from_groups(slice(-60, None))
And the same for fcst.ts._ga would reduce the size further
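i.e. a sketch, assuming 60 is your largest lag so only the last 60 values per series are needed for the lag features:
# trim the stored history of both grouped arrays before pickling
fcst.ts.ga = fcst.ts.ga.take_from_groups(slice(-60, None))
fcst.ts._ga = fcst.ts._ga.take_from_groups(slice(-60, None))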
w
cool, thanks! will try and update here 🙂
👍 1
Thanks, it helps! It reduces to ~170MB, but that's still not comparable to a native lightgbm or xgboost model, which is only ~10MB.
I guess there’s still data in the forecast object being serialized, in addition to the native model artifact and the featurization functions?
j
Hmm. The other things I can think of are the models themselves. Can you remove them just to check?
del fcst.models_
Oh, before you do that you can serialize them separately, e.g.
fcst.models_['LGBMRegressor'].booster_.save_model('model.txt')
You'd then have to restore it at load time but that may help with the size
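Roughly, the save side would be a sketch like this (the dict key matches the class name of the model you passed to MLForecast):
import pickle

# save the raw LightGBM booster on its own (only a few MB)
fcst.models_['LGBMRegressor'].booster_.save_model('model.txt')

# then drop the fitted models and pickle the now much lighter object
del fcst.models_
with open('fcst.pkl', 'wb') as f:
    pickle.dump(fcst, f)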
w
yeah, fcst.models_['LGBMRegressor'].booster_.save_model('model.txt') is very small.
if I go with this path, what else needs to be saved separately in order to restore the forecast object? I would like to keep the preprocessing capability from MLForecast
j
For the preprocessing you only need the feature functions, which are still there, so it should work. The part where you'd need the models is predict
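e.g., assuming your version exposes MLForecast.preprocess, something like this sketch would still compute features on the slimmed-down object (df being a dataframe with your id/time/target columns):
# feature computation doesn't touch models_, so no booster is needed here
features_df = fcst.preprocess(df)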
w
right, but predict needs to be aware of the feature processing in order to make forecasts.
j
Yes, that's all saved in attributes of the ts object. I can take a look later at the other possibly heavy attributes that we may not need
Please share the size without the models if possible
w
you mean the size with just the model, from doing fcst.models_['LGBMRegressor'].booster_.save_model('model.txt')?
j
I mean saving it as you are doing, deleting the models_ attribute from the forecast object, and then serializing it
In order for it to work when loading you'd have to load the booster object and assign it back to the models attribute. Like
bst = lgb.Booster(model_file='model.txt')
fcst.models_ = {'LGBMRegressor': bst}
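So the load side, end to end, would be a sketch like (the booster's predict method is what should get called when forecasting):
import pickle
import lightgbm as lgb

# load the forecast object that was pickled without models_
with open('fcst.pkl', 'rb') as f:
    fcst = pickle.load(f)

# restore the separately saved booster into the models_ dict
bst = lgb.Booster(model_file='model.txt')
fcst.models_ = {'LGBMRegressor': bst}

preds = fcst.predict(7)  # illustrative horizon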
w
175MB without the model
I think the model itself is pretty small. It’s the other attributes (probably data) that need to be deleted. Appreciate your help and the time spent looking into this!
j
Damn haha. Thank you for raising this, it's important for the serialized forecast object to be as small as possible and we haven't focused on that
I think it's probably the static features (a dataframe), but we need that to predict. Can you check the shape?
fcst.ts.static_features_
w
len(fcst.ts.features) is 89
fcst.ts.static_features.shape -> 119299
the unique identifier is a static feature, right? I think serializing this data is also pretty heavy, but we kind of need it if we're forecasting for the same ids instead of new data.
j
How many columns does it have? The unique ID can be a static feature if you defined it as one; otherwise it's just there for merging the predictions. If that df is big you could try saving it as parquet, deleting the attribute and restoring it on load.
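For example, a sketch of that parquet route (assuming the attribute is a pandas dataframe, under the newer name static_features_):
import pandas as pd

# save time: offload the static features, then drop them before pickling
fcst.ts.static_features_.to_parquet('static_features.parquet')
del fcst.ts.static_features_

# load time: restore the attribute before calling predict
fcst.ts.static_features_ = pd.read_parquet('static_features.parquet')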
w
are you talking about the original columns, or after featurization?
the unique ID is just an identifier. In that case, it doesn’t make sense to make it a static feature, right?
j
I meant the fcst.ts.static_features_ attribute. If you have an older version it may be fcst.ts.static_features. That attribute holds the values of the static features for each series (if you don't have any it stores only the unique ID). The unique ID can be used as a static feature if the model can use it, for example LightGBM