This message was deleted Nixtla Community #general

# general

Slackbot

08/30/2023, 6:05 PM

This message was deleted.

José Morales

08/30/2023, 6:33 PM

Hey, thanks for using mlforecast.

1. Many things are saved from the fit step because the initial use case was forecasting the same series (the `new_df` argument was added later). If you're not going to forecast the same series you can safely delete some attributes. Off the top of my head, you can probably delete all of the following attributes from `MLForecast.ts` before saving: `ga`, `_ga`, `static_features_`, `uids`, `last_dates`, `restore_idxs`. The fitted models are saved in the `MLForecast.models_` attribute, so something like `MLForecast.models_['LGBMRegressor'].booster_` would give you the trained booster object.

2. The assumption is that the training series are complete, so if you have a gap the lags, etc. will be wrong. You may find the `fill_gaps` function useful: it will produce the full panel and you can then fill it with any method you want.

3. All forecasts start from the respective end of each series, because the assumption is that those are the latest values you have and you want to forecast ahead. If you've seen new values of a series you can use the `update` method as described here.

4. Not yet. This is something we have on our roadmap, but at the moment you have to do it manually. This issue has a good way of doing it right now.

5. There isn't any hidden processing. We prefer not to give surprises, so you have to be explicit about what you want to happen. If you want to perform scaling, for example, you can use `target_transforms` (guide).

6. The column identifiers are mainly used to perform the feature engineering. The id and dates aren't passed to LightGBM, but you can pass the id if you specify it in the `static_features` argument, and if your dates are integers you can specify a `date_feature` that is just the identity. If you're wondering in which order the features are going to be passed, you can access the `MLForecast.ts.features_order_` attribute after calling `fit`. You can also perform just the feature engineering with `preprocess`, then manually train the model in any way you like, and then either assign it to the `MLForecast.models_` attribute or use the `MLForecast.ts.predict` method like here.

7. They serve the same purpose. The `dynamic_dfs` argument is legacy and will be removed soon. It was meant to save memory by not having repeated values in a single dataframe, but `X_df` is easier to reason about and makes the predict step faster.

Evan Miller

08/30/2023, 6:44 PM

Hey, thanks so much for the prompt reply! To clarify on question 2, will the lags be calculated based off the nearest previous timestamps? So I will get [ds, y, lag_1; 1, 4, NaN; 2, 5, 4; 4, 6, 5]. Is that right? Also, will the series be ordered (by unique_id, ds) if it isn't initially?

José Morales

08/30/2023, 6:46 PM

That's right, the last lag will be wrong. Yes, they're ordered by id and date first, although `preprocess` returns the dataframe in the same order you passed it (the ordering is just done internally).
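To make the gap issue concrete, here is a minimal pandas sketch of the example above. It does not call mlforecast or `fill_gaps`; the row-based `shift` and the hand-rolled reindexing are stand-ins chosen for illustration, and the series id `"a"` is made up.

```python
import pandas as pd

# The example from the question: one series with a missing timestamp (ds=3).
df = pd.DataFrame(
    {"unique_id": ["a", "a", "a"], "ds": [1, 2, 4], "y": [4.0, 5.0, 6.0]}
)

# A row-based lag takes the previous row, not the previous timestamp.
df = df.sort_values(["unique_id", "ds"]).reset_index(drop=True)
df["lag_1"] = df.groupby("unique_id")["y"].shift(1)
# At ds=4 the lag is 5.0, which is really the value from two steps back.

# Complete each series over its full integer range before computing lags.
parts = []
for uid, group in df.groupby("unique_id"):
    full_range = pd.RangeIndex(
        int(group["ds"].min()), int(group["ds"].max()) + 1, name="ds"
    )
    complete = group.set_index("ds")[["y"]].reindex(full_range).reset_index()
    complete.insert(0, "unique_id", uid)
    parts.append(complete)
filled = pd.concat(parts, ignore_index=True)

# The inserted row (ds=3) has y=NaN; fill it with any method you like.
filled["lag_1"] = filled.groupby("unique_id")["y"].shift(1)
print(filled)
```

After completing the panel, the lag at ds=4 points at the (still missing) value for ds=3 instead of silently borrowing the value from ds=2.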

Evan Miller

08/30/2023, 7:00 PM

ok last question (and perhaps not directly Nixtla related, but I can't find it elsewhere): it seems like referring to the columns by name will be safer than trying to do it by number, and I'm struggling to figure out how to do that in Python. According to the LightGBM documentation: "add a prefix `name:` for column name, e.g. `categorical_feature=name:c1,c2,c3` means c1, c2 and c3 are categorical features". How does this look in Python? Should I have something like `lgb_params = { 'categorical_feature': 'names: "col1", "col2"' }`?

José Morales

08/30/2023, 7:05 PM

No worries. That documentation refers to LightGBM in general; for Python you should go to the Python API section. For LGBMRegressor it should be a list of ints (indices) or a list of str (feature names). So you should use something like:

categorical_feature=['col1', 'col2']

José Morales

08/30/2023, 7:06 PM

Btw if it's `'auto'` (the default) it will use the data types of the columns and if they're categorical it will set them automatically
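As a rough sketch of both options, the snippet below marks columns with the pandas `category` dtype (which is what `'auto'` picks up) and also shows passing the names explicitly. The helpers `mark_categoricals` and `fit_lgbm`, the toy data, and the model settings are hypothetical and only for illustration; LightGBM is imported lazily so the dtype part runs without it.

```python
import pandas as pd


def mark_categoricals(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy where `columns` use the pandas 'category' dtype."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


def fit_lgbm(X: pd.DataFrame, y, categorical_feature="auto"):
    """Fit an LGBMRegressor, forwarding `categorical_feature` to fit."""
    # Imported here so the dtype helper above works without LightGBM installed.
    from lightgbm import LGBMRegressor

    # Small settings chosen only so the toy data below can be fitted.
    model = LGBMRegressor(n_estimators=10, min_child_samples=1, verbose=-1)
    return model.fit(X, y, categorical_feature=categorical_feature)


X = pd.DataFrame(
    {
        "col1": ["a", "b", "a", "b"],
        "col2": ["x", "x", "y", "y"],
        "lag_1": [1.0, 2.0, 3.0, 4.0],
    }
)
X = mark_categoricals(X, ["col1", "col2"])

# These are the columns that 'auto' would treat as categorical.
categorical_cols = X.select_dtypes("category").columns.tolist()
print(categorical_cols)
```

With LightGBM installed, `fit_lgbm(X, y)` relies on the dtypes, while `fit_lgbm(X, y, categorical_feature=["col1", "col2"])` names the columns explicitly.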
