# general
j
Hey, thanks for using mlforecast.

1. Many things are saved from the fit step because the initial use case was forecasting the same series (the `new_df` argument was added later). If you're not going to forecast the same series you can safely delete some attributes; off the top of my head, you can probably delete all of the following attributes from `MLForecast.ts` before saving: `ga`, `_ga`, `static_features_`, `uids`, `last_dates`, `restore_idxs` (see the first sketch below). The fitted models are saved in the `MLForecast.models_` attribute, so something like `MLForecast.models_['LGBMRegressor'].booster_` would give you the trained booster object.
2. The assumption is that the training series are complete, so if you have a gap the lags, etc. will be wrong. You may find the `fill_gaps` function useful: it will produce the full panel and you can then fill with any method you want (second sketch below).
3. All forecasts start from the respective end of each series, because the assumption is that those are the latest values you have and you want to forecast ahead. If you've seen new values of a series you can use the `update` method as described here (third sketch below).
4. Not yet, this is something we have on our roadmap, but at the moment you have to do it manually. This issue has a good way of doing it right now.
5. There isn't any hidden processing; we prefer not to give surprises, so you have to be explicit about what you want to happen. If you want to perform scaling, for example, you can use `target_transforms` (guide; fourth sketch below).
6. The column identifiers are mainly used to perform the feature engineering. The id and dates aren't passed to LightGBM, but you can pass the id if you specify it in the `static_features` argument, and if your dates are integers you can specify a `date_feature` that is just the identity. If you're wondering in which order the features are going to be passed, you can access the `MLForecast.ts.features_order_` attribute after calling `fit`. You can also perform just the feature engineering with `preprocess`, then manually train the model in any way you like, and then either assign it to the `MLForecast.models_` attribute or use the `MLForecast.ts.predict` method like here (fifth sketch below).
7. They serve the same purpose. The `dynamic_dfs` argument is legacy and will be removed soon; it was meant to save memory by not having repeated values in a single dataframe, but `X_df` is easier to reason about and makes the predict step faster (sixth sketch below).
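For point 1, a minimal sketch of trimming the per-series state before pickling. Here `fcst` is assumed to be an already-fitted `MLForecast` instance, and the exact attribute names may vary between versions, hence the `hasattr` guard:

```python
import pickle

# drop per-series state that's only needed to forecast the training series again
for attr in ('ga', '_ga', 'static_features_', 'uids', 'last_dates', 'restore_idxs'):
    if hasattr(fcst.ts, attr):  # attribute names may differ between versions
        delattr(fcst.ts, attr)

with open('fcst.pkl', 'wb') as f:
    pickle.dump(fcst, f)
```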
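For point 2, a sketch of filling the gaps before fitting. The import path is an assumption (in recent versions `fill_gaps` lives in `utilsforecast.preprocessing`; older versions may expose it elsewhere):

```python
import pandas as pd
from utilsforecast.preprocessing import fill_gaps  # import path assumed

df = pd.DataFrame({
    'unique_id': ['A', 'A', 'A'],
    'ds': [1, 2, 4],  # ds=3 is missing
    'y': [4.0, 5.0, 6.0],
})
full = fill_gaps(df, freq=1)   # produces the full panel: adds the ds=3 row with y=NaN
full['y'] = full['y'].ffill()  # then fill with any method you want
```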
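For point 3, a sketch of appending newly observed values so the next forecast starts after them. It assumes a fitted `fcst`; depending on your version the method may be exposed as `fcst.ts.update` or directly on `MLForecast`:

```python
import pandas as pd

# newly observed value for series 'A'
new_rows = pd.DataFrame({'unique_id': ['A'], 'ds': [5], 'y': [7.0]})
fcst.ts.update(new_rows)  # forecasts for 'A' now start after ds=5
```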
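For point 5, a sketch of opting into scaling through `target_transforms`; `LocalStandardScaler` standardizes each series with its own statistics:

```python
import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.target_transforms import LocalStandardScaler

fcst = MLForecast(
    models={'LGBMRegressor': lgb.LGBMRegressor()},
    freq=1,  # integer frequency, matching the integer ds above
    lags=[1, 2],
    target_transforms=[LocalStandardScaler()],  # explicit: nothing is scaled unless you ask
)
```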
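For point 6, a sketch of doing only the feature engineering with `preprocess`, training manually, and handing the model back so `predict` can use it. It reuses `fcst` and the gap-filled `full` frame from the sketches above, and the column names follow the default `unique_id`/`ds`/`y` convention:

```python
prep = fcst.preprocess(full)  # feature engineering only, no training
X = prep.drop(columns=['unique_id', 'ds', 'y'])
y = prep['y']

model = lgb.LGBMRegressor().fit(X, y)
fcst.models_ = {'LGBMRegressor': model}  # predict() will now use this model

print(fcst.ts.features_order_)  # order in which features are passed at predict time
```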
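For point 7, a sketch of passing future exogenous values through `X_df` instead of the legacy `dynamic_dfs`. The `price` column is a hypothetical regressor and is assumed to have also been present in the training dataframe:

```python
import pandas as pd

# one row per (id, date) to be forecast, carrying the future exogenous values
X_future = pd.DataFrame({
    'unique_id': ['A', 'A'],
    'ds': [5, 6],
    'price': [1.2, 1.3],  # hypothetical exogenous regressor
})
preds = fcst.predict(h=2, X_df=X_future)
```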
e
Hey, thanks so much for the prompt reply! To clarify on question 2: the lags will be calculated based on the nearest previous timestamps? So I will get

| ds | y | lag_1 |
|----|---|-------|
| 1  | 4 | NaN   |
| 2  | 5 | 4     |
| 4  | 6 | 5     |

Is that right? Also, will the series be ordered by (unique_id, ds) if it isn't initially?
j
That's right, the last lag will be wrong (see the sketch below). Yes, they're ordered by id and date first, although preprocess returns the dataframe in the same order you passed it (the ordering is only done internally).
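A minimal sketch of that wrong-lag behavior: with an integer freq of 1 and ds=3 missing, the lag is taken from the previous row rather than the previous timestamp:

```python
import pandas as pd
from mlforecast import MLForecast

gapped = pd.DataFrame({
    'unique_id': ['A', 'A', 'A'],
    'ds': [1, 2, 4],  # ds=3 missing
    'y': [4.0, 5.0, 6.0],
})
demo = MLForecast(models=[], freq=1, lags=[1])  # no models: feature engineering only
print(demo.preprocess(gapped, dropna=False))
# the lag at ds=4 comes out as 5.0 (the previous row), not the value at ds=3
```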
e
Ok, last question (and perhaps not directly Nixtla related, but I can't find it elsewhere): it seems like referring to the columns by name will be safer than trying to do it by number, and I'm struggling to figure out how to do that in Python. According to the LightGBM documentation: *add a prefix `name:` for column name, e.g. `categorical_feature=name:c1,c2,c3` means c1, c2 and c3 are categorical features*. How does this look in Python? Should I have something like: `lgb_params = {'categorical_feature': 'names: "col1", "col2"'}`?
j
No worries. That documentation refers to LightGBM in general; for Python you should go to the Python API section. For `LGBMRegressor` it should be a list of ints (column indices) or a list of str (feature names), so you should use something like `categorical_feature=['col1', 'col2']`. Btw, if it's `'auto'` (the default) it will use the data types of the columns and, if they're categorical, will set them automatically (sketch below).
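A sketch of both options; LightGBM wants categorical columns as `category` (or int) dtype, so the cast is needed either way. Column names are hypothetical, and `min_child_samples=1` is only there so the toy data actually produces splits:

```python
import lightgbm as lgb
import pandas as pd

df = pd.DataFrame({
    'col1': pd.Series(['a', 'b', 'a', 'b'], dtype='category'),
    'x': [1.0, 2.0, 3.0, 4.0],
    'y': [0.1, 0.2, 0.3, 0.4],
})

# explicit: name the categorical columns at fit time
model = lgb.LGBMRegressor(min_child_samples=1)
model.fit(df[['col1', 'x']], df['y'], categorical_feature=['col1'])

# implicit: with the default categorical_feature='auto',
# the category dtype alone is enough
model2 = lgb.LGBMRegressor(min_child_samples=1).fit(df[['col1', 'x']], df['y'])
```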