another quick question regarding featurization, if I use
, is it treated as numerical value or does the model know it’s categorical?
It's treated as numerical by default, but you can use MLForecast.preprocess to get the transformed data, cast the column to categorical, and then fit the models. You'd need a before_predict_callback to set the categories when predicting, though. Let me know if you'd like a full example.
Got it. Yeah, if you have an example, that would be great! I kind of encoded this categorical feature myself instead of using the callback.
Another, possibly easier, way would be to wrap your model in a scikit-learn pipeline and define a transformer that casts the columns to categories. The fit step would save the mappings and the transform step would apply them, so they'd be applied automatically in the predict step.
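To see why saving the mapping at fit time matters, here is a small pandas-only sketch (the toy frames train and new are invented for illustration). Casting new data to category on its own derives the categories from whatever values happen to be present, whereas reusing the dtype learned on the training data keeps the codes consistent.

```python
import pandas as pd

# Toy data: training covers all seven days, the new batch only two of them.
train = pd.DataFrame({'dayofweek': [0, 1, 2, 3, 4, 5, 6]})
new = pd.DataFrame({'dayofweek': [2, 6]})

# "fit": learn the categorical dtype (the value -> code mapping) once.
learned_dtype = train['dayofweek'].astype('category').dtype

# Naive cast: categories are inferred from the new data alone.
naive = new['dayofweek'].astype('category')

# "transform": reuse the learned dtype so the codes match training.
aligned = new['dayofweek'].astype(learned_dtype)

print(naive.cat.codes.tolist())    # [0, 1]
print(aligned.cat.codes.tolist())  # [2, 6]
```

A transformer inside a pipeline does the same thing, just with the learned dtype stored on the fitted object.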
Here's the example using a pipeline:
from lightgbm import LGBMRegressor
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline

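class CategoricalEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, cols_to_encode):
        # cols_to_encode should be a list of column names
        self.cols_to_encode = cols_to_encode

    def fit(self, df, y=None):
        # remember the categorical dtypes (and their categories) seen during training
        self.dtypes_ = df[self.cols_to_encode].astype('category').dtypes
        return self

    def transform(self, df, y=None):
        # apply the stored dtypes so that predict uses the same categories as fit
        return df.astype(self.dtypes_)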

series = generate_daily_series(2)
# pass the columns as a list so that only those columns are cast to categorical
pipe = make_pipeline(CategoricalEncoder(['dayofweek']), LGBMRegressor(n_estimators=2))
mlf = MLForecast(models={'reg': pipe}, freq='D', date_features=['dayofweek'])
# the following contains the categories used by lightgbm, which match the days of the week