# general
w
another quick question regarding featurization, if I use dayofweek, is it treated as a numerical value or does the model know it's categorical?
j
It's treated as numerical by default, but you can use MLForecast.preprocess to get the transformed data, cast the column to categorical, and fit the models with MLForecast.fit_models. You'd need a before_predict_callback to set the categories when predicting, though. Let me know if you'd like a full example.
w
got it. yeah if you have an example, that will be great! I kind of encoded this categorical feature instead of using the callback
j
Another, possibly easier, way would be to wrap your model in a scikit-learn pipeline and define a transformer that casts the columns to categories. The fit step would save the mappings and the transform step would apply them, so they'd be applied automatically in the predict step.
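To illustrate why the mappings need to be saved at fit time, here is a minimal pandas-only sketch. The `train` and `future` frames are made up for illustration. Casting the future rows on their own infers the categories from whatever days happen to be present, while reusing the dtype saved from training keeps the codes consistent.

```python
import pandas as pd

# Hypothetical toy frames: training covers every weekday,
# the forecast window only covers Saturday and Sunday.
train = pd.DataFrame({'dayofweek': [0, 1, 2, 3, 4, 5, 6]})
future = pd.DataFrame({'dayofweek': [5, 6]})

# "fit": remember the categorical dtype (with its categories) seen in training
saved_dtypes = train[['dayofweek']].astype('category').dtypes

# Naive cast: categories are inferred from the future rows only
naive = future['dayofweek'].astype('category')

# "transform": reuse the saved dtype so the categories match training
fixed = future.astype(saved_dtypes)['dayofweek']

naive_codes = naive.cat.codes.tolist()  # codes relative to [5, 6]
fixed_codes = fixed.cat.codes.tolist()  # codes relative to [0, ..., 6]
print(naive_codes, fixed_codes)
```

The naive cast encodes Saturday as code 0, whereas the saved dtype keeps it at code 5, matching what the model saw during training.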
Here's the example using a pipeline:
from lightgbm import LGBMRegressor
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline

class CategoricalEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, cols_to_encode):
        self.cols_to_encode = cols_to_encode
        
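    def fit(self, df, y=None):
        # remember the categorical dtype(s), including the categories seen in training
        self.dtypes_ = df[self.cols_to_encode].astype('category').dtypes
        return self
        
    def transform(self, df, y=None):
        # reapply the saved dtypes so that predict uses the same categories as fit
        return df.astype(self.dtypes_)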

series = generate_daily_series(2)
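# pass a list so that only the listed columns are cast; with a bare string, transform would cast every column
pipe = make_pipeline(CategoricalEncoder(['dayofweek']), LGBMRegressor(n_estimators=2))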
mlf = MLForecast(models={'reg': pipe}, freq='D', date_features=['dayofweek'])
mlf.fit(series)
# the following contains the categories used by lightgbm, which match the days of the week
mlf.models_['reg'].named_steps['lgbmregressor'].booster_.pandas_categorical