#general

Wen Yao

07/20/2023, 8:30 PM
Another quick question regarding featurization: if I use dayofweek, is it treated as a numerical value, or does the model know it's categorical?

José Morales

07/21/2023, 12:25 AM
It's treated as numerical by default, but you can use MLForecast.preprocess to get the transformed data, cast the column to categorical, and then fit the models. You'd need a before-predict callback (the before_predict_callback argument of predict) to set the categories when predicting, though. Let me know if you'd like a full example showing how.
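To illustrate why the categories have to be set explicitly at predict time, here is a minimal pandas-only sketch. The `train` and `step` frames and the `set_categories` helper are hypothetical stand-ins (not part of mlforecast): `train` plays the role of the preprocessed training features, and `step` the features built for a single forecast step, where only one day of the week is present.

```python
import pandas as pd

# Hypothetical training features: every day of the week appears at least once.
train = pd.DataFrame({"dayofweek": [0, 1, 2, 3, 4, 5, 6, 0, 1]})
# Hypothetical features for one forecast step: only a single day is present.
step = pd.DataFrame({"dayofweek": [5, 5]})

# Cast the training column and keep the resulting dtype (it carries the categories).
saved_dtype = train["dayofweek"].astype("category").dtype

# Naive cast: the categories are inferred from the step alone.
naive = step["dayofweek"].astype("category")
# Cast with the saved dtype: the categories match the training data.
fixed = step["dayofweek"].astype(saved_dtype)

print(naive.cat.categories.tolist())  # [5]
print(fixed.cat.categories.tolist())  # [0, 1, 2, 3, 4, 5, 6]
print(naive.cat.codes.tolist(), fixed.cat.codes.tolist())  # [0, 0] [5, 5]


def set_categories(features: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: re-apply the categories saved from the training data."""
    return features.astype({"dayofweek": saved_dtype})
```

A function shaped like `set_categories` (dataframe in, dataframe out) is the kind of thing a before-predict callback would do, assuming the callback receives the features as a pandas dataframe.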

Wen Yao

07/21/2023, 7:25 PM
Got it. Yeah, if you have an example, that would be great! I ended up encoding this categorical feature myself instead of using the callback.

José Morales

07/21/2023, 9:50 PM
Another, possibly easier, way would be to wrap your model in a scikit-learn pipeline and define a transformer that casts the columns to categories. The fit step would save the mappings and the transform step would apply them, so they'd be applied automatically in the predict step.
Here's the example using a pipeline:
from lightgbm import LGBMRegressor
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline

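class CategoricalEncoder(BaseEstimator, TransformerMixin):
    """Cast the given columns to pandas categoricals, reusing the categories seen in fit."""

    def __init__(self, cols_to_encode):
        self.cols_to_encode = cols_to_encode  # list of column names

    def fit(self, df, y=None):
        # save one categorical dtype (with its categories) per column
        self.dtypes_ = df[self.cols_to_encode].astype('category').dtypes.to_dict()
        return self

    def transform(self, df, y=None):
        # apply the saved dtypes so fit and predict share the same categories
        return df.astype(self.dtypes_)

series = generate_daily_series(2)
# pass a list so only the listed columns are cast; a bare string would yield
# a single dtype that astype would then apply to every column
pipe = make_pipeline(CategoricalEncoder(['dayofweek']), LGBMRegressor(n_estimators=2))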
mlf = MLForecast(models={'reg': pipe}, freq='D', date_features=['dayofweek'])
mlf.fit(series)
# the following contains the categories used by lightgbm, which match the days of the week
mlf.models_['reg'].named_steps['lgbmregressor'].booster_.pandas_categorical