# mlforecast

Jason Gofford

10/06/2023, 2:00 PM
More questions (sorry) -- what's the recommended way of handling categorical or boolean exogenous variables? It's not clear from the mlforecast docs whether these technically count as static or not. Let's say I'm using LightGBM, which has native support for categorical features, and I have a column called "is_missing", a boolean indicating whether the data is missing/imputed, and a column called "holidays", which is the name of the holiday that occurs on a given date. As their `dtype` is `category` I have to put them in as static features. Do I need to handle this differently, or is that correct?

José Morales

10/06/2023, 4:02 PM
That's correct. Since they don't change over time they're static and since you set the dtype to category LightGBM will treat them as such when training

Jason Gofford

10/08/2023, 11:32 AM
On the static vs dynamic, it makes sense for something like "store_id" to be static because a store is a store. Repeating a static value is the only way to handle it. But for dynamic features (where my understanding of a dynamic feature is that it's something that varies over time or by date but is not related to the unique_id) wouldn't missingness and holidays be dynamic? A given date can be a holiday or not, and the date can change by year, or it might be missing or not. The convention isn't clear to me.

José Morales

10/09/2023, 3:54 PM
The definition is the following:
• Static: same value for a single id across all timestamps
• Dynamic: more than one unique value for a single id
If you have dynamic features they can also be categorical; the only difference is that you have to provide them through `X_df` when using predict, whereas static features are just repeated automatically for you. Does that make sense?
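The definition above can be checked mechanically on a training frame. The following is a minimal sketch, assuming a long-format pandas DataFrame with `unique_id`, `ds` and `y` columns; the helper `split_static_dynamic` and the toy data (including the `store_type` column) are hypothetical and not part of mlforecast.

```python
import pandas as pd


def split_static_dynamic(df, id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical helper: classify feature columns using the rule above."""
    static, dynamic = [], []
    for col in df.columns:
        if col in (id_col, time_col, target_col):
            continue
        # Number of distinct values each id takes for this column.
        per_id = df.groupby(id_col)[col].nunique(dropna=False)
        # Static: every id has a single value across all of its timestamps.
        if (per_id <= 1).all():
            static.append(col)
        else:
            dynamic.append(col)
    return static, dynamic


# Toy data: two series, three days each (illustrative values only).
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 3 + ["b"] * 3,
        "ds": list(pd.date_range("2023-12-24", periods=3, freq="D")) * 2,
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "store_type": ["mall"] * 3 + ["street"] * 3,
        "holidays": ["none", "christmas", "none"] * 2,
    }
)
df["holidays"] = df["holidays"].astype("category")

static_cols, dynamic_cols = split_static_dynamic(df)
print(static_cols, dynamic_cols)
```

Under this rule `store_type` comes out as static and `holidays` as dynamic, even though both are categorical: the dtype plays no part in the distinction.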

Jason Gofford

10/09/2023, 4:00 PM
So here, a holiday is dynamic, right? It can be 1 or 0, or many values

José Morales

10/09/2023, 4:06 PM
Yes

Jason Gofford

10/09/2023, 4:10 PM
If that's the case, how should it be passed to training? I get "can't cast to {integer or float}" errors if I don't set it as a "static feature".

José Morales

10/09/2023, 4:11 PM
Do those errors come from the model?

Jason Gofford

10/09/2023, 5:10 PM
I don't think so. It's lightgbm, so it's quite happy with categories. And if they're set as static it's fine, but if they're not then I get an error.
I encounter the problem when including a categorical column (`category` type) in an input df, but not setting it as static. It should be relatively easy to reproduce with a basic lightgbm model.

José Morales

10/09/2023, 5:21 PM
Can you paste the stacktrace of the error?

Jason Gofford

10/09/2023, 5:51 PM
I'll reproduce it tomorrow
I figured this out. When doing a train vs test comparison I need to pass the `test` dataset (without the static columns) to the predict method. This wasn't clear from the error message at all.
The error message is raised as
-> 6178 raise KeyError(f"{not_found} not in index")
KeyError: "['holidays'] not in index"
The error could be improved to explicitly state that this is related to the `X_df` set.

José Morales

10/12/2023, 3:04 PM
Thanks for the feedback. Were you not providing `X_df` at all or was it just missing that column?

Jason Gofford

10/12/2023, 3:04 PM
not provided at all, which makes sense in hindsight but puzzled me for a while.

José Morales

10/12/2023, 4:05 PM
I think we can add some more errors in there, I'll work on that