# neural-forecast
l
Hi all,
Firstly, thank you for the wonderful library that makes forecasting with NN so much easier, especially with the Auto models. I am familiar with classification and regression, and am very new to time series forecasting. I spent the past 2 days reading the documentation and have the questions below.
My dataset contains 22 time series of different lengths, with 100+ numerical features, 20 categorical features and 10 binary features. It is weekly data where every Monday is the start of the week, i.e. soweekdate, and this is the 'ds'. The target variable is Sales_Qty, which is what I am trying to forecast 8 weeks ahead, and is the 'y'. I intend to perform feature selection using a LightGBM regression tree.
1. If I am using the Auto models, can I specify the horizon, input size, futr_exog_list, hist_exog_list, stat_exog_list, etc. in the configs of the Auto models, on top of the required model parameters?
2. Since I am forecasting 8 weeks ahead, do I change the frequency to freq="W" in nf = NeuralForecast(models=models, freq='M')?
3. On scaler_type, what is the default for the Auto models? When it scales, does it scale each time series in the group individually, or all columns as a group? Is there a way to turn this off, as I might want to scale by individual time series outside of NeuralForecast?
4. The 20 categorical features are time invariant, so I will be creating dummy variables for them and putting them into the static_df. As I am forecasting 8 weeks ahead and I have both historical and future values for the 100+ numerical features, should I put these numeric features into the hist_exog_list, futr_exog_list and futr_df accordingly? The column names might be the same, but the underlying values are different for each id and each ds.
5. The 10 binary features are in a similar situation to the numeric features: I have both historical and future values for each time series. Should I also include these binary features in the hist_exog_list, futr_exog_list and futr_df accordingly?
Thanks in advance, Will
c
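Hi @lobbie. Thanks for using our library!
1. In the configuration dictionary you can specify all the parameters you mention except for the horizon, which must be specified when initializing the `Auto` model, for example:
```python
from neuralforecast.auto import AutoNHITS
from neuralforecast.losses.pytorch import MAE
from ray.tune.search.hyperopt import HyperOptSearch

# `config` is the search-space dictionary, defined beforehand
model = AutoNHITS(h=8,
                  loss=MAE(),
                  config=config,
                  search_alg=HyperOptSearch(),
                  num_samples=20)
```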
The reason for this is that models with different `h` are not directly comparable.
2. No, the frequency should always be the sampling frequency of your data, in this case `W-MON`. To forecast 8 weeks ahead, set the horizon `h` to 8.
3. The default `scaler_type` varies depending on the base model. If you do not specify `scaler_type` in the config dictionary, it will use the default of the base model (same behavior for all parameters). You can turn off scaling by setting `scaler_type=None`.
4. Yes, all numeric features that change in time should be either a `hist` or a `futr` variable. BOTH sets of features should be present in the historic data, both during training and when calling the `predict` method. In the `futr_df` of the `predict` method, only add the `futr` variables. We just fixed a bug regarding exogenous variables, so I recommend updating your code with the latest changes from the main branch.
5. Yes, binary and numeric variables are treated equally by the models.
I strongly recommend starting your pipeline without any exogenous variables to create a simpler baseline. Only add exogenous variables once your initial pipeline is ready and working. With 100+ exogenous variables, models can grow drastically, especially if you have a large `input_size`, which increases training times. You can start by adding the most informative variables (if you know them), and compare performance and training times as you add more.
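To illustrate points 2 and 4 of the reply above, here is a minimal sketch of the rows a `futr_df` would hold for one series: the 8 Mondays that follow the last observed date (frequency `W-MON`, `h=8`), with only the `futr` columns attached. The series id `store_1` and the column names `promo_flag` and `price` are hypothetical, and plain Python dictionaries stand in for a DataFrame so the sketch has no dependencies.

```python
from datetime import date, timedelta


def next_mondays(last_ds, h):
    """Return the h weekly timestamps that follow last_ds, which must be a Monday."""
    if last_ds.weekday() != 0:  # Monday is weekday 0
        raise ValueError("last_ds must be a Monday for W-MON data")
    return [last_ds + timedelta(weeks=k) for k in range(1, h + 1)]


def build_futr_rows(unique_id, last_ds, h, futr_values):
    """Build one row per future week, holding only the future exogenous columns.

    futr_values maps each (hypothetical) futr column name to its h known future values.
    """
    for name, values in futr_values.items():
        if len(values) != h:
            raise ValueError(f"{name} needs exactly {h} future values")
    rows = []
    for k, ds in enumerate(next_mondays(last_ds, h)):
        row = {"unique_id": unique_id, "ds": ds}
        for name, values in futr_values.items():
            row[name] = values[k]
        rows.append(row)
    return rows


# Hypothetical series whose last observed week starts on Monday 2023-05-29.
rows = build_futr_rows(
    unique_id="store_1",
    last_ds=date(2023, 5, 29),
    h=8,
    futr_values={
        "promo_flag": [0, 1, 0, 0, 1, 0, 0, 0],  # binary futr variable
        "price": [9.99] * 8,                     # numeric futr variable
    },
)
```

With pandas installed, `pd.DataFrame(rows)` would turn these rows into a frame shaped like the `futr_df` described above; `hist` variables are deliberately absent because their future values are not supplied at prediction time.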
l
@Cristian (Nixtla), thanks for your advice on exogenous features. I will definitely try modelling without these first. Unfortunately, I do not know how to get the latest changes from the main branch. Should I just do pip install again? In addition, I am following the steps from the documentation and got the following error:
```
TypeError: __init__() got an unexpected keyword argument 'stat_exog_list'
```
My first attempt at writing the code is:
```python
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoRNN, AutoTFT
from neuralforecast.losses.pytorch import MQLoss

horizon = 8

models = [
    AutoRNN(h = horizon
            , config = None
            , stat_exog_list = mystat_exog_list # <- Static exogenous variables
            , hist_exog_list = myhist_exog_list # <- Historical exogenous variables
            , futr_exog_list = myfutr_exog_list # <- Future exogenous variables
            , loss = MQLoss()
            , num_samples = 2
            #, search_alg=HyperOptSearch()
            , cpus = 4
            , scaler_type = 'robust')
    , AutoTFT(h = horizon
            , config = None
            , stat_exog_list = mystat_exog_list # <- Static exogenous variables
            , hist_exog_list = myhist_exog_list # <- Historical exogenous variables
            , futr_exog_list = myfutr_exog_list # <- Future exogenous variables
            , loss = MQLoss()
            , num_samples = 2
            #, search_alg=HyperOptSearch()
            , cpus = 4
            , scaler_type = 'robust')
]

nf = NeuralForecast(models = models, freq = 'W-MON')

# fit the model
nf.fit(df = historic_df, static_df = static_df)
```
Any idea what I am doing wrong?
c
what model is causing the error? RNN or TFT?
l
Both. I get it when I run AutoRNN and AutoTFT individually.
c
The problem is that all those arguments should be in the `config`, not directly in the `Auto` class. Here is the documentation: https://nixtla.github.io/neuralforecast/examples/automatic_hyperparameter_tuning.html.
The only parameters of the `Auto` class are `h`, `config`, `loss`, `search_alg`, and `num_samples`.
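As a rough sketch of that advice, the exogenous lists and `scaler_type` from the failing snippet can be moved into the `config` dictionary, leaving only `h`, `config`, `loss` and `num_samples` on the `Auto` class. The column names and the `input_size` and `max_steps` values below are hypothetical, and the sketch assumes fixed (non-searched) values are accepted as config entries; a real config would normally also contain search spaces such as `tune.choice(...)`.

```python
def build_config(horizon, hist_exog, futr_exog, stat_exog, scaler_type="robust"):
    """Collect the model-level parameters that belong in `config`, not on the Auto class."""
    return {
        "input_size": 2 * horizon,           # hypothetical fixed value, not a recommendation
        "max_steps": 100,                    # hypothetical fixed value
        "hist_exog_list": list(hist_exog),   # historical exogenous columns
        "futr_exog_list": list(futr_exog),   # future exogenous columns
        "stat_exog_list": list(stat_exog),   # static exogenous columns
        "scaler_type": scaler_type,          # None would turn scaling off
    }


def build_auto_model(horizon, config):
    """Create the Auto model; the horizon is passed directly, everything else via config."""
    # Imported here so the config helper above can be used without the library installed.
    from neuralforecast.auto import AutoRNN
    from neuralforecast.losses.pytorch import MQLoss

    return AutoRNN(h=horizon, config=config, loss=MQLoss(), num_samples=2)


# Hypothetical column names, standing in for the lists from the question.
config = build_config(
    horizon=8,
    hist_exog=["price_lag"],
    futr_exog=["promo_flag"],
    stat_exog=["region_north"],
)
```

Calling `build_auto_model(8, config)` would then replace the `AutoRNN(...)` call from the question; note that the horizon is not a key of the dictionary.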
l
I see. I will have a look at the documentation and try it out as suggested, i.e. build a baseline first without the exog features. Thanks.
Hi @Cristian (Nixtla), for `"random_seed": tune.randint(1, 10),` how do I set a seed for reproducible research? It seems that the global seed is different every time I run the model. Thanks.
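This question is not answered in the thread. One possible approach, sketched here under the assumption that a constant config entry is passed to the model unchanged, is to replace the sampled `random_seed` with a fixed integer. The helper name `with_fixed_seed` and the example values are hypothetical.

```python
def with_fixed_seed(config, seed=1):
    """Return a copy of config whose random_seed is a constant instead of a sampled value."""
    fixed = dict(config)          # shallow copy, so the original config is left untouched
    fixed["random_seed"] = seed   # a constant in place of e.g. tune.randint(1, 10)
    return fixed


# Hypothetical starting config; None stands in for the tune.randint(1, 10) search space.
base_config = {"input_size": 16, "max_steps": 100, "random_seed": None}
reproducible_config = with_fixed_seed(base_config, seed=1)
```

This only pins the seed handed to each model. The search algorithm draws its own samples and may need a separate seed; for example, Ray Tune's `HyperOptSearch` has a `random_state_seed` argument.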