# neural-forecast
y
Hi, I am wondering what is the proper way to perform in-sample evaluation. For example, a transformer-based model is trained to learn within-window behavior: it is given `input_size` points of data and asked to make a prediction of length `h`. The goal is to evaluate over randomly chosen windows of the validation dataset. Currently I am thinking of doing the resampling myself, but I am wondering if there is a built-in feature for in-sample evaluation. I think `predict_insample` might be doing this, but I am confused about its output. Does it only contain the predicted values? Is it a fair metric to simply compute the accuracy of `predict_insample` against the original df as a measure of the method?
k
Hey @Yang Guo, in-sample validation evaluation is a feature we have not implemented yet. It would require modifying the `BaseWindows` validation step to sample windows within the validation set.
y
Could you please expand on that? I thought I could simply use `nf.predict` and apply it directly to a subset of the dataset.
c
Hi @Yang Guo, we have other options as well. For example, the `cross_validation` method uses a validation set (of size `val_size`) for model selection, and then automatically produces the forecasts for the entire test set (of size `test_size`, or `n_windows` windows of size `h`).
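Roughly, the flow looks like this (a minimal sketch, not tested; the NHITS model, the AirPassengers example data, and the sizes are placeholders for illustration):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

# One model, trained once; the historic data is split chronologically.
nf = NeuralForecast(models=[NHITS(input_size=24, h=12, max_steps=100)], freq='M')

# Hold out the last 12 points as test and the 12 before them as validation.
# When test_size is set explicitly, n_windows must be None.
cv_df = nf.cross_validation(df=AirPassengersDF, val_size=12, test_size=12, n_windows=None)

# cv_df contains unique_id, ds, cutoff, the true y, and one column per model,
# covering only the test windows.
print(cv_df.head())
```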
Alternatively, you can use `predict_insample` (run it after the `fit` or `cross_validation` method) to recover the forecasts for the entire train AND validation sets. You can then filter the forecasts however you want.
`predict_insample` already returns the true values in the `y` column as well; here is the tutorial: https://nixtla.github.io/neuralforecast/examples/predictinsample.html
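For example, something like this (a sketch, not tested, assuming an NHITS model fitted with `val_size=12`; the `NHITS` column name and the sizes are illustrative assumptions):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.losses.numpy import mae
from neuralforecast.utils import AirPassengersDF

nf = NeuralForecast(models=[NHITS(input_size=24, h=12, max_steps=100)], freq='M')
nf.fit(df=AirPassengersDF, val_size=12)

# Forecasts over the whole train + validation history; the true values
# come back in the `y` column alongside one column per model.
insample_df = nf.predict_insample(step_size=12)

# Keep only the last 12 timestamps (the validation set) and score them.
val_start = AirPassengersDF['ds'].iloc[-12]
val_df = insample_df[insample_df['ds'] >= val_start]
print(mae(val_df['y'].values, val_df['NHITS'].values))
```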
Be careful to focus only on the validation set; it is not ideal to do model selection on the train set.
And depending on your use case, `cross_validation` already does the model selection for you. It essentially covers the entire pipeline (train the model on the train set, select on the validation set, predict on the test set).
y
I think the issue is that `predict_insample` can only be applied to the default dataset. So your point is that after performing cross-validation, I should then apply `predict_insample`?
Could you please explain cross-validation a little bit? It seems to require re-training (calling `model.fit` multiple times).
c
Can you provide more details on the pipeline you are trying to build? And no, `cross_validation` only trains the model once.
y
Actually, let me rephrase my question. `cross_validation` performs actual training on the dataset, though with some additional features. To evaluate the model, I cannot use the same dataset that was used in `cross_validation`. However, `predict_insample` does not take a dataset as input.
c
If your objective is to compare models on historic data, `cross_validation` is the way to go. This is the function we have used in our published research, and it is the standard way of comparing the performance of models. The historic data is separated chronologically into train/val/test sets. Models are trained on the train set, and the validation set is used for model selection and hyperparameter tuning (for example, if you use an `auto` model such as `AutoPatchTST`). Finally, it returns the forecasts on the test set, which was never seen by the model during training. We compare performance on the test set.
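As a concrete sketch (untested; the number of tuning samples and the example data are just assumptions):

```python
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoPatchTST
from neuralforecast.losses.numpy import mae
from neuralforecast.utils import AirPassengersDF

# The auto model runs a hyperparameter search internally, scoring each
# candidate configuration on the validation set.
nf = NeuralForecast(models=[AutoPatchTST(h=12, num_samples=5)], freq='M')
cv_df = nf.cross_validation(df=AirPassengersDF, val_size=12, test_size=12, n_windows=None)

# cross_validation returns forecasts for the test windows only, so this
# score is test-set performance on data the model never saw in training.
print(mae(cv_df['y'].values, cv_df['AutoPatchTST'].values))
```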
The more common use case of `predict_insample` is to recover the forecasts for the train set and validation set. Was this useful? Let me know if you have additional doubts; we can chat using direct messages as well.
y
I agree with this pipeline, but is there any way that I can simply evaluate a trained model? The evaluation would be done in a similar way to `predict_insample`, but on a different dataset.
c
Ok, I understand your point now. So essentially you want to train a model on one dataset and predict on a new one (not necessarily with the same time series)?
y
Yep
c
You should use the `predict` method for that, but as you said, it can only forecast one window. You just need to call `predict(df=new_df)`. We actually have a tutorial on transfer learning with this use case here: https://nixtla.github.io/neuralforecast/examples/transfer_learning.html
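A minimal sketch of that (using AirPassengersDF as a stand-in; `new_df` is a placeholder for your own data with `unique_id` / `ds` / `y` columns):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

# Pretrain on one dataset.
nf = NeuralForecast(models=[NHITS(input_size=24, h=12, max_steps=100)], freq='M')
nf.fit(df=AirPassengersDF)

# Forecast a different dataset without retraining: the model reads the last
# input_size points of each series in new_df and predicts one window of h steps.
new_df = AirPassengersDF.copy()  # placeholder: replace with your new dataset
fcst_df = nf.predict(df=new_df)
```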
If you want more windows, you can hack it. For example, after training, set `nf.models[0].max_steps=0`. Then pass the new dataset to the `fit` method (set `use_init_models=False`), then call `predict_insample`.
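Putting that hack together (a sketch of the steps above, not an official API; it continues from the pretrained `nf` and the placeholder `new_df`, and the exact behavior may depend on the library version):

```python
# Allow zero additional training steps so the trained weights stay untouched.
nf.models[0].max_steps = 0

# "Fit" on the new data only to register it as the internal dataset;
# use_init_models=False keeps the already-trained models instead of
# restoring freshly initialized ones.
nf.fit(df=new_df, use_init_models=False)

# Now predict_insample produces rolled forecasts over the whole new dataset.
insample_new_df = nf.predict_insample(step_size=12)
```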