# statsforecast
b
Hey there, this might be a bit of a newbie question, but I'm a bit confused about cross-validation. I tried the AI in the docs but it seemed to contradict itself. When setting a fixed test size, what does the `n_windows` parameter do? What would be the number of folds in the cross-validation?
m
Hi @Blauzo, `n_windows` is roughly the equivalent of the number of folds. The difference, of course, is that the windows need to come one after another due to the sequential nature of the data. We have a visualization in the neuralforecast cross-validation tutorial that I think explains it better. See here
the orange rectangles are the windows
b
I see, so if I want to test over 20% of the data I would set the test size to 20% of the length. The cross-validation would then slide the window by the step size until it covers the whole test set? Therefore the number of folds would be test_length // step_size?
m
Let me give you a concrete example to illustrate how `n_windows` and `step_size` interact. Suppose you have hourly data and you set `n_windows=3` and `step_size=24`. Then you will have 3 windows: 1. The first window trains on all the data except the last 72 hours. 2. The second window trains on all the data except the last 48 hours. 3. The third window trains on all the data except the last 24 hours. Each window then tests on the hours that follow its training cutoff.
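To make the arithmetic concrete, here's a minimal sketch (plain Python, not library internals) that computes the train/test boundaries for the example above; the series length of 240 hours is an assumption for illustration, and the horizon is assumed equal to `step_size` so the windows don't overlap:

```python
# Sketch: how n_windows and step_size carve out sequential windows.
# Assumed values for illustration (not from the library):
n_windows = 3
step_size = 24   # slide 24 hours between windows
horizon = 24     # forecast 24 hours per window (assumed == step_size here)
n_hours = 240    # hypothetical total length of the hourly series

windows = []
for i in range(n_windows):
    # hours held out after the training cutoff for window i
    held_out = (n_windows - i) * step_size
    train_end = n_hours - held_out
    windows.append((train_end, train_end + horizon))
    print(f"window {i + 1}: train on hours [0, {train_end}), "
          f"test on hours [{train_end}, {train_end + horizon})")
```

With 240 hours of data this reproduces the example: the first window trains on everything except the last 72 hours, the second on everything except the last 48, and the third on everything except the last 24.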
If your goal is to test on 20% of your data, you'll need to adjust `n_windows` and `step_size` accordingly. In the previous example, I used 24 hours = 1 day. However, you also need to consider the context of your data, since sometimes you may want to test a specific portion of it. For example, if you are working with retail data, you might want one of the windows to include a promotional period.