# neural-forecast

Phil

09/22/2023, 3:14 PM

I have a *model understanding* question about the NHITS model. I am hoping to get some clarification on the connection between two hyperparameters: `n_freq_downsample` and `n_pool_kernel_size`. My question may also stem from how these parameters interact with one another.
From my understanding, the higher the value in `n_freq_downsample`, the fewer data points we will have after downsampling: if we have daily data and downsample with `n_freq_downsample = 7`, we will have 1/7 the amount of data. `n_pool_kernel_size` also has a "downsampling effect": if we have an input size of `N` and `n_pool_kernel_size = 2`, we will end up with `N / 2` data points.
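To make the pooling effect concrete, here is a minimal numpy sketch of non-overlapping max pooling (the library itself does this with `torch.nn.MaxPool1d` inside the stacks; this is just an illustration of the length reduction):

```python
import numpy as np

def maxpool1d(x, kernel_size):
    # Non-overlapping max pooling: keep the max of each window.
    n = len(x) // kernel_size * kernel_size  # drop any remainder
    return x[:n].reshape(-1, kernel_size).max(axis=1)

x = np.arange(8, dtype=float)         # input_size N = 8: [0, 1, ..., 7]
pooled = maxpool1d(x, kernel_size=2)  # N / 2 = 4 values
print(pooled)                         # [1. 3. 5. 7.]
```

So a kernel size of 2 halves the number of points the MLP stack sees at its input.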
First of all, is this understanding correct? And if so, it seems like we would want these two hyperparameters to be inversely proportional: I would not want to downsample by a large factor and then also apply a large pooling kernel, since that would dramatically decrease the amount of information being fed into my model.
The reason I am asking is that I was looking at the AutoNHITS parameter space:

```
"n_pool_kernel_size": tune.choice(
    [[2, 2, 1], 3 * [1], 3 * [2], 3 * [4], [8, 4, 1], [16, 8, 1]]
),
"n_freq_downsample": tune.choice(
    [
        [168, 24, 1],
        [24, 12, 1],
        [180, 60, 1],
        [60, 8, 1],
        [40, 20, 1],
        [1, 1, 1],
    ]
),
```

Intuitively, I would have assumed that the last two choices for `n_pool_kernel_size` would have been reversed, i.e. `[1, 4, 8]` and `[1, 8, 16]`?
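For reference, a custom search space like the one above can be supplied to `AutoNHITS` through its `config` argument; a rough config sketch, assuming `neuralforecast` and `ray[tune]` are installed (the specific keys and values here are illustrative, not a recommendation):

```python
from ray import tune
from neuralforecast.auto import AutoNHITS

# Illustrative override: restrict the search to a few of the
# pairings discussed above (no data or training shown here).
config = {
    "input_size": tune.choice([48, 96]),
    "n_pool_kernel_size": tune.choice([3 * [1], 3 * [2], [16, 8, 1]]),
    "n_freq_downsample": tune.choice([[168, 24, 1], [24, 12, 1], [1, 1, 1]]),
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "max_steps": 500,
}
model = AutoNHITS(h=24, config=config, num_samples=10)
```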

Cristian (Nixtla)

09/25/2023, 7:49 PM

Hi **@Phil**! Your understanding of both parameters is correct. Regarding the inverse relationship: not necessarily. `n_pool_kernel_size` downsamples only the inputs, and `n_freq_downsample` only the outputs. Having both at the same time does not compound, because they affect different parts of the architecture. In the diagram from the paper, the kernel controls the MaxPool on the inputs of the MLP stack, and `n_freq_downsample` controls the output dimension of theta (the points in the forecasting window).

The intuition behind having both larger at the same time is that to produce a lower-dimensional output (higher `n_freq_downsample`) you need less information from the inputs, so the kernel can be larger.

That said, we have observed that larger kernel sizes only help with very high-frequency data, and keeping a value of 1 or 2 is usually best. That is why our default config keeps the option of no downsampling (`[1, 1, 1]`).
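The output side can be sketched in a few lines of numpy: a stack with a high `n_freq_downsample` emits only a handful of theta coefficients, which interpolation stretches back to the full horizon (the library does this with learned coefficients; the numbers below are made up for illustration):

```python
import numpy as np

H = 24                # forecast horizon, e.g. one day of hourly data
freq_downsample = 12  # this stack only predicts H / 12 = 2 points

# The stack's MLP emits a low-dimensional theta; interpolation
# recovers all H points of the forecasting window.
theta = np.array([10.0, 20.0])  # illustrative coefficients
coarse = np.linspace(0.0, 1.0, num=len(theta))
fine = np.linspace(0.0, 1.0, num=H)
forecast = np.interp(fine, coarse, theta)

print(forecast.shape)  # (24,)
```

This is why the two parameters do not compound: pooling shrinks what the stack reads, while `n_freq_downsample` shrinks what it writes before interpolation restores the horizon length.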

Phil

09/25/2023, 7:54 PM

I see, that makes more sense! Thank you! Have you observed any effect from varying the number of blocks across frequencies, or is keeping them constant roughly equivalent in performance?

Cristian (Nixtla)

09/25/2023, 9:56 PM

We recommend increasing the number of blocks with larger datasets. For example, NBEATS uses 30 blocks in total for each frequency of the M4 dataset, which has around 30k series.
