# neural-forecast
m
Hi all, congrats on the amazing library! I've been experimenting a bit with NHITS on a dataset with weekly frequency, strong seasonality, and trend. Question: I'm still trying to understand how to define the `n_pool_kernel_size` and `n_freq_downsample` parameters. Are there any guidelines for these, in relation to the frequency of the series? It would be great if someone could provide some intuition on how they affect the model. Would these be reasonable values? `"n_pool_kernel_size": [2, 2, 2]` (MaxPooling kernel size) and `"n_freq_downsample": [52, 24, 1]` (interpolation expressivity ratios). I'm getting a reasonable result but wanted to see if it's possible to improve. This is an example of the data and predictions.
c
Hi @Martin Bel! In our `AutoNHITS` class we have a predefined list of values to explore for these hyperparameters. For `n_pool_kernel_size` we recommend exploring either constant values across stacks (`[1, 1, 1]`, `[2, 2, 2]`, etc.) or exponentially decreasing values (`[8, 4, 1]`, `[16, 8, 1]`). With `[1, 1, 1]` the model does not downsample the input.

`n_freq_downsample` controls how much the output dimension is reduced in the blocks of each stack. For a particular stack, the output dimension of the MLP follows `output_dim = h / n_freq_downsample`, where `h` is the forecasting horizon. For example, if you have hourly data and you are forecasting one week, `h = 168`. By setting `n_freq_downsample = 24`, each MLP of that stack will output 7 points (168/24), one for each day of the week. In your case, you are forecasting a year of weekly data, so `h = 52`. You can use `n_freq_downsample = 13` to aggregate forecasts by quarter (52/13 = 4), and `n_freq_downsample = 4` to get approximately one output per month (52/4 = 13). The final parameter would then be `n_freq_downsample = [13, 4, 1]`.
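To make the arithmetic concrete, here is a quick plain-Python sketch (illustrative only, assuming exact division) of the per-stack output sizes for your weekly example:

```python
# Per-stack MLP output sizes for output_dim = h / n_freq_downsample
# (illustrative; assumes h is exactly divisible by each rate).
h = 52
n_freq_downsample = [13, 4, 1]

for stack, rate in enumerate(n_freq_downsample):
    print(f"stack {stack}: MLP outputs {h // rate} points")
# stack 0: MLP outputs 4 points    (quarterly resolution)
# stack 1: MLP outputs 13 points   (~monthly resolution)
# stack 2: MLP outputs 52 points   (weekly, full resolution)
```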
In both cases I recommend using `AutoNHITS` to define a grid over these hyperparameters.
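For instance, a minimal sketch of such a grid (assuming neuralforecast's ray.tune-based config format for the Auto models; the exact keys and defaults may differ):

```python
# Hedged sketch: a small AutoNHITS search space, assuming the
# ray.tune-based config interface of neuralforecast's Auto models.
from ray import tune
from neuralforecast.auto import AutoNHITS

nhits_config = {
    "input_size": tune.choice([52 * 4, 52 * 8]),       # candidate history lengths
    "n_pool_kernel_size": tune.choice([[1, 1, 1], [2, 2, 2], [8, 4, 1]]),
    "n_freq_downsample": tune.choice([[13, 4, 1], [52, 13, 1], [1, 1, 1]]),
    "learning_rate": tune.loguniform(1e-4, 1e-2),
}

model = AutoNHITS(h=52, config=nhits_config, num_samples=10)
```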
m
Perfect! Thanks for the explanation! I saw the AutoNHITS example but the logic of how the parameters were being chosen wasn't clear. Just to clarify: in this model I'm passing `n_freq_downsample = [13, 4, 1]` as an argument, but I don't see it changing the output dimension. These are all the parameters I used:
```python
from neuralforecast.models import NHITS

h = 52
nbr_blocks = 3
linear_dim = 512

config_nhits = {
    "h": h,
    "input_size": h * 8,                                   # Length of input window
    "n_blocks": nbr_blocks * [1],                          # Number of blocks per stack
    "mlp_units": nbr_blocks * [[linear_dim, linear_dim]],  # Hidden sizes of each block's MLP
    "n_pool_kernel_size": [2, 2, 2],                       # MaxPooling kernel size
    "n_freq_downsample": [13, 4, 1],                       # Interpolation expressivity ratios
    "learning_rate": 1e-3,                                 # Initial learning rate
    "scaler_type": "invariant",                            # Scaler type
    "activation": "ReLU",
    "max_steps": 500,                                      # Max number of training iterations
    "batch_size": 128,                                     # Number of series in batch
    "windows_batch_size": None,                            # Number of windows in batch
    "random_seed": 123,                                    # Random seed
}

model = NHITS(**config_nhits)
```
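For completeness, a hedged sketch of how a config like this is typically fit and used for prediction via the library's `NeuralForecast` wrapper; here `df` is assumed to be a long-format dataframe with `unique_id`, `ds`, and `y` columns:

```python
# Sketch of fitting/predicting with the config above (assumes `df` is a
# long-format pandas DataFrame with 'unique_id', 'ds' and 'y' columns).
from neuralforecast import NeuralForecast

nf = NeuralForecast(models=[model], freq="W")  # weekly frequency
nf.fit(df=df)
preds = nf.predict()  # 52 weekly forecasts per series
```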
c
Yes, the final output does not change; the model still produces a forecast of size `h`. What changes is the "expressivity"/frequency of each stack: by reducing the output dimension, a stack learns lower-frequency patterns. The intermediate values are then interpolated to produce the complete forecast (hence the name of our model). Here is a diagram:
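In code, the interpolation step looks roughly like this torch sketch (illustrative only, not the library's internal implementation):

```python
# Illustrative sketch of hierarchical interpolation: a stack that outputs
# 4 "quarterly" knots for h = 52 is upsampled back to the full weekly horizon.
import torch
import torch.nn.functional as F

h = 52
coarse = torch.randn(1, 1, h // 13)  # stack output at quarterly resolution: 4 points
full = F.interpolate(coarse, size=h, mode="linear", align_corners=True)
print(full.shape)  # torch.Size([1, 1, 52])
```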
Thanks for the feedback, we will improve the documentation to make it clearer!
m
No problem! Thanks for the clarification