# neural-forecast
Cristian (Nixtla)
Thanks for the tip @Manuel! Indeed, there are several strategies like this in the HPO literature (successive halving, Hyperband, etc.). I wonder if your current TFT configuration might be stuck in a suboptimal region, given that the random seed has such a large effect on performance (for example, a learning rate that is too large).
Manuel
@Cristian (Nixtla) Yes, it is directly inspired by Hyperband but focuses on the random seed. Unfortunately, even when I decrease the learning rate, the behavior stays similar. In particular, with some seeds I cannot predict certain seasonal peaks in the time series; I get flatter predictions instead, even when I increase the number of attention heads or the size of the hidden layers.
P.S. Obviously the random seed and the hidden-layer size interact, so the best seeds can change when the hidden-layer size changes (I keep the hidden-layer size fixed while evaluating the most promising seeds).
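For reference, a minimal sketch of what successive halving over random seeds could look like. It assumes a hypothetical `train_and_score(seed, budget)` callback (not part of any library) that trains the model for `budget` steps with the given seed and returns a validation loss:

```python
def successive_halving_over_seeds(train_and_score, seeds, budgets):
    """Repeatedly halve the pool of seeds while increasing the training
    budget each round, so most compute goes to the most promising seeds.

    train_and_score(seed, budget) is an assumed callback: it trains for
    `budget` steps with `seed` and returns a validation loss.
    """
    pool = list(seeds)
    for budget in budgets:
        scores = {seed: train_and_score(seed, budget) for seed in pool}
        # keep the better-scoring half of the pool for the next round
        pool = sorted(pool, key=scores.get)[:max(1, len(pool) // 2)]
    return pool[0]  # the seed that survived every round

# e.g. best_seed = successive_halving_over_seeds(fn, range(16), [100, 200, 400, 800])
```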
Cristian (Nixtla)
Very interesting, thanks for sharing 🙂. This is why it is also very common to build ensembles from the forecasts produced by multiple random seeds. In the N-BEATS paper they ensemble more than 100 models 😅
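For example, with neuralforecast one could train the same TFT under several seeds and take the median of their forecasts. A rough sketch; the hyperparameters are illustrative placeholders, not tuned values, and `AirPassengersDF` is just the toy monthly series shipped with the library:

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import TFT
from neuralforecast.utils import AirPassengersDF

SEEDS = [0, 1, 2, 3, 4]
# one TFT per seed: identical hyperparameters, only the seed changes
models = [
    TFT(h=12, input_size=24, max_steps=100, random_seed=s, alias=f"TFT_seed{s}")
    for s in SEEDS
]
nf = NeuralForecast(models=models, freq="M")
nf.fit(df=AirPassengersDF)

preds = nf.predict()
# median across seeds is robust to the occasional bad seed
preds["TFT_median"] = preds[[f"TFT_seed{s}" for s in SEEDS]].median(axis=1)
```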
Manuel
That makes sense, although in my case the model with the best seed would risk being dragged down by an ensemble that includes models with bad seeds.
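One possible hedge against that (a sketch of the idea, not a method from the papers above): score every seed on a validation set first and ensemble only the top-k, so bad seeds never enter the mix:

```python
import numpy as np

def top_k_seed_ensemble(forecasts, val_losses, k=3):
    """forecasts: dict mapping seed -> forecast array (all the same shape);
    val_losses: dict mapping seed -> validation loss for that seed.
    Ensembling only the k best seeds keeps a few bad seeds from
    dragging the best model down.
    """
    best = sorted(val_losses, key=val_losses.get)[:k]
    return np.median(np.stack([forecasts[s] for s in best]), axis=0)
```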