
Chris Gervais

05/11/2023, 2:41 PM
On the topic of TFT, any suggestions on speeding up the training (besides GPUs obviously)? There seems to be some bottlenecks, maybe on the dataset loader side?

Cristian (Nixtla)

05/11/2023, 2:50 PM
Hi Chris! Yes, TFT is an intrinsically slow model. We analyzed the cost of each component, and the model itself accounts for almost all the computation (not the loader). For each batch it has to unroll the LSTMs and then apply the attention layer, whose cost is quadratic in the window size. There are several hyperparameters you can tweak to reduce the cost/time:
• Reduce `windows_batch_size` (fewer windows per batch)
• Reduce `input_size` (shorter input window)
• Reduce `hidden_size` (smaller model)
If you have a validation set (set `val_size>0`), you can also set `early_stop_patience_steps` larger than 0 to stop training when the validation loss stops improving.
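Putting those knobs together, a sped-up TFT config might look like the sketch below. The parameter names match NeuralForecast's `TFT` model; the specific values are illustrative, and `Y_df` stands in for your own long-format dataframe (`unique_id`, `ds`, `y` columns):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import TFT

# Smaller/faster TFT: fewer windows per batch, a shorter input window
# (attention cost is quadratic in input_size), a smaller hidden state,
# plus early stopping on the validation loss.
model = TFT(
    h=24,                         # forecast horizon (illustrative)
    input_size=48,                # shorter input window -> cheaper attention
    hidden_size=32,               # smaller model
    windows_batch_size=128,       # fewer windows per batch
    val_check_steps=50,           # how often the validation loss is checked
    early_stop_patience_steps=3,  # stop after 3 checks without improvement
    max_steps=500,
)

nf = NeuralForecast(models=[model], freq='H')
nf.fit(df=Y_df, val_size=24)      # val_size > 0 enables early stopping
```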

Chris Gervais

05/12/2023, 5:36 PM
Hey Cristian, thanks for running this to ground. We'll try these suggestions and let you know how it goes. lol TFT looks like a snail next to TCN šŸ˜„

Kin Gtz. Olivares

05/13/2023, 5:48 PM
It is. TFT is a windows-based approach; TCN uses the forking-sequences optimization.
There's a huge tradeoff between GPU memory and computational speed.
If we have some time we can develop a Transformer-based algorithm that uses forking sequences.
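To make that tradeoff concrete, here's a back-of-the-envelope count (my own illustration, not NeuralForecast internals): a windows-based model re-encodes a full input window for every forecast origin, while a forking-sequences model unrolls the encoder once over the whole series and treats every hidden state as a forecast origin — at the price of keeping all those states in GPU memory.

```python
def windows_based_steps(T, input_size):
    # One window of length `input_size` is re-encoded per forecast origin,
    # so the encoder work grows with T * input_size.
    origins = T - input_size + 1
    return origins * input_size

def forking_sequences_steps(T):
    # The encoder is unrolled once over the full series; each hidden
    # state "forks" into a forecast, so encoder work is just O(T).
    return T

T, L = 1_000, 96
print(windows_based_steps(T, L))   # -> 86880 encoder steps
print(forking_sequences_steps(T))  # -> 1000 encoder steps
```

Roughly two orders of magnitude fewer encoder steps here, which matches the "snail vs. TCN" observation above.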