# neural-forecast
a
Hi everyone! I’d like some advice on HP tuning for TFT. My goal is to forecast weekly sales of products (the granularity is week x store x product) with a horizon of 52. At the moment, my `stat_exog_list` has 6 features, my `hist_exog_list` has 2 features, and my `futr_exog_list` has 3 features. The overall list of features will grow in the future. I have a training set of roughly two years of data and a validation set of one year. Here is the configuration I am using at the moment for the HP tuning process:
```yaml
num_samples: 20
 float_hp:
  dropout:
   base_value: 0.1
   lower: 0.1
   upper: 0.3
   step: 0.1
  attn_dropout:
   base_value: 0.1
   lower: 0.1
   upper: 0.3
   step: 0.1
 integer_hp:
  input_size:
   base_value: 52
   lower: 26
   upper: 104
   step: 26
  hidden_size:
   base_value: 64
   lower: 64
   upper: 768
   step: 64
 categorical_hp:
  scaler_type:
   base_value: robust
   choices:
    - robust
    - standard
  learning_rate:
   base_value: 0.001
   choices:
    - 0.01
    - 0.001
    - 0.0001
    - 0.00001
  n_head:
   base_value: 4
   choices:
    - 2
    - 4
    - 8
 epochs: 50
 batch_size: 128
```
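For context, here is roughly how ranges like the ones above could map to a neuralforecast `AutoTFT` search space with the Ray Tune backend. This is a sketch only: the feature names, the `max_steps` value, and the `train_df` frame are placeholders, and the exact wrapper that consumes the YAML above is not shown, so verify the mapping against your own setup.

```python
# Sketch: mapping the HP ranges above to an AutoTFT search space (Ray Tune backend).
from ray import tune
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoTFT
from neuralforecast.losses.pytorch import MAE

# Hypothetical column names standing in for the real exogenous features.
stat_exog_list = [f"stat_{i}" for i in range(6)]
hist_exog_list = [f"hist_{i}" for i in range(2)]
futr_exog_list = [f"futr_{i}" for i in range(3)]

tft_config = {
    "input_size": tune.choice([26, 52, 78, 104]),           # lower=26, upper=104, step=26
    "hidden_size": tune.choice(list(range(64, 769, 64))),   # lower=64, upper=768, step=64
    "n_head": tune.choice([2, 4, 8]),
    "dropout": tune.choice([0.1, 0.2, 0.3]),
    "attn_dropout": tune.choice([0.1, 0.2, 0.3]),
    "learning_rate": tune.choice([1e-2, 1e-3, 1e-4, 1e-5]),
    "scaler_type": tune.choice(["robust", "standard"]),
    "batch_size": 128,
    "max_steps": 500,        # placeholder: neuralforecast counts training steps, not epochs
    "stat_exog_list": stat_exog_list,
    "hist_exog_list": hist_exog_list,
    "futr_exog_list": futr_exog_list,
}

model = AutoTFT(h=52, loss=MAE(), config=tft_config, num_samples=20)
nf = NeuralForecast(models=[model], freq="W")
# nf.fit(df=train_df, val_size=52)   # train_df: long-format frame with unique_id / ds / y
```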
The thing is that, for certain combinations of HP, I get the following GPU out-of-memory error:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 488.00 MiB. GPU 2 has a total capacity of 14.57 GiB of which 422.75 MiB is free. Including non-PyTorch memory, this process has 14.15 GiB memory in use. Of the allocated memory 13.69 GiB is allocated by PyTorch, and 251.46 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
I am running the code on a VM with 4 Tesla T4 GPUs, with ~15 GiB of memory each. The main parameters draining the memory are `batch_size`, `hidden_size`, `input_size` and `n_head`, if I am not mistaken. I tried different combinations of HP to see when the GPUs run out of memory and when they don't. Here are combinations where the OOM happens:
- `batch_size = 128`, `hidden_size = 320`, `input_size = 104` and `n_head = 2`
- `batch_size = 128`, `hidden_size = 256`, `input_size = 104` and `n_head = 8`

And for the following ones, I don't have any issues:
- `batch_size = 128`, `hidden_size = 384`, `input_size = 52` and `n_head = 8` (14.4 GiB per GPU)
- `batch_size = 256`, `hidden_size = 256`, `input_size = 104` and `n_head = 2` (14.3 GiB per GPU)
- `batch_size = 128`, `hidden_size = 192`, `input_size = 104` and `n_head = 8` (13.2 GiB per GPU)

Considering that the list of features will grow in the future, and the way the problem is framed, would you have any recommendations on the HP ranges to choose (maybe some of them can be reduced to benefit others)? And do you think the hardware I'm using is in the right range? That would be greatly appreciated. Thank you!
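One way to make the sweep more tolerant of the failing combinations, sketched under assumptions rather than as a definitive fix: export the allocator flag that the error message itself suggests before CUDA is initialized, and tighten the memory-heavy ranges jointly (based on the reports above, `hidden_size = 320` with `input_size = 104` overflowed while 256 fit). `windows_batch_size` is assumed here to be the TFT argument controlling how many windows are pushed to the GPU per step; check the name against your neuralforecast version.

```python
import os

# From the error message: may reduce fragmentation; must be set before torch/CUDA init.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from ray import tune

# Sketch: tighter, memory-aware ranges derived from the combinations reported above.
# The caps are illustrative, chosen so the largest trials stay under ~15 GiB per T4.
memory_aware_overrides = {
    "input_size": tune.choice([26, 52, 104]),
    "hidden_size": tune.choice([64, 128, 192, 256]),  # 320 with input_size=104 overflowed
    "n_head": tune.choice([2, 4]),
    "batch_size": tune.choice([64, 128]),
    "windows_batch_size": 256,  # assumed knob: fewer windows per step, lower peak memory
}
```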
c
Hi Arthur. Check the inference batch sizes as well, because the memory error might be happening in the validation step.
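For reference, the knobs that suggestion points at would be set on the model itself. A minimal sketch, assuming your neuralforecast release exposes `valid_batch_size` and `inference_windows_batch_size` on TFT (both names should be verified against your version's docs):

```python
from neuralforecast.models import TFT

# Sketch: keep the training batch size but shrink the validation/inference batches.
# valid_batch_size / inference_windows_batch_size are assumptions to verify.
model = TFT(
    h=52,
    input_size=104,
    hidden_size=256,
    n_head=8,
    batch_size=128,                    # series per training batch
    valid_batch_size=32,               # series per validation batch
    windows_batch_size=512,            # windows per training step
    inference_windows_batch_size=256,  # windows per validation/inference step
)
```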
a
Hi @Cristian (Nixtla)! Thanks a lot for your answer! The OOM happens right after training starts, so my guess is that it does not happen in the validation step, unfortunately.
Hey @Cristian (Nixtla), would you have any other recommendation maybe? Thanks a lot for taking the time!