# neural-forecast
a
Does anyone know why `neuralforecast`'s `TemporalNorm` uses one set of normalization statistics per sequence (basically InstanceNorm for sequence data, without learnable parameters)? I would have expected the offset and scale to be calculated over the entire training dataset and kept constant for every sequence. @Cristian (Nixtla)
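To illustrate what I mean, here's a rough numpy sketch of the two behaviors (not the library's actual code, just the contrast I'm asking about):

```python
import numpy as np

# Toy batch of input windows: 3 series, 4 time steps each.
windows = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [100.0, 200.0, 300.0, 400.0],
])

# Per-window (TemporalNorm-style, as I understand it):
# each window is standardized with its *own* mean and std.
per_window = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)

# What I expected: one global mean/std from the whole training set,
# applied identically to every window.
global_norm = (windows - windows.mean()) / windows.std()

print(per_window)   # all three rows become identical -- the scale info is gone
print(global_norm)  # relative scales across series are preserved
```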
c
Hi @Alex Wang, this type of normalization is increasingly common in time series. We have seen consistently good results with this approach, particularly in settings with large scale variation within and across time series.
However, we know that it has some limitations (for instance, loss of information). We are working on adding a separate pre-processing step that performs normalization over the complete time series.
a
Gotcha, this is incredibly helpful! Thank you! Is there a reference that evaluates this type of temporal norm?
c
For example, we just saw that our normalization can drop vital information in healthcare settings (your area?). Normalizing within a short window removes the medication-dosage information, since it transforms the dosage into essentially a dummy (0-1) variable.
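A tiny illustration with hypothetical numbers (min-max here stands in for any per-window scaling):

```python
import numpy as np

# Hypothetical dosages for two patients (units are illustrative).
low_dose = np.array([10.0, 10.0, 12.0])
high_dose = np.array([100.0, 100.0, 120.0])

def minmax(x):
    # Min-max scaling computed within one short window.
    return (x - x.min()) / (x.max() - x.min())

print(minmax(low_dose))   # [0. 0. 1.]
print(minmax(high_dose))  # [0. 0. 1.]  -- identical: the 10x dosage difference is lost
```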
a
Exactly, that's what I was worried about, since sometimes the baseline level is very important information.
c
We discuss TemporalNormalization here: https://arxiv.org/abs/2305.07089
One alternative is to simply normalize the data beforehand and set `scaler_type=None`.
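For example, a minimal sketch using the library's bundled AirPassengers example (the model arguments like `h`, `input_size`, and `max_steps` are just placeholders):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF

df = AirPassengersDF.copy()  # columns: unique_id, ds, y

# Normalize once using each full training series (global statistics),
# instead of letting the model rescale every input window.
g = df.groupby('unique_id')['y']
df['y'] = (df['y'] - g.transform('mean')) / g.transform('std')

# With the data pre-normalized, turn off the per-window scaler.
model = NHITS(h=12, input_size=24, scaler_type=None, max_steps=100)
nf = NeuralForecast(models=[model], freq='M')
nf.fit(df=df)
forecasts = nf.predict()
# Remember to invert the normalization on `forecasts` before reporting.
```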
Finally, some models, such as NHITS and NBEATS, are extremely robust to scale. We have seen amazing results without any normalization.
a
thanks a lot for these references and pointers 🙂