# neural-forecast
m
Sequences shorter than input_size are automatically padded with 0 at the beginning, and I assume future exogenous variables are padded with 0 at the beginning too (is this true?). To distinguish those padding zeros from real zeros in the time series, could it make sense to create a "real_data" exogenous variable set to 1 for every time series point? The all-zero padding would then automatically set it to 0 for padded positions, so the model could tell that padded data have real_data=0 and true data have real_data=1. Since future exogenous variables get scaled, could the scaler (e.g. robust) negatively impact this "real_data" indicator? Is the exogenous variable scaling performed before or after the automatic zero-padding? Thanks
c
Hi @Manuel! Yes, all variables are padded at the beginning. We have not tried the "real_data" dummy, but it sounds intuitive. We previously had an "available_mask" dummy to directly mask missing/padded data with zeros, which is a little different but related. The scaling is done after padding the variable, but it shouldn't affect the information provided by the variable.
Also note that most models use only the information in the input window of size input_size. If the time series have a lot of history then it should not be necessary to add the "real_data" variable.
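For reference, the idea discussed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `left_pad` and `affine_scale` are hypothetical helpers, and the `shift` and `scale` values are arbitrary stand-ins for whatever statistics a real scaler would compute.

```python
def left_pad(values, input_size, pad_value=0.0):
    """Left-pad a sequence with pad_value so it has at least input_size items."""
    missing = max(input_size - len(values), 0)
    return [pad_value] * missing + list(values)


def affine_scale(values, shift, scale):
    """Generic affine scaling (hypothetical stand-in for a real scaler)."""
    return [(v - shift) / scale for v in values]


# A short series that contains one genuine zero observation.
y = [3.0, 0.0, 5.0]
# Hypothetical "real_data" indicator: 1 for every observed point.
real_data = [1.0] * len(y)

input_size = 6
y_padded = left_pad(y, input_size)
real_padded = left_pad(real_data, input_size)

# Padded positions get real_data=0, while the genuine zero keeps real_data=1.
for value, flag in zip(y_padded, real_padded):
    print(value, flag)

# Scaling applied after padding: any affine map with a nonzero scale keeps
# the two indicator levels distinct (shift and scale are arbitrary here).
real_scaled = affine_scale(real_padded, shift=0.5, scale=2.0)
```

After scaling, the indicator still takes exactly two distinct values, which is consistent with the point that scaling after padding should not remove the information it carries.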
m
@Cristian (Nixtla) Thank you! Yes, the problem in my specific case is that many time series have a limited history, shorter than the input_size I need to model the yearly seasonality acceptably.