Hey, maybe I've overlooked something, but you guys don't have an RNN encoder with an RNN decoder, right? So:
Encode historical + categorical data with lstm_1 to obtain h and c.
Run lstm_2 with h and c initialized from the encoder, taking future + categorical data as its other inputs.
Possibility to mix the encoded h into lstm_2 at the input too, perhaps with a horizon time embedding.
Possibility to use GRU/LSTM, or the recent sLSTM with improved gating.
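
To make the idea concrete, here is a rough PyTorch sketch of what I mean. It is not based on anything in your codebase: the class name `Seq2SeqLSTM`, the argument names, and all sizes are made up for illustration, and I'm assuming the categorical features are already embedded or one-hot encoded and concatenated into the input tensors.

```python
import torch
import torch.nn as nn


class Seq2SeqLSTM(nn.Module):
    """Hypothetical encoder-decoder sketch; names and sizes are illustrative."""

    def __init__(self, n_hist, n_fut, hidden=32, horizon=12, mix_context=True):
        super().__init__()
        self.mix_context = mix_context
        # lstm_1: encodes historical (+ categorical) features
        self.encoder = nn.LSTM(n_hist, hidden, batch_first=True)
        # one learned vector per forecast step (the "horizon time embedding")
        self.horizon_emb = nn.Embedding(horizon, hidden)
        dec_in = n_fut + (hidden if mix_context else 0)
        # lstm_2: decodes over future (+ categorical) features
        self.decoder = nn.LSTM(dec_in, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, hist, fut):
        # hist: (batch, T_hist, n_hist); fut: (batch, T_fut, n_fut), T_fut <= horizon
        _, (h, c) = self.encoder(hist)  # h, c: (1, batch, hidden)
        if self.mix_context:
            steps = torch.arange(fut.size(1), device=fut.device)
            # encoded h plus a per-step embedding, broadcast to (batch, T_fut, hidden)
            ctx = h[-1].unsqueeze(1) + self.horizon_emb(steps).unsqueeze(0)
            fut = torch.cat([fut, ctx], dim=-1)
        # the decoder starts from the encoder's final h and c
        out, _ = self.decoder(fut, (h, c))
        return self.head(out).squeeze(-1)  # (batch, T_fut)


model = Seq2SeqLSTM(n_hist=5, n_fut=3, hidden=16, horizon=12)
y = model(torch.randn(4, 24, 5), torch.randn(4, 12, 3))
print(y.shape)
```

Swapping in `nn.GRU` would mean passing only h between the two networks, since a GRU has no cell state. As far as I know there is no sLSTM layer in `torch.nn`, so that variant would need a separate implementation.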