Have you guys considered adding a large-scale pre-trained N-HiTS model for transfer learning, trained on the large retail dunnhumby dataset? It would probably be more powerful than one trained on the M competitions. Here is a link to the dataset:
https://www.dunnhumby.com/source-files/