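I think that as long as you inverse-transform your predictions and measure performance on the original scale of the data, the comparison is valid. The scaler is just a preprocessing step: it changes the space the model trains in, not the units the final forecasts are scored in. Note that by default, the scaler for NHITS is "identity", meaning no scaling is applied unless you set one.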
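To make that concrete, here is a minimal, library-agnostic sketch in plain Python. It is not how NHITS or any particular forecasting library handles scaling internally. The helper names (`fit_standard_scaler`, `transform`, `inverse_transform`, `mae`) and all the numbers are made up for illustration. The idea is that a model trained on scaled data gets its predictions mapped back to the original units before scoring, so its error can be compared with a model that never used a scaler.

```python
from statistics import mean, pstdev

# Hypothetical helpers for illustration; not part of any forecasting library.
def fit_standard_scaler(values):
    """Return (mean, std) estimated from the training data only."""
    return mean(values), pstdev(values)

def transform(values, mu, sigma):
    """Map values from the original scale to the standardized scale."""
    return [(v - mu) / sigma for v in values]

def inverse_transform(values, mu, sigma):
    """Map standardized values back to the original scale."""
    return [v * sigma + mu for v in values]

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])

# Made-up data: a training window and the held-out actuals.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
actual = [20.0, 22.0]

mu, sigma = fit_standard_scaler(train)

# Model A (made up): trained without scaling, predicts in original units.
preds_a = [19.0, 23.0]

# Model B (made up): trained on scaled data, so its raw outputs are in
# standardized units and must be inverse-transformed before scoring.
preds_b_scaled = [2.0, 3.0]
preds_b = inverse_transform(preds_b_scaled, mu, sigma)

# Valid comparison: both errors are in the original units.
mae_a = mae(actual, preds_a)
mae_b = mae(actual, preds_b)

# Not comparable with mae_a: this error is in standardized units.
mae_b_scaled = mae(transform(actual, mu, sigma), preds_b_scaled)

print(f"MAE A (original scale): {mae_a:.3f}")
print(f"MAE B (original scale): {mae_b:.3f}")
print(f"MAE B (scaled space):   {mae_b_scaled:.3f}")
```

With a standard scaler, the scaled-space error is just the original-scale error divided by the training standard deviation, so it looks smaller or larger depending on the data's spread. That is why scoring in scaled space would skew the comparison, while scoring after the inverse transform does not.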