# squads
m
Hey! Just want to show some progress on including Shap values in neuralforecast (I finally found some time for that!). I think it will be useful to provide explainability for TimeGPT that doesn't rely on LightGBM. Right now, this works:
from neuralforecast import NeuralForecast
from neuralforecast.models import MLP

# small MLP: 12-step horizon, 24 past lags as input
mlp = MLP(
    h=12,
    input_size=24,
    max_steps=200,
    alias="MLP"
)
nf = NeuralForecast(models=[mlp], freq="ME")
nf.fit(df=Y_train_df)

# proposed new method: SHAP values from 50 background windows for 5 target windows
explanation = nf.explain(background_size=50, target_samples=5)
It returns a dict with all the values necessary for SHAP to make plots and interpret the results:
{'MLP': {'shap_values': array([[[-6.10306270e+01, -9.39689115e+00,  1.97515033e+01,
            1.17820769e+01,  1.51813563e+01,  2.06414518e+01,
            1.89970556e+01,  1.91416997e+01,  2.06474227e+01,
           -5.95401884e+00, -4.46718962e+00, -2.44429846e+01],
          ...
  'feature_names': ['y_lag_1',
   'y_lag_2',
   'y_lag_3',
   'y_lag_4',
   'y_lag_5',
   'y_lag_6',
   'y_lag_7',
   'y_lag_8',
   'y_lag_9',
   'y_lag_10',
   'y_lag_11',
   'y_lag_12',
   'y_lag_13',
   'y_lag_14',
   'y_lag_15',
   'y_lag_16',
   'y_lag_17',
   'y_lag_18',
   'y_lag_19',
   'y_lag_20',
   'y_lag_21',
   'y_lag_22',
   'y_lag_23',
   'y_lag_24'],
  'background_data': array([[640., 618., 662., 648., 663., 735., 791., 805., 704., 659., 610.,
          637., 660., 642., 706., 696., 720., 772., 848., 859., 763., 707.,
          662., 705.]], dtype=float32),
  'target_data': array([[340., 318., 362., 348., 363., 435., 491., 505., 404., 359., 310.,
          337., 360., 342., 406., 396., 420., 472., 548., 559., 463., 407.,
          362., 405.]], dtype=float32),
  'base_values': array([732.64758301, 707.97583008, 755.71826172, 754.11199951,
         787.41577148, 853.30413818, 948.47595215, 924.73815918,
         838.97302246, 759.78320312, 718.87841797, 737.99725342]),
  'model_name': 'MLP',
  'model_alias': 'MLP'}}
From that, we can make plots like the one attached. It's still in progress; the next step is to also include exogenous features, not just past lags. I just wanted to get early feedback, in case you don't think it's worth continuing to work on it
🙌 5
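For reference, a minimal sketch of how the returned dict could be fed into shap's standard plotting API, assuming the shap_values array is laid out as (target_samples, n_features, h) as the shapes above suggest (horizon index 0 is picked just for illustration):

import shap

res = explanation["MLP"]

# slice out the first forecasted step: (target_samples, n_features)
shap.summary_plot(
    res["shap_values"][:, :, 0],
    features=res["target_data"],          # the windows that were explained
    feature_names=res["feature_names"],
)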
h
but we are still using lgbm to handle exog vars right?
o
This has nothing to do with TimeGPT @Han Wang, it's a feature for NeuralForecast
h
I see, and is that why we can't use it on exog variables, and those features are predetermined?
o
I have no clue what that means. Shapley values provide a way of explaining predictions. They work by attributing a score to each model input feature, and a feature is just any model input.
m
I decided to go with shap directly instead of Captum. It felt like Captum was wrapping around shap anyway, so I didn't want to abstract too much. Also, I don't yet see how layer or neuron attribution would benefit users/clients (but maybe I'm wrong). So for now, plain SHAP.
o
Ah ok! Which explainer did you use?
m
Kernel, but I think we can easily add others as well
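For context, the underlying shap call presumably looks roughly like the sketch below; predict_fn, background_windows, and target_windows are hypothetical stand-ins for a wrapper around the fitted network's forward pass, the sampled background windows, and the windows being explained:

import shap

# predict_fn: hypothetical wrapper mapping an (n, input_size) array of lag windows
# to the fitted network's (n, h) forecasts
explainer = shap.KernelExplainer(predict_fn, background_windows)
shap_values = explainer.shap_values(target_windows, nsamples=100)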
h
I remember Kernel SHAP is slow, is there any concern about speed?
n
Just curious, would there be a way to find the combined explainability of features? E.g. I want to group all lag features into a category called autoreg, group price- and discount-related features into a category called price_related, etc., and find the importance/contribution of each category?
h
shap values are additive, right?
✅ 1
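They are: SHAP's local-accuracy property says the base value plus the per-feature attributions reconstructs the prediction. A quick sanity-check sketch against the dict returned above, summing over the 24 lag features:

res = explanation["MLP"]

# base_values has shape (h,), shap_values (target_samples, n_features, h);
# summing over the feature axis should approximately recover the model's forecasts
reconstructed = res["base_values"] + res["shap_values"].sum(axis=1)  # (target_samples, h)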
m
@Han Wang speed also depends on how many samples we use to calculate the Shap values. But yes, speed could be an issue and other explainers can be faster. @Nikhil Gupta, I guess you could, since they are additive, but for now it would be a manipulation to be done once the results are returned
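On speed, one standard mitigation for Kernel SHAP is to summarize the background set before explaining; a sketch using shap's built-in k-means helper (10 clusters is an arbitrary choice, and predict_fn is the same hypothetical forecast wrapper as above):

import shap

res = explanation["MLP"]

# KernelExplainer's cost grows with the background size, so collapsing the
# background windows into a few weighted centroids trades accuracy for speed
background_summary = shap.kmeans(res["background_data"], 10)
explainer = shap.KernelExplainer(predict_fn, background_summary)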
o
@Nikhil Gupta We should probably group (i.e. just add them together) all features over the temporal dimension anyway, as Shap doesn't understand auto-correlation and generally suffers when features are highly correlated. So Shap results across timesteps are more or less meaningless due to the high correlation.
But maybe there are already methods tackling this; I'm not too up-to-date on the latest explainability methods, especially in the time domain
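Either way, the post-hoc manipulation is just a sum over the returned array. A sketch using the hypothetical autoreg / price_related categories from the question (for the lags-only model above, the autoreg group coincides with summing over the whole temporal dimension):

res = explanation["MLP"]
names = res["feature_names"]

groups = {
    "autoreg": [f for f in names if f.startswith("y_lag_")],
    # "price_related": ["price", "discount", ...],  # hypothetical exogenous features
}

# one combined attribution per category, per explained window and per forecasted step
group_shap = {
    g: res["shap_values"][:, [names.index(f) for f in feats], :].sum(axis=1)
    for g, feats in groups.items()
}  # each value has shape (target_samples, h)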
n
Understood. Yes, it would be applied at each future time step but by clubbing (adding) the Shap values. Thanks!
h
So I think SHAP is the bridge between LTM and LLM. We could just provide the ungrouped SHAP values with meaningful feature descriptions, tell the LLM that the values are additive, and let the LLM decide how to interpret them.
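A rough sketch of what that hand-off could look like, assembling a plain-text summary for one forecasted step (feature_descriptions and the prompt wording are entirely hypothetical):

res = explanation["MLP"]

# hypothetical descriptions, e.g. mapping "y_lag_3" to "value of y 3 steps before the forecast"
feature_descriptions = {
    f: f"value of y {f.split('_')[-1]} steps before the forecast"
    for f in res["feature_names"]
}

step = 0  # first forecasted step of the first explained window
lines = [
    f"- {feature_descriptions[f]}: {res['shap_values'][0, i, step]:+.1f}"
    for i, f in enumerate(res["feature_names"])
]
prompt = (
    f"Base forecast: {res['base_values'][step]:.1f}. "
    "The following contributions are additive adjustments to it:\n" + "\n".join(lines)
)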