Recently, I came across the paper Robust and Stable Black Box Explanations, which discusses a nice framework for global model-agnostic explanations.

I was thinking of recreating the experiments performed in the paper, but, unfortunately, the authors haven't provided the code. In summary, the experiments are:

  1. use LIME, SHAP and MUSE as baselines, and compute the fidelity score on test data (all three datasets are classification problems).

  2. since LIME and SHAP give local explanations, the idea is to pick K points from the training dataset and create K explanations using LIME, which returns a local linear explanation for each point. Then, for a new test data point, find the nearest of those K points and use its explanation to classify the new point.

  3. measure the performance using the fidelity score: the percentage of points for which $E(x) = B(x)$, where $E(x)$ is the label assigned by the explanation and $B(x)$ is the label assigned by the black box. (A rough sketch of steps 2 and 3 follows this list.)
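
To make the plan concrete, here is a rough sketch of what I have in mind for steps 2 and 3 (this is not code from the paper). `black_box` is a fitted scikit-learn-style classifier, `X_train`/`X_test` are NumPy arrays, and `local_predict` is a hypothetical helper that classifies a point using a stored explanation, which is exactly the part I don't know how to implement:

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

K = 100  # number of training points to explain (a free parameter)

explainer = LimeTabularExplainer(X_train, mode="classification",
                                 discretize_continuous=False)

# Step 2: build K local explanations around a subset of the training data.
rng = np.random.default_rng(0)
anchor_idx = rng.choice(len(X_train), size=K, replace=False)
anchors = X_train[anchor_idx]
explanations = [explainer.explain_instance(x, black_box.predict_proba)
                for x in anchors]

# Step 3: for each test point, reuse the explanation of its nearest anchor
# and check whether it agrees with the black box (fidelity).
agree = 0
for x in X_test:
    nearest = int(np.argmin(np.linalg.norm(anchors - x, axis=1)))
    e_label = local_predict(explanations[nearest], x)   # hypothetical helper
    b_label = black_box.predict(x.reshape(1, -1))[0]
    agree += int(e_label == b_label)

fidelity = agree / len(X_test)
print(f"fidelity: {fidelity:.3f}")
```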

Now, the issue is that I am using the LIME and SHAP packages in Python to reproduce the baseline results.

However, I am not sure how to get the linear explanation for a point (one of the K points) and use it to classify a new test point in its neighborhood (the `local_predict` step in the sketch above).

Every tutorial on YouTube and Medium discusses visualizing the explanation for a given point, but none talks about how to get the underlying linear model itself and use it for new points.
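
For concreteness, what those tutorials cover is essentially the snippet below (reusing `explainer`, `black_box` and some point `x` from the sketch above). It visualizes the weights of the local model but does not give me an object I can apply to a different point:

```python
# Typical tutorial usage: inspect or plot a single explanation.
exp = explainer.explain_instance(x, black_box.predict_proba, num_features=5)
print(exp.as_list())          # (feature description, weight) pairs
fig = exp.as_pyplot_figure()  # bar chart of those weights
```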

  • Hello. Although this is an old post, to me it's not fully clear what your question is. Could you please edit it to clarify what it is and state it explicitly (in the title)? – nbro Jun 04 '21 at 11:41

1 Answer


For LIME, the local model that is trained can be found in `lime.lime_base.LimeBase.explain_instance_with_data`, under the variable name `easy_model`.
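
Building on that: as far as I can tell, you don't have to dig into `lime_base` yourself, because the fitted intercept and coefficients of `easy_model` are copied onto the `Explanation` object that `explain_instance` returns, as `exp.intercept[label]` and `exp.local_exp[label]`. Below is a minimal sketch of how they could be reused to classify a nearby point. It assumes tabular data with `discretize_continuous=False`, a scikit-learn-style classifier `black_box`, an array `X_train`, a training point `x_anchor` and a nearby test point `x_new` (all placeholders); the rescaling step reflects my reading of lime's source and should be treated as an assumption.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(X_train, mode="classification",
                                 discretize_continuous=False)

label = 1  # the class whose probability the local surrogate approximates
exp = explainer.explain_instance(x_anchor, black_box.predict_proba,
                                 labels=(label,),
                                 num_features=X_train.shape[1])

intercept = exp.intercept[label]       # easy_model.intercept_
weights = dict(exp.local_exp[label])   # {feature index: coefficient}

# Assumption: the surrogate is fit on standardized features
# ((x - scaler.mean_) / scaler.scale_), so the new point has to be
# transformed the same way before the linear model is applied.
z = (np.asarray(x_new) - explainer.scaler.mean_) / explainer.scaler.scale_
local_prob = intercept + sum(w * z[i] for i, w in weights.items())

# Rough rule for a binary problem: pick the class whose approximated
# probability exceeds 0.5, then compare with the black box.
e_label = int(local_prob >= 0.5)
b_label = black_box.predict(np.asarray(x_new).reshape(1, -1))[0]
print("explanation label:", e_label, "black-box label:", b_label)
```

Alternatively, `explain_instance` accepts a `model_regressor` argument; if you pass in your own sklearn `Ridge` instance, LIME fits that very object as `easy_model` (again, my reading of the source), so its `coef_` and `intercept_` can be read off directly afterwards. Either way, this is essentially what the hypothetical `local_predict` in the question's sketch would do.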
