Efficient Shapley Values Estimation by Amortization for Text Classification

AI-generated keywords: Shapley Values, Neural Text Classification, Amortized Model, KernelSHAP, Computation Time

AI-generated Key Points

- The authors address the challenge of computing Shapley Values for large pretrained models in neural text classification
- Computing Shapley Values is time-consuming because it requires a large number of model evaluations
- The authors propose an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations
- Shapley Values estimated from a small number of stochastic model evaluations are sensitive to the choice of random seed, especially for examples with longer input texts
- The proposed amortized model provides stable estimates because its inference is deterministic
- Experimental results show that the amortized model estimates Shapley Values accurately with a substantial speedup over traditional methods (roughly 60 times faster)
- The model's usefulness is evaluated by examining the quality of its explanations in downstream tasks such as feature selection and domain calibration
- The model is compared with the computationally expensive KernelSHAP (KS) method and demonstrates superior performance
- Overall, the paper presents an efficient and effective approach for estimating Shapley Values in neural text classification models


Authors: Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma, Bing Xiang

arXiv: 2305.19998v1 (cs.CL)

ACL 2023 Camera Ready

License: CC BY 4.0

Abstract: Despite the popularity of Shapley Values in explaining neural text classification models, computing them is prohibitive for large pretrained models due to a large number of model evaluations. In practice, Shapley Values are often estimated with a small number of stochastic model evaluations. However, we show that the estimated Shapley Values are sensitive to random seed choices -- the top-ranked features often have little overlap across different seeds, especially on examples with longer input texts. This can only be mitigated by aggregating thousands of model evaluations, which on the other hand, induces substantial computational overheads. To mitigate the trade-off between stability and efficiency, we develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations. It is trained on a set of examples whose Shapley Values are estimated from a large number of model evaluations to ensure stability. Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup compared to traditional methods. Furthermore, the estimated values are stable as the inference is deterministic. We release our code at https://github.com/yangalan123/Amortized-Interpretability.
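The abstract's point about seed sensitivity can be illustrated with a small experiment. The sketch below is a hypothetical toy, not the authors' setup: it uses permutation (Monte Carlo) sampling, a different estimator from the KernelSHAP sampling studied in the paper, and a made-up value function `toy_value` standing in for a classifier's score when only some tokens are kept. It estimates Shapley Values from a few random feature orderings under two seeds and measures how many of the top-k features the two estimates share.

```python
import numpy as np
from typing import Callable, FrozenSet

ValueFn = Callable[[FrozenSet[int]], float]


def permutation_shapley(value: ValueFn, n: int, num_permutations: int, seed: int) -> np.ndarray:
    """Monte Carlo Shapley estimate averaged over random feature orderings.

    Each ordering costs n + 1 evaluations of `value` (one per prefix).
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros(n)
    for _ in range(num_permutations):
        coalition: set = set()
        prev = value(frozenset(coalition))
        for i in rng.permutation(n):
            coalition.add(int(i))
            cur = value(frozenset(coalition))
            phi[i] += cur - prev  # marginal contribution of feature i in this ordering
            prev = cur
    return phi / num_permutations


def top_k_overlap(a: np.ndarray, b: np.ndarray, k: int) -> float:
    """Fraction of the k highest-scoring features shared by two attributions."""
    top_a = set(np.argsort(-a)[:k].tolist())
    top_b = set(np.argsort(-b)[:k].tolist())
    return len(top_a & top_b) / k


# Hypothetical "model": a saturating score over the kept tokens, so that
# token contributions interact instead of simply adding up.
n_tokens = 40
weights = np.random.default_rng(123).normal(size=n_tokens)


def toy_value(kept: FrozenSet[int]) -> float:
    return float(np.tanh(sum(weights[j] for j in kept)))


est_a = permutation_shapley(toy_value, n_tokens, num_permutations=5, seed=0)
est_b = permutation_shapley(toy_value, n_tokens, num_permutations=5, seed=1)
print("top-5 overlap across seeds:", top_k_overlap(est_a, est_b, k=5))
```

Raising `num_permutations` makes the two estimates agree more closely, but every extra ordering costs another n + 1 model evaluations, which is the stability/efficiency trade-off the abstract describes.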

Submitted to arXiv on 31 May 2023


Results of the summarizing process for the arXiv paper: 2305.19998v1

Comprehensive Summary

In this paper, the authors address the challenge of computing Shapley Values for large pretrained models in neural text classification. They highlight that while Shapley Values are popular for explaining these models, computing them is time-consuming because of the large number of model evaluations required. In practice, Shapley Values are therefore often estimated from a small number of stochastic model evaluations, but the authors show that such estimates are sensitive to the choice of random seed: the top-ranked features often have little overlap across seeds, especially for examples with longer input texts. This instability can only be mitigated by aggregating thousands of model evaluations, which introduces significant computational overhead.

To resolve this trade-off between stability and efficiency, the authors propose an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations. It is trained on examples whose Shapley Values were estimated from a large number of model evaluations, and its estimates are stable because inference is deterministic. Experimental results on two text classification datasets show that the amortized model estimates Shapley Values accurately with a substantial speedup over traditional methods: the computation time per instance drops from about 3.47 seconds to less than 50 milliseconds, making the process roughly 60 times faster. The authors further evaluate the usefulness of their model by examining the quality of its explanations in downstream tasks such as feature selection and domain calibration, where they compare it with the computationally expensive KernelSHAP (KS) method and demonstrate superior performance.

Overall, this paper presents an efficient and effective approach for estimating Shapley Values in neural text classification models. The proposed amortized model significantly reduces computation time while providing stable estimates, making it a valuable tool for interpreting these models more effectively and efficiently than before.
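Because KernelSHAP (KS) serves as the expensive baseline throughout, a minimal sketch of it may help. The version below is an illustrative assumption, not the authors' code or the `shap` library's implementation: it solves KernelSHAP's weighted least-squares problem with the Shapley kernel weights and the efficiency constraint (the values must sum to v(N) - v(empty set)). When `coalitions` is left as `None`, it enumerates all 2^n - 2 non-trivial coalitions, in which case the solution coincides with the exact Shapley Values; passing a random sample of coalitions instead gives the stochastic estimate whose seed sensitivity the paper studies. The `majority` game is a hypothetical toy.

```python
from itertools import combinations
from math import comb
from typing import Callable, FrozenSet, List, Optional

import numpy as np

ValueFn = Callable[[FrozenSet[int]], float]


def kernel_shap(value: ValueFn, n: int,
                coalitions: Optional[List[FrozenSet[int]]] = None) -> np.ndarray:
    """KernelSHAP as constrained weighted least squares.

    Requires n >= 2; every coalition passed in must be non-empty and proper
    (the Shapley kernel weight is undefined for the empty and full sets).
    """
    if coalitions is None:
        # All proper, non-empty coalitions: 2**n - 2 model evaluations.
        coalitions = [frozenset(c) for size in range(1, n)
                      for c in combinations(range(n), size)]
    v_empty = value(frozenset())
    delta = value(frozenset(range(n))) - v_empty  # efficiency: sum(phi) == delta

    z = np.zeros((len(coalitions), n))  # binary coalition indicators
    y = np.zeros(len(coalitions))       # value gains over the empty coalition
    w = np.zeros(len(coalitions))       # Shapley kernel weights
    for row, s in enumerate(coalitions):
        size = len(s)
        z[row, sorted(s)] = 1.0
        y[row] = value(s) - v_empty
        w[row] = (n - 1) / (comb(n, size) * size * (n - size))

    # Enforce the constraint by substituting phi_last = delta - sum(other phi),
    # then solve the normal equations of the weighted regression.
    x = z[:, :-1] - z[:, [-1]]
    t = y - z[:, -1] * delta
    xt_w = x.T * w  # X^T W without materialising the diagonal matrix
    phi_rest = np.linalg.solve(xt_w @ x, xt_w @ t)
    return np.append(phi_rest, delta - phi_rest.sum())


def majority(kept: FrozenSet[int]) -> float:
    """Toy 3-player game: payoff 1 once at least two features are kept."""
    return 1.0 if len(kept) >= 2 else 0.0


print(kernel_shap(majority, 3))  # symmetric game, so each entry is 1/3 up to rounding
```

Even this exact variant needs 2^n - 2 evaluations of the value function, and each evaluation of a real classifier is a full forward pass, which is why KS is costly for long texts.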


Layman's Summary

The authors of this paper tackle a problem in understanding how important different parts of a text are for a model's predictions. Calculating these importance values takes a long time because the model has to be run many times. The authors came up with a faster way: a separate model that predicts the importance values directly. They also found that the usual quick estimates can change depending on random choices, while their new method gives the same result every time. In their tests, the method gave accurate results much faster than other methods. Overall, their approach is a practical way to find out which parts of a text matter most for a prediction.

Key terms:
- Pretrained models: Models that have been trained on large amounts of data before being used for a specific task.
- Neural text classification: Using neural networks (a type of computer program) to sort pieces of text into categories.
- Shapley Values: Numbers that show how much each part of an input, such as each word, contributes to a model's prediction.
- Amortized model: A model trained once to predict the result of an expensive calculation directly, so the calculation does not have to be repeated for every new input.
- Estimates: Approximations based on limited information.
- Inference: Using a trained model to make predictions on new inputs.
- Deterministic: Always producing the same result given the same inputs.
- Downstream tasks: Other tasks that make use of the results, here the explanations.
- Feature selection: Choosing which parts or features are most important for making predictions.
- Domain calibration: Adjusting how confident the model's predictions are so that its confidence stays reliable on a different kind of text (a different domain).

Blog article

Explaining Neural Text Classification Models with Amortized Shapley Values

Interpreting the decisions of machine learning models is important for understanding their behavior and improving their performance. In recent years, a popular approach to explaining these models has been Shapley Values (SV). Shapley Values originate in cooperative game theory, where they distribute a shared payoff fairly among players according to their contributions. By treating input features as players and the model's prediction as the payoff, SV quantify the contribution of each input feature to a model's prediction. However, computing SV for large pretrained neural text classification models is time-consuming: exact computation requires evaluating the model on every subset of input features, and even stable approximations need thousands of model evaluations, which introduces significant computational overhead. To address this limitation, Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma, and Bing Xiang proposed an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations. Their paper, "Efficient Shapley Values Estimation by Amortization for Text Classification", was published at ACL 2023.

Background

Shapley Values are widely used to explain predictions made by machine learning models such as deep neural networks (DNNs). The Shapley Value of an input feature is its marginal contribution to the prediction score (the change in score when the feature is added to a subset of the other features), averaged over all such subsets with the weights given by the Shapley formula. Because every subset is considered, SV account for how features interact with one another in influencing the model's decision. SV have become increasingly popular for interpreting DNNs trained on natural language processing tasks such as text classification, where they help identify which words or phrases contribute most to a given prediction. However, for an input with n features the exact computation involves 2^n subsets, each requiring a separate model evaluation, so computing SV for large pretrained models is expensive and difficult to scale in practice.
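For reference, the Shapley Value of feature i among n features is phi_i = sum over S subset of N \ {i} of |S|! (n - |S| - 1)! / n! * [v(S ∪ {i}) - v(S)], where v(S) is the model's score when only the features in S are kept. The sketch below computes this sum exactly by enumerating subsets. The value function `toy_value` is a hypothetical stand-in for a classifier (how removed tokens are masked is left out), and the exponential number of calls to it is what makes exact computation infeasible for long texts.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(value: ValueFn, n: int) -> List[float]:
    """Exact Shapley Values by enumerating every subset of the other features.

    phi_i = sum_{S subset of N \\ {i}} |S|! (n-|S|-1)! / n! * (v(S | {i}) - v(S))
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # |S| ranges over 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(kept: FrozenSet[int]) -> float:
    """Hypothetical 3-token classifier score: tokens 0 and 1 only count together,
    token 2 adds 0.5 on its own."""
    return (1.0 if {0, 1} <= kept else 0.0) + (0.5 if 2 in kept else 0.0)


print(exact_shapley(toy_value, 3))  # each entry is 0.5 up to floating-point rounding
```

The values also satisfy the efficiency property: they sum to v(N) - v(empty set) = 1.5, so the full prediction score is split among the tokens.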

Proposed Methodology

To overcome this limitation, the authors propose an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations. Traditional estimators are stochastic and depend on the choice of random seed, which makes their results unstable, especially for longer texts; the amortized model, by contrast, provides stable estimates because its inference is deterministic. The model is trained on a set of examples whose Shapley Values were estimated from a large number of model evaluations, so that the training targets themselves are stable. The proposed method consists of two components: (1) a recurrent neural network (RNN) encoder that takes an input sequence and produces embeddings, and (2) an MLP regressor that takes these embeddings as input and outputs an estimated SV score for each feature in the sequence. The authors show that their method estimates SV scores accurately with a substantial speedup over traditional methods, reducing the computation time per instance from about 3.47 seconds to less than 50 milliseconds, roughly 60 times faster. They further evaluate the quality of the explanations it produces in downstream tasks such as feature selection and domain calibration, comparing it against the computationally expensive KernelSHAP (KS) method and demonstrating superior performance.
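The amortization idea can be sketched in a few lines. This is a deliberately simplified, hypothetical version: instead of the RNN encoder and MLP regressor described above, it fits a linear (ridge) regression head on fixed per-token feature vectors, with synthetic features and targets standing in for encoder embeddings and for Shapley Values precomputed with many model evaluations. What it illustrates is that, once trained, predicting Shapley Values for a new input is a single deterministic computation with no further queries to the classifier.

```python
import numpy as np


def fit_amortizer(features: list, sv_targets: list, l2: float = 1e-3) -> np.ndarray:
    """Fit ridge regression from per-token features to precomputed Shapley Values.

    features:   list of (num_tokens, d) arrays, one per training example
    sv_targets: list of (num_tokens,) arrays of stable (many-sample) SV estimates
    """
    x = np.vstack(features)
    x = np.hstack([x, np.ones((x.shape[0], 1))])  # append a bias column
    y = np.concatenate(sv_targets)
    # Closed-form minimiser of squared error plus an L2 penalty.
    return np.linalg.solve(x.T @ x + l2 * np.eye(x.shape[1]), x.T @ y)


def predict_sv(w: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Deterministic inference: one matrix product, no model evaluations."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return x @ w


# Synthetic stand-ins for encoder outputs and precomputed SV targets.
rng = np.random.default_rng(0)
d = 8
true_w = rng.normal(size=d)
train_x = [rng.normal(size=(30, d)) for _ in range(50)]
train_y = [x @ true_w + 0.1 for x in train_x]

w = fit_amortizer(train_x, train_y)
new_example = rng.normal(size=(12, d))
print(predict_sv(w, new_example))
```

The expensive step (producing stable Shapley Value targets) is paid once when building the training set, rather than every time an explanation is requested.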

Experimental Results

The authors conducted experiments on two text classification datasets: the IMDB movie review dataset, consisting of 25,000 positive/negative reviews split into training and test sets, and a Yelp restaurant review dataset, consisting of 50,000 positive/negative reviews split into training and test sets. On both datasets, they compare the SV scores estimated by the amortized model with those from KS and find close agreement across all metrics, indicating that the amortized approach is accurate and reliable while being far more computationally efficient and scalable than KS. They also analyze how different hyperparameters affect the accuracy and stability of the amortized estimates, concluding that the best results were achieved with a 2-layer RNN encoder and a 4-layer MLP regressor.
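Agreement between amortized and KS estimates is usually summarized with rank and error metrics. Which metrics the paper reports is not stated here, so the two functions below are assumptions: Spearman rank correlation (computed from ranks, assuming no tied scores) and mean squared error between two attribution vectors for the same input. The attribution values are hypothetical.

```python
import numpy as np


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman rank correlation; assumes no tied scores within a or b."""
    rank_a = np.argsort(np.argsort(a))  # rank of each entry in ascending order
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two Shapley Value vectors."""
    return float(np.mean((a - b) ** 2))


# Hypothetical attributions for one 6-token input.
reference = np.array([0.40, -0.10, 0.05, 0.30, -0.25, 0.00])
amortized = np.array([0.35, -0.05, 0.10, 0.28, -0.20, -0.02])
print(spearman(reference, amortized), mse(reference, amortized))
```

Rank correlation captures whether the two methods order the tokens the same way, which matters for explanations that highlight the top tokens, while MSE captures how close the actual values are.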

Conclusion

Overall, this paper presents an efficient and effective approach to estimating Shapley Values for neural text classification models, enabling practitioners to interpret and understand these models more effectively and efficiently than before. The proposed amortized model significantly reduces computation time while providing stable estimates, making it a valuable tool for interpreting these models.

Created on 03 Jul. 2023


Similar papers summarized with our AI tools

- 57.3%: An empirical study of the effect of background data size on the stability of … (cs.LG)
- 51.0%: Enlarging Instance-specific and Class-specific Information for Open-set Actio… (cs.CV)
- 50.5%: Efficiently Scaling Transformer Inference (cs.LG)
- 49.5%: Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Ke… (cs.SD)
- 48.7%: LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (cs.LG)
- 48.3%: Eliminating Sentiment Bias for Aspect-Level Sentiment Classification with Uns… (cs.CL)
- 48.3%: TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in … (cs.CL)


Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.