On the Robustness of Explanations of Deep Neural Network Models: A Survey

AI-generated keywords: Explainability Deep Neural Network (DNN) Attributional Attack Robustness Responsible Use

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

- Responsible and trustworthy use of machine learning models requires explainability
- Deep Neural Network (DNN) models are increasingly used in risk-sensitive and safety-critical domains
- Many methods have been proposed to explain the decisions made by DNN models
- However, explanations can be distorted or attacked by minor input perturbations
- There has been no effort to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models
- The paper titled "On the Robustness of Explanations of Deep Neural Network Models: A Survey" presents a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models
- The paper also provides a detailed review of different metrics used to evaluate explanation methods while describing attributional attack and defense methods
- The authors conclude with lessons and takeaways for the community towards ensuring robust explanations of DNN model predictions


Authors: Amlan Jyoti, Karthik Balaji Ganesh, Manoj Gayala, Nandita Lakshmi Tunuguntla, Sandesh Kamath, Vineeth N Balasubramanian

arXiv: 2211.04780v1 (cs.LG)

Under Review ACM Computing Surveys "Special Issue on Trustworthy AI"

License: CC BY-NC-ND 4.0

Abstract: Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.

Submitted to arXiv on 09 Nov. 2022


Results of the summarizing process for the arXiv paper: 2211.04780v1


Explainability has been widely recognized as a cornerstone of the responsible and trustworthy use of machine learning models. With the increasing use of Deep Neural Network (DNN) models in risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions made by these models. However, recent years have seen concerted efforts demonstrating that such explanations can be distorted or attacked by minor input perturbations. While many surveys review explainability methods themselves, there has so far been no effort to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In the paper "On the Robustness of Explanations of Deep Neural Network Models: A Survey," authors Amlan Jyoti, Karthik Balaji Ganesh, Manoj Gayala, Nandita Lakshmi Tunuguntla, Sandesh Kamath, and Vineeth N Balasubramanian present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. The paper also provides a detailed review of the metrics used to evaluate explanation methods and describes attributional attack and defense methods. The authors conclude with lessons and takeaways for the community towards ensuring robust explanations of DNN model predictions. Overall, this work highlights why the robustness of explanations of DNN models must be understood if those models are to be used responsibly and trustworthily.


1. Machine learning models need to be explainable if they are to be used responsibly and trusted.
2. Deep Neural Network (DNN) models are used in important areas where safety is crucial.
3. There are different ways to explain the decisions made by DNN models.
4. Sometimes explanations can be changed or attacked by small changes to the input.
5. A new paper has reviewed many methods for studying, attacking, and defending explanations of DNN models.

Definitions:
- Responsible: doing things in a way that is safe and good for others
- Trustworthy: being someone or something that can be relied on
- Machine learning: using computers to learn from data and make predictions or decisions
- Explainability: being able to understand how a machine learning model makes its decisions
- Deep Neural Network (DNN): a type of machine learning model, loosely inspired by the human brain, built from many layers of simple connected units
- Robustness: being strong enough to resist attacks or changes
- Metrics: ways of measuring something, like accuracy or speed
- Attributional attack/defense methods: ways of deliberately changing the explanation of a model's decision through small changes to the input (attack), and ways of making explanations resist such changes (defense)

Understanding the Robustness of Explanations for Deep Neural Network Models: A Survey

Deep Neural Networks (DNNs) are increasingly being used in risk-sensitive and safety-critical domains, prompting the need to understand their decisions. Explainability has been widely recognized as a cornerstone of responsible and trustworthy use of machine learning models, yet recent years have seen concerted efforts that demonstrate how explanations can be distorted or attacked by minor input perturbations. In this article, we will review a survey paper titled “On the Robustness of Explanations of Deep Neural Network Models: A Survey” by Amlan Jyoti et al., which provides an overview of methods that study, understand, attack, and defend explanations provided by DNN models.

Explainability Methods

The authors begin with an overview of explainability methods for DNN models. These include attributional approaches such as Gradient-weighted Class Activation Mapping (Grad-CAM), Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG), and SmoothGrad; model-agnostic approaches such as Local Interpretable Model-agnostic Explanations (LIME); counterfactual explanation techniques; and post hoc explanation techniques such as Anchors. The authors also discuss metrics used to evaluate these methods, including fidelity metrics like mean absolute error (MAE) and structural similarity index measure (SSIM); interpretability metrics like feature importance scores; robustness metrics like adversarial accuracy score; and trustworthiness metrics like trust score.
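To make one of the attributional approaches named above concrete, here is a minimal sketch of Integrated Gradients in plain Python. It is an illustration under stated assumptions, not code from the survey: the logistic-regression model, its weights, the all-zeros baseline, and the function names are all hypothetical, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Hypothetical toy model: logistic regression on a feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def gradient(x, w, b):
    """Analytic gradient of the model output with respect to the input x."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight line from the baseline to x
    and then scaled by (x - baseline), feature by feature.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = gradient(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Assumed example values, chosen only for illustration.
w = [2.0, -1.0, 0.5]
b = 0.1
x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)

# Completeness check: attributions should sum to f(x) - f(baseline),
# up to the numerical error of the Riemann sum.
completeness_gap = abs(
    sum(attributions) - (model(x, w, b) - model(baseline, w, b))
)
```

The completeness check is a handy sanity test for any Integrated Gradients implementation: if the gap is not close to zero, the step count is too low or the gradient is wrong.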

Attributional Attack & Defense Methods

Next, the authors review different attributional attack and defense methods for DNN models. Attributional attacks apply small perturbations to an input so that the explanation (the attribution map) changes substantially while the model's prediction stays the same. Examples include saliency masking attacks, where parts of an image are masked out so that they do not contribute to the final prediction, and gradient masking attacks, where gradients are manipulated to change attributions without changing predictions. Attributional defenses, on the other hand, make changes at training or inference time so that attributions remain stable even when inputs are manipulated slightly. Examples include regularization techniques such as weight decay and data augmentation strategies such as adding noise to images during training, so that explanations become more resilient to manipulation at inference time.
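The idea of an attributional attack can be sketched on a toy model. The example below is an assumption-laden illustration, not a method taken from the survey: the three-unit ReLU network, its weights, the input, and the function names are hypothetical, and a brute-force search over the corners of an L-infinity ball stands in for the gradient-based optimization that practical attacks typically use. Similarity between explanations is measured with top-k overlap, a measure commonly used in the attribution robustness literature.

```python
from itertools import product

# Hypothetical toy network: score(x) = sum_j V[j] * relu(W[j] . x + B[j])
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [0.0, -1.0, 0.0]
V = [2.0, 5.0, 1.0]

def pre_activations(x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]

def score(x):
    return sum(v * max(z, 0.0) for v, z in zip(V, pre_activations(x)))

def predict(x):
    return 1 if score(x) > 0.0 else 0

def saliency(x):
    """Absolute input gradient of the score (plain gradient saliency)."""
    active = [1.0 if z > 0.0 else 0.0 for z in pre_activations(x)]
    return [abs(sum(V[j] * active[j] * W[j][i] for j in range(len(V))))
            for i in range(len(x))]

def top_k(values, k):
    """Indices of the k largest values (ties broken by lower index)."""
    order = sorted(range(len(values)), key=lambda i: (-values[i], i))
    return set(order[:k])

def top_k_overlap(a, b, k):
    """Fraction of the top-k features shared by two attribution vectors."""
    return len(top_k(a, k) & top_k(b, k)) / k

def brute_force_attack(x, eps, k):
    """Try every corner of the L-infinity ball of radius eps around x.

    Keeps the perturbation that lowers top-k overlap with the original
    saliency the most, subject to the predicted label staying unchanged.
    """
    base_label, base_sal = predict(x), saliency(x)
    best_delta, best_overlap = None, 1.0
    for delta in product((-eps, eps), repeat=len(x)):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        if predict(x_adv) != base_label:
            continue  # the prediction must stay the same
        overlap = top_k_overlap(base_sal, saliency(x_adv), k)
        if overlap < best_overlap:
            best_delta, best_overlap = delta, overlap
    return best_delta, best_overlap

# Assumed input: the second hidden unit sits just below its threshold.
x = [1.0, 0.95, 0.5]
delta, overlap = brute_force_attack(x, eps=0.1, k=2)
x_adv = [xi + di for xi, di in zip(x, delta)]
```

In this toy setup the second hidden unit is inactive at `x`, so the top-2 salient features are 0 and 2. Nudging `x[1]` above 1.0 switches that unit on, feature 1 becomes the most salient, and the top-2 overlap drops to one half while the predicted label is unchanged. Units sitting close to their activation threshold are one intuitive reason gradient-based explanations can be fragile.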

Conclusion & Takeaways

In conclusion, this survey paper provides a comprehensive overview of explainability methods for DNN models, together with the attack and defense strategies that bear on the robustness of their explanations. It also reviews the evaluation metrics used to assess these methods, including fidelity measures such as MAE and SSIM, interpretability measures such as feature importance scores, robustness measures such as adversarial accuracy score, and trustworthiness measures such as trust score. Finally, it offers lessons learned from existing research along with takeaways for future work on making explanations of model decisions reliable enough to support the responsible use of machine learning models.

Created on 11 Apr. 2023



Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.