What Does BERT Look At? An Analysis of BERT's Attention

AI-generated keywords: BERT, Attention, Syntax, Coreference, Probing

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Authors explore attention mechanisms of large pre-trained neural networks like BERT in NLP
  • Previous studies focused on model outputs and internal vector representations
  • Novel methods proposed for analyzing attention mechanisms of BERT
  • BERT's attention heads exhibit distinct patterns, such as attending to delimiter tokens, attending to specific positional offsets, or attending broadly over the entire sentence
  • Heads within the same layer often display similar behaviors
  • Certain attention heads align well with linguistic concepts such as syntax and coreference
  • Attention-based probing classifier used to support analysis
  • BERT's attention contains valuable syntactic information
  • Research expands understanding of how pre-trained neural networks learn from unlabeled data through attention mechanisms
  • Findings demonstrate alignment of BERT's attention with linguistic notions of syntax and coreference
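
The head-level patterns listed above (attention to delimiter tokens, to fixed positional offsets, or spread over the whole sentence) can be quantified with simple statistics over a single head's attention matrix. The following is a minimal sketch, assuming each attention map is a list of rows that each sum to 1. The function names, the token list, and the toy attention maps are hypothetical illustrations, not code or data from the paper.

```python
import math
from typing import List, Sequence, Set

# attn[i][j]: attention weight from token i to token j for one head.
# Assumption: each row is a probability distribution (non-negative, summing to 1).
AttentionMap = List[List[float]]


def avg_attention_to(attn: AttentionMap, tokens: Sequence[str], targets: Set[str]) -> float:
    """Average, over query tokens, of the attention mass placed on any token in `targets`."""
    n = len(tokens)
    total = sum(attn[i][j] for i in range(n) for j in range(n) if tokens[j] in targets)
    return total / n


def avg_attention_at_offset(attn: AttentionMap, offset: int) -> float:
    """Average attention from token i to token i + offset (offset=-1 means previous token)."""
    n = len(attn)
    vals = [attn[i][i + offset] for i in range(n) if 0 <= i + offset < n]
    return sum(vals) / len(vals) if vals else 0.0


def avg_entropy(attn: AttentionMap) -> float:
    """Average entropy (in nats) of the rows; higher values mean broader attention."""
    total = 0.0
    for row in attn:
        total -= sum(p * math.log(p) for p in row if p > 0)
    return total / len(attn)


# Hypothetical toy heads: one attends fully to the previous token ([CLS] attends to itself),
# the other spreads attention uniformly over the sentence.
tokens = ["[CLS]", "the", "cat", "sat", "[SEP]"]
n = len(tokens)
prev_head = [[1.0 if j == max(i - 1, 0) else 0.0 for j in range(n)] for i in range(n)]
uniform_head = [[1.0 / n] * n for _ in range(n)]

print(avg_attention_at_offset(prev_head, -1))                    # 1.0
print(avg_attention_to(prev_head, tokens, {"[CLS]", "[SEP]"}))   # 0.4
print(avg_entropy(prev_head), avg_entropy(uniform_head))         # 0.0, then about 1.609 (ln 5)
```

A head dominated by a single statistic (high offset attention, high delimiter attention, or near-maximal entropy) would fall into one of the pattern groups named in the key points.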

Authors: Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning

BlackBoxNLP 2019

Abstract: Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT's attention.
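
One simple way to check whether a head "attends to the direct objects of verbs" or to "determiners of nouns" is to treat the head as a predictor: for each query word, take the word it attends to most and compare it with a gold-standard target word (for example, from a dependency parse or a coreference annotation). The sketch below assumes word-level attention maps (no subword tokens) and hand-written gold pairs; the function names and the attention values are hypothetical.

```python
from typing import List, Sequence, Tuple

# attn[i][j]: how strongly word i attends to word j in one head.
# Assumption: attention is already at the word level and each row sums to 1.
AttentionMap = List[List[float]]


def most_attended(attn: AttentionMap, i: int) -> int:
    """Return the index of the word that word i attends to most, ignoring i itself."""
    candidates = [j for j in range(len(attn[i])) if j != i]
    return max(candidates, key=lambda j: attn[i][j])


def relation_accuracy(attn: AttentionMap, pairs: Sequence[Tuple[int, int]]) -> float:
    """Fraction of (query, expected_target) pairs where the query's most-attended word is the target.

    `pairs` would come from gold annotations, e.g. (verb, direct_object) index pairs
    from a dependency parse, or (mention, antecedent) pairs for coreference.
    """
    if not pairs:
        return 0.0
    hits = sum(1 for query, target in pairs if most_attended(attn, query) == target)
    return hits / len(pairs)


# Toy sentence and a hypothetical head; the attention values are made up for illustration.
words = ["the", "cat", "chased", "a", "mouse"]
attn = [
    [0.10, 0.60, 0.10, 0.10, 0.10],  # the    -> mostly "cat"
    [0.50, 0.10, 0.20, 0.10, 0.10],  # cat    -> mostly "the"
    [0.05, 0.15, 0.10, 0.10, 0.60],  # chased -> mostly "mouse" (its direct object)
    [0.10, 0.10, 0.10, 0.10, 0.60],  # a      -> mostly "mouse"
    [0.10, 0.10, 0.20, 0.50, 0.10],  # mouse  -> mostly "a"
]

verb_to_object = [(2, 4)]               # chased -> mouse
noun_to_determiner = [(1, 0), (4, 3)]   # cat -> the, mouse -> a
print(relation_accuracy(attn, verb_to_object))      # 1.0
print(relation_accuracy(attn, noun_to_determiner))  # 1.0
```

Averaging this accuracy over a large annotated corpus, separately for each head and each relation, is what would make a phrase like "remarkably high accuracy" measurable.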

Submitted to arXiv on 11 Jun. 2019

Results of the summarizing process for the arXiv paper: 1906.04341v1

In their paper "What Does BERT Look At? An Analysis of BERT's Attention," Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning examine the attention mechanisms of large pre-trained neural networks such as BERT in Natural Language Processing (NLP). These networks have had great recent success in NLP, prompting researchers to investigate which aspects of language they learn from unlabeled data. Whereas previous studies have mostly focused on model outputs and internal vector representations, this work proposes methods for analyzing the attention mechanisms of pre-trained models and applies them to BERT.

The authors observe that BERT's attention heads exhibit distinct patterns, such as attending to delimiter tokens, attending to specific positional offsets, or attending broadly over the entire sentence. Heads within the same layer often display similar behaviors. Certain heads also align well with linguistic notions of syntax and coreference: some attend to the direct objects of verbs, the determiners of nouns, the objects of prepositions, and coreferent mentions with remarkably high accuracy.

To further support this analysis, the authors propose an attention-based probing classifier and use it to show that substantial syntactic information is captured in BERT's attention. Overall, the work extends our understanding of what large pre-trained models learn from unlabeled data by examining their attention: it identifies specific patterns in BERT's attention heads and shows that these align with linguistic notions of syntax and coreference, shedding light on how pre-trained models capture important language features.
Created on 01 Oct. 2023
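
An attention-based probing classifier can be pictured as a small model whose only inputs are attention weights. The sketch below assumes one simple form, which is an illustrative assumption rather than a reproduction of the authors' classifier: the score for word j being the syntactic head of word i is a learned weighted sum, over attention heads k, of the attention from i to j and from j to i, normalized with a softmax and trained with cross-entropy by plain gradient descent. The weights `w` and `u`, the function names, and the two toy attention heads are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# maps[k][i][j]: attention from word i to word j in head k (toy, word-level values).
AttentionMap = List[List[float]]


def head_scores(maps: Sequence[AttentionMap], i: int, w: List[float], u: List[float]) -> List[float]:
    """Score each candidate head j of word i as sum_k w[k]*A_k[i][j] + u[k]*A_k[j][i].

    Assumed probe form (illustrative only). Word i is masked with -inf so it
    can never be predicted as its own head.
    """
    n = len(maps[0])
    return [
        -math.inf if j == i
        else sum(w[k] * m[i][j] + u[k] * m[j][i] for k, m in enumerate(maps))
        for j in range(n)
    ]


def softmax(xs: List[float]) -> List[float]:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]  # masked -inf entries become 0.0
    z = sum(exps)
    return [e / z for e in exps]


def train_probe(maps, gold: Sequence[Tuple[int, int]], epochs: int = 500, lr: float = 0.2):
    """Fit w and u by full-batch gradient descent on mean cross-entropy over (word, gold_head) pairs."""
    num_heads = len(maps)
    w, u = [0.0] * num_heads, [0.0] * num_heads
    for _ in range(epochs):
        grad_w, grad_u = [0.0] * num_heads, [0.0] * num_heads
        for i, h in gold:
            probs = softmax(head_scores(maps, i, w, u))
            for j, p in enumerate(probs):
                err = p - (1.0 if j == h else 0.0)  # d(loss)/d(score_j)
                for k, m in enumerate(maps):
                    grad_w[k] += err * m[i][j]
                    grad_u[k] += err * m[j][i]
        w = [wk - lr * g / len(gold) for wk, g in zip(w, grad_w)]
        u = [uk - lr * g / len(gold) for uk, g in zip(u, grad_u)]
    return w, u


def predict_head(maps, i: int, w: List[float], u: List[float]) -> int:
    scores = head_scores(maps, i, w, u)
    return max(range(len(scores)), key=lambda j: scores[j])


# Hypothetical toy data for "the cat sat down": gold heads the->cat, cat->sat, down->sat.
syntax_head = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
prev_token_head = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
maps = [syntax_head, prev_token_head]
gold = [(0, 1), (1, 2), (3, 2)]

w, u = train_probe(maps, gold)
print([predict_head(maps, i, w, u) for i, _ in gold])  # [1, 2, 2]
```

Because the probe sees nothing but attention weights, any syntactic accuracy it reaches on held-out data has to come from information already present in the attention maps, which is the point of this kind of probe.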
