Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

AI-generated keywords: Large Language Models (LLMs), Sparse Autoencoders (SAEs), Jacobian SAEs (JSAEs), Computational Sparsity, Interpretability

AI-generated Key Points

  • The researchers aim to understand the computations performed by LLMs, not just their representations
  • They introduce Jacobian Sparse Autoencoders (JSAEs), which induce sparsity in the input and output activations of a model component and in the computation (formally, the Jacobian) connecting them
  • They devise an efficient method for computing Jacobians in LLMs, which a naive implementation would make intractable; JSAEs reach considerable computational sparsity while preserving downstream LLM performance approximately as well as traditional SAEs
  • JSAEs identify semantically meaningful computational units within LLMs; for example, an output latent corresponding to "this text is in German" is computed from input latents for tokens common in German text
  • Pre-trained LLMs exhibit significantly sparser computational connections than randomly initialized transformers, indicating that computational sparsity is a property learned during training
  • JSAEs offer potential for better understanding transformer operations and enhancing the interpretability of deep learning models

Authors: Lucy Farnik, Tim Lawson, Conor Houghton, Laurence Aitchison

License: CC BY 4.0

Abstract: Sparse autoencoders (SAEs) have been successfully used to discover sparse and human-interpretable representations of the latent activations of LLMs. However, we would ultimately like to understand the computations performed by LLMs and not just their representations. The extent to which SAEs can help us understand computations is unclear because they are not designed to "sparsify" computations in any sense, only latent activations. To solve this, we propose Jacobian SAEs (JSAEs), which yield not only sparsity in the input and output activations of a given model component but also sparsity in the computation (formally, the Jacobian) connecting them. With a naïve implementation, the Jacobians in LLMs would be computationally intractable due to their size. One key technical contribution is thus finding an efficient way of computing Jacobians in this setup. We find that JSAEs extract a relatively large degree of computational sparsity while preserving downstream LLM performance approximately as well as traditional SAEs. We also show that Jacobians are a reasonable proxy for computational sparsity because MLPs are approximately linear when rewritten in the JSAE basis. Lastly, we show that JSAEs achieve a greater degree of computational sparsity on pre-trained LLMs than on the equivalent randomized LLM. This shows that the sparsity of the computational graph appears to be a property that LLMs learn through training, and suggests that JSAEs might be more suitable for understanding learned transformer computations than standard SAEs.
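
The setup described in the abstract can be written down roughly as follows. This is only a sketch: the symbols (an MLP f, an input SAE with encoder e_x and decoder d_x, an output SAE with encoder e_y and decoder d_y, and a penalty weight λ) are notation assumed here for illustration, and the abstract does not state the exact loss, its normalization, or the sparsity mechanism the authors use.

```latex
\begin{aligned}
y &= f(x), \\
s_x &= e_x(x), \qquad \hat{x} = d_x(s_x), \\
s_y &= e_y(y), \qquad \hat{y} = d_y(s_y), \\
g(s_x) &= e_y\bigl(f(d_x(s_x))\bigr), \qquad
J_{ij} = \frac{\partial g_i(s_x)}{\partial s_{x,j}}, \\
\mathcal{L} &= \lVert x - \hat{x} \rVert_2^2
            + \lVert y - \hat{y} \rVert_2^2
            + \lambda \sum_{i,j} \lvert J_{ij} \rvert .
\end{aligned}
```

Under this reading, the first two terms are the usual SAE reconstruction errors, and the last term pushes most entries of the latent-to-latent Jacobian J toward zero, so that each output latent depends on only a few input latents.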

Submitted to arXiv on 25 Feb. 2025

Results of the summarizing process for the arXiv paper: 2502.18147v2

In the study "Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations," the researchers aim to understand the computations performed by LLMs, not just their representations. While sparse autoencoders (SAEs) have been effective at uncovering sparse and interpretable latent activations of LLMs, they do not target sparsity in computations. To address this limitation, the researchers introduce Jacobian SAEs (JSAEs), which induce sparsity not only in the input and output activations of a model component but also in the computational connections (the Jacobian) between them.

One significant technical contribution is an efficient method for computing Jacobians in LLMs, since a naive implementation would be computationally intractable due to their size. The results show that JSAEs achieve a considerable degree of computational sparsity while preserving downstream LLM performance approximately as well as conventional SAEs. The study also shows that Jacobians serve as a reasonable proxy for computational sparsity, because multilayer perceptrons (MLPs) are approximately linear when rewritten in the JSAE basis.

By analyzing "max-activating" examples of JSAEs, the researchers verified that these models can identify semantically meaningful computational units within LLMs. For instance, specific output SAE latents were found to correspond to concepts such as "this text is in German," computed from input latents representing tokens common in German text or related to historical events like the Third Reich. Furthermore, comparisons with randomly initialized transformers revealed that pre-trained LLMs exhibit significantly sparser Jacobians, indicating that computational sparsity is a property learned during training. This contrasts with previous findings showing similar interpretability scores for SAEs on random and pre-trained transformers.

The study also provides insights into how JSAEs extract information about complex learned computations and highlights their potential for understanding transformer operations better than standard SAEs. In conclusion, "Jacobian Sparse Autoencoders" presents a novel approach to understanding computations within LLMs by inducing sparsity in the Jacobian as well as in the activations. The findings underscore the importance of considering not just representations but also the underlying computations when interpreting deep learning models.
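
To make the idea of a "Jacobian between SAE latents" concrete, here is a minimal toy sketch in plain Python. It is not the authors' method: the paper's efficient Jacobian computation is not described on this page, so the sketch swaps in brute-force central finite differences. All names (`latent_map`, `jacobian_fd`, `l1_norm`, the weight matrices) are hypothetical, and the bias-free linear encoder/decoder with a ReLU is an assumption made for illustration.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def latent_map(s_x, D_x, W1, W2, E_y):
    """Map input-SAE latents to output-SAE latents: decode, run the MLP, re-encode."""
    x = matvec(D_x, s_x)                 # assumed linear decoder of the input SAE
    y = matvec(W2, relu(matvec(W1, x)))  # toy two-layer MLP (no biases)
    return relu(matvec(E_y, y))          # assumed ReLU encoder of the output SAE

def jacobian_fd(fn, s, eps=1e-6):
    """Central finite-difference Jacobian: J[i][j] = d fn(s)[i] / d s[j]."""
    n_out = len(fn(s))
    J = [[0.0] * len(s) for _ in range(n_out)]
    for j in range(len(s)):
        up = list(s)
        up[j] += eps
        dn = list(s)
        dn[j] -= eps
        f_up, f_dn = fn(up), fn(dn)
        for i in range(n_out):
            J[i][j] = (f_up[i] - f_dn[i]) / (2 * eps)
    return J

def l1_norm(J):
    """Sum of absolute Jacobian entries: the quantity a sparsity penalty would shrink."""
    return sum(abs(a) for row in J for a in row)

# Toy dimensions and random weights, chosen only for the illustration.
rng = random.Random(0)
d_model, d_hidden, n_latents = 4, 8, 6
D_x = rand_matrix(d_model, n_latents, rng)
W1 = rand_matrix(d_hidden, d_model, rng)
W2 = rand_matrix(d_model, d_hidden, rng)
E_y = rand_matrix(n_latents, d_model, rng)

s_x = [abs(rng.gauss(0.0, 1.0)) for _ in range(n_latents)]
J = jacobian_fd(lambda s: latent_map(s, D_x, W1, W2, E_y), s_x)
penalty = l1_norm(J)
```

Finite differences need two forward passes per input latent, which is one way to see why a naive Jacobian computation becomes intractable at LLM scale, where SAEs have many thousands of latents and the Jacobian is needed for every token.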
Created on 23 Jul. 2025
