Linear Representations of Political Perspective Emerge in Large Language Models
AI-generated Key Points
- Large language models (LLMs) can generate text that realistically reflects a range of subjective human perspectives, including political viewpoints.
- This study analyzes how LLMs capture liberal and conservative stances in American politics by examining their activation space.
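- The attention heads whose activations best predict ideology sit primarily in the middle layers of the three transformer-based LLMs studied; these layers are often speculated to encode high-level concepts and tasks.
- Probes trained only to predict lawmakers' ideology also predict measures of news outlets' slant, using the activations of models prompted to simulate text from those outlets.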
- Linear interventions applied to attention heads can steer model outputs towards a more liberal or conservative stance.
- Human annotators and GPT-4o gave highly correlated ratings of LLM-generated essays, which supports using GPT-4o to rate all essays.
- Overall, the results suggest that LLMs hold a high-level linear representation of American political ideology in activation space, with more similar perspectives represented closer together.
- Recent advances in mechanistic interpretability make it possible to identify, monitor, and steer the subjective perspective underlying LLM-generated text.
Authors: Junsol Kim, James Evans, Aaron Schein
Abstract: Large language models (LLMs) have demonstrated the ability to generate text that realistically reflects a range of different subjective human perspectives. This paper studies how LLMs are seemingly able to reflect more liberal versus more conservative viewpoints among other political perspectives in American politics. We show that LLMs possess linear representations of political perspectives within activation space, wherein more similar perspectives are represented closer together. To do so, we probe the attention heads across the layers of three open transformer-based LLMs (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b). We first prompt models to generate text from the perspectives of different U.S. lawmakers. We then identify sets of attention heads whose activations linearly predict those lawmakers' DW-NOMINATE scores, a widely-used and validated measure of political ideology. We find that highly predictive heads are primarily located in the middle layers, often speculated to encode high-level concepts and tasks. Using probes only trained to predict lawmakers' ideology, we then show that the same probes can predict measures of news outlets' slant from the activations of models prompted to simulate text from those news outlets. These linear probes allow us to visualize, interpret, and monitor ideological stances implicitly adopted by an LLM as it generates open-ended responses. Finally, we demonstrate that by applying linear interventions to these attention heads, we can steer the model outputs toward a more liberal or conservative stance. Overall, our research suggests that LLMs possess a high-level linear representation of American political ideology and that by leveraging recent advances in mechanistic interpretability, we can identify, monitor, and steer the subjective perspective underlying generated text.
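The probing and steering steps described in the abstract can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the activations are random stand-ins rather than real model outputs, the probe is a closed-form ridge regression, the intervention is an additive shift along the probe direction scaled by the activation standard deviation, and the names `fit_ridge`, `probe_heads`, and `steer` are hypothetical. The paper's actual probe and intervention may differ.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression on centered data; returns (weights, bias)."""
    mu, ybar = X.mean(axis=0), y.mean()
    Xc = X - mu
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ybar))
    return w, ybar - mu @ w

def probe_heads(acts, scores, n_train, lam=1.0):
    """Fit one linear probe per head and score it on held-out prompts.

    acts:   (n_prompts, n_heads, d_head) head activations (assumed layout)
    scores: (n_prompts,) ideology score per prompt
    Returns the held-out Pearson r per head and the fitted probes.
    """
    rs, probes = [], []
    for h in range(acts.shape[1]):
        w, b = fit_ridge(acts[:n_train, h], scores[:n_train], lam)
        pred = acts[n_train:, h] @ w + b
        rs.append(np.corrcoef(pred, scores[n_train:])[0, 1])
        probes.append((w, b))
    return np.array(rs), probes

def steer(head_act, w, alpha, sigma):
    """Shift one head's activation along the unit-norm probe direction."""
    return head_act + alpha * sigma * w / np.linalg.norm(w)

# Synthetic stand-in data: only head 3 carries an "ideology" signal.
rng = np.random.default_rng(0)
n, n_heads, d = 200, 8, 16
scores = rng.uniform(-1, 1, size=n)            # stand-in for DW-NOMINATE scores
acts = rng.normal(size=(n, n_heads, d))
direction = rng.normal(size=d)
acts[:, 3] += 2.0 * scores[:, None] * direction

# Probe every head, then keep the most predictive one.
rs, probes = probe_heads(acts, scores, n_train=150)
best = int(np.argmax(rs))
w, b = probes[best]

# Steer one activation and compare the probe's prediction before and after.
sigma = (acts[:, best] @ (w / np.linalg.norm(w))).std()
x = acts[0, best]
shifted = steer(x, w, alpha=2.0, sigma=sigma)
before, after = x @ w + b, shifted @ w + b
print(best, float(rs[best]), float(before), float(after))
```

A positive `alpha` moves the probe's predicted score up and a negative `alpha` moves it down. With a real model, the shift would be applied to the selected heads during generation rather than to a stored activation.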