Linear Representations of Political Perspective Emerge in Large Language Models

AI-generated keywords: Large language models, political perspectives, linear representations, attention heads, ideological stances

AI-generated Key Points

Large language models (LLMs) can generate text that realistically reflects a range of subjective human perspectives, including political viewpoints.
This study analyzes how LLMs capture liberal and conservative stances in American politics by examining their activation space.
Highly predictive attention heads are located primarily in the middle layers of the transformer-based LLMs studied, which are often speculated to encode high-level concepts and tasks.
Probes trained only to predict lawmakers' ideology can also predict news outlets' slant from the activations of models prompted to simulate text from those outlets.
Linear interventions applied to attention heads can steer model outputs towards a more liberal or conservative stance.
Human annotators rated LLM-generated essays, and the high correlation between human and GPT-4o ratings validates the use of GPT-4o for rating all essays.
LLMs possess a high-level linear representation of American political ideology.
Recent advances in mechanistic interpretability make it possible to identify, monitor, and steer the subjective perspective underlying text generated by LLMs.

Authors: Junsol Kim, James Evans, Aaron Schein

arXiv: 2503.02080v2 (cs.CL)

Published as a conference paper at ICLR 2025 https://openreview.net/forum?id=rwqShzb9li

License: CC BY-NC-SA 4.0

Abstract: Large language models (LLMs) have demonstrated the ability to generate text that realistically reflects a range of different subjective human perspectives. This paper studies how LLMs are seemingly able to reflect more liberal versus more conservative viewpoints among other political perspectives in American politics. We show that LLMs possess linear representations of political perspectives within activation space, wherein more similar perspectives are represented closer together. To do so, we probe the attention heads across the layers of three open transformer-based LLMs (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b). We first prompt models to generate text from the perspectives of different U.S. lawmakers. We then identify sets of attention heads whose activations linearly predict those lawmakers' DW-NOMINATE scores, a widely-used and validated measure of political ideology. We find that highly predictive heads are primarily located in the middle layers, often speculated to encode high-level concepts and tasks. Using probes only trained to predict lawmakers' ideology, we then show that the same probes can predict measures of news outlets' slant from the activations of models prompted to simulate text from those news outlets. These linear probes allow us to visualize, interpret, and monitor ideological stances implicitly adopted by an LLM as it generates open-ended responses. Finally, we demonstrate that by applying linear interventions to these attention heads, we can steer the model outputs toward a more liberal or conservative stance. Overall, our research suggests that LLMs possess a high-level linear representation of American political ideology and that by leveraging recent advances in mechanistic interpretability, we can identify, monitor, and steer the subjective perspective underlying generated text.

Submitted to arXiv on 03 Mar. 2025

Results of the summarizing process for the arXiv paper: 2503.02080v2

Comprehensive Summary
Key points
Layman's Summary
Blog article

Large language models (LLMs) can generate text that realistically reflects a variety of subjective human perspectives, including political viewpoints. This study examines how LLMs capture liberal and conservative stances in American politics by analyzing their activation space. The authors prompt three open transformer-based LLMs (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b) to write from the perspectives of different U.S. lawmakers, probe the attention heads across layers, and identify the heads whose activations linearly predict those lawmakers' DW-NOMINATE scores. The highly predictive heads are located primarily in the middle layers, which are often speculated to encode high-level concepts and tasks. Probes trained only to predict lawmakers' ideology also predict measures of news outlets' slant from the activations of models prompted to simulate text from those outlets. Linear interventions applied to these attention heads steer model outputs towards a more liberal or conservative stance. Human annotators rated 1,134 essays generated by the three models on nine policy issues under different parameter settings, and the high correlation between human and GPT-4o ratings validates the use of GPT-4o for rating all essays. The paper is among the first to investigate whether LLMs exhibit linear representations of political perspective, and it concludes that they possess a high-level linear representation of American political ideology. Recent advances in mechanistic interpretability make it possible to identify, monitor, and steer the subjective perspective underlying text generated by LLMs.
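The probing step in the summary above can be illustrated with a small, self-contained sketch. It is not the authors' code: the activations are synthetic toy data, the use of ridge regression and held-out Pearson correlation is an assumption about how "linearly predict" might be tested, and every name (`simulate_head`, `fit_ridge`, `heldout_correlation`) is hypothetical. One toy head encodes an ideology score along a fixed direction and another carries only noise, so the probe should separate them.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression without intercept (assumes centered data)."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)

def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def simulate_head(scores, direction, noise, rng):
    """Toy activations (an assumption): score * direction + Gaussian noise."""
    return [[s * d + rng.gauss(0.0, noise) for d in direction] for s in scores]

def heldout_correlation(X, y, n_train):
    """Fit the probe on the first n_train rows, score it on the rest."""
    w = fit_ridge(X[:n_train], y[:n_train])
    return pearson(predict(w, X[n_train:]), y[n_train:])

rng = random.Random(0)
# Stand-ins for DW-NOMINATE scores, which range roughly from -1 to 1.
scores = [rng.uniform(-1.0, 1.0) for _ in range(200)]
informative = simulate_head(scores, [0.8, -0.5, 0.3, 0.0], 0.1, rng)
uninformative = simulate_head(scores, [0.0, 0.0, 0.0, 0.0], 0.1, rng)

r_informative = heldout_correlation(informative, scores, 150)
r_uninformative = heldout_correlation(uninformative, scores, 150)
print(f"informative head:   r = {r_informative:.2f}")
print(f"uninformative head: r = {r_uninformative:.2f}")
```

Ranking heads by held-out correlation in this way is one plausible reading of how "highly predictive" heads could be identified. With real models the activations would come from the attention heads themselves rather than from `simulate_head`.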

Summary

Large language models (LLMs) can write from different points of view, including political ones. This study looks at how LLMs represent liberal and conservative views in American politics. Certain parts in the middle of these models carry the information about political leaning. A simple predictor trained on those parts to guess a lawmaker's political leaning can also tell whether a news source leans liberal or conservative. Adjusting those same parts can make the model's writing sound more liberal or more conservative.

Definitions

- Large language models (LLMs): Advanced computer programs that can write like humans.
- Liberal: Someone who likes change and helping people.
- Conservative: Someone who prefers things to stay the same and values tradition.
- Activation space: The internal numbers a model computes while it reads and writes text.
- Transformer-based LLMs: A type of large language model that uses a specific technology to work.
- Ideology: A set of beliefs or ideas about how things should be done.
- Linear interventions: Nudging a model's internal numbers in one fixed direction to shift what it writes.
- Annotators: People who read and rate text.
- GPT ratings: Ratings given by a specific large language model called GPT-4o.

Large language models (LLMs) have drawn attention for their ability to generate text that reflects a variety of subjective human perspectives. They can mimic human writing styles and produce coherent, contextually relevant text on a wide range of topics. One area of particular interest is political viewpoints. In American politics, liberal and conservative stances often shape policy decisions and public discourse, so understanding how LLMs represent these stances can clarify the role of AI in shaping political narratives.

A recent research paper titled "Linear Representations of Political Perspective Emerge in Large Language Models" takes up this question by analyzing the activation space of three open transformer-based LLMs: Llama-2-7b-chat, Mistral-7b-instruct, and Vicuna-7b. The researchers prompt each model to generate text from the perspectives of different U.S. lawmakers, then identify the attention heads whose activations linearly predict those lawmakers' DW-NOMINATE scores, a widely used measure of ideology based on voting patterns.

The highly predictive attention heads were located primarily in the middle layers of the models. These layers are often speculated to encode high-level concepts and tasks, which suggests that political ideology is represented as a high-level concept.

To test how far the finding generalizes, the researchers took the probes trained only on lawmakers and applied them to the activations of models prompted to simulate text from news outlets. The same probes predicted measures of those outlets' slant, so the representation captures political perspective beyond individual lawmakers.

The study also explores linear interventions applied to these attention heads. By applying such interventions to the heads' activations, the researchers could steer model outputs towards a more liberal or conservative stance.

To evaluate the steered outputs, human annotators rated 1,134 essays generated by the three LLMs on nine policy issues. The results showed a high correlation between human and GPT-4o ratings, validating the use of GPT-4o for rating all essays.

This study is among the first to investigate whether LLMs exhibit linear representations of political perspective. It shows that recent advances in mechanistic interpretability make it possible to identify, monitor, and steer the subjective perspective underlying generated text, which has implications for ethical questions about AI-generated content and its impact on public discourse.
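The steering idea described in the article can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: it assumes the intervention adds a scaled copy of a probe direction to one head's activation, and the function `steer`, the parameters `alpha` and `sigma`, and all numeric values are hypothetical. On the first DW-NOMINATE dimension, negative scores are more liberal and positive scores more conservative, so moving along a probe direction in the positive sense should raise the probe's predicted score.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def steer(activation, direction, alpha, sigma):
    """Shift one head's activation by alpha * sigma along the unit-normalized direction.

    alpha sets strength and sign; sigma stands in for the typical spread of
    activations along the direction (both are illustrative assumptions).
    """
    norm = math.sqrt(dot(direction, direction))
    return [x + alpha * sigma * d / norm for x, d in zip(activation, direction)]

# Hypothetical probe weights and a hypothetical head activation.
probe_w = [0.8, -0.5, 0.3, 0.0]
activation = [0.1, 0.2, -0.1, 0.4]
sigma = 0.5

for alpha in (-4.0, 0.0, 4.0):
    shifted = steer(activation, probe_w, alpha, sigma)
    print(f"alpha = {alpha:+.1f}  probe score = {dot(probe_w, shifted):+.3f}")
```

Because the shift is linear, the probe's predicted score changes by `alpha * sigma * ||probe_w||`, so a negative `alpha` pushes towards the liberal end and a positive `alpha` towards the conservative end. In a real model the shifted vector would replace the head's output during generation.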

Created on 05 Oct. 2025

Similar papers summarized with our AI tools

- 64.1%: Large Language Models Reflect the Ideology of their Creators (cs.CL)
- 61.6%: ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitt… (cs.CL)
- 60.8%: Still No Lie Detector for Language Models: Probing Empirical and Conceptual R… (cs.CL)
- 58.8%: Towards Measuring the Representation of Subjective Global Opinions in Languag… (cs.CL)
- 58.2%: Large Language Models are Geographically Biased (cs.CL)
