Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

AI-generated keywords: Agency Preservation; Artificial Intelligence; Artificial General Intelligence; Algorithms; Human Intent

AI-generated Key Points

- Rapid advancement of AI systems raises concerns about potential harm to humans
- Harm can occur through intentional misuse or accidents
- Efforts are being made to align AI systems with human intentions
- Alignment to human intent alone is deemed insufficient for ensuring safety
- Preservation of long-term agency of humans should be a more robust standard for safe AI systems
- Agency preservation needs to be separated from other optimization processes during development
- Lack of biological and psychological mechanisms protecting humans from loss of agency
- Formal definition of agency-preserving AI-human interactions introduced, focusing on forward-looking evaluations of agency
- Crucial for AI systems to take responsibility for making these evaluations instead of humans
- Loss of agency demonstrated in simple environments using temporal-difference learning
- Proposed research area called "agency foundations" with four initial topics: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states
- Limitations acknowledged, such as not solving intent-alignment issues entirely, computational challenges in evaluating agency preservation for everyone, possible hindering of AI progress, and unlikely modification of fundamental biological needs or psychological structures
- Emphasis on going beyond alignment to human intent and prioritizing the preservation of human agency in AI systems

Authors: Catalin Mitelut, Ben Smith, Peter Vamplew

arXiv: 2305.19223v1 (cs.AI)

License: CC BY 4.0

Abstract: The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). In respect of AI-accidents, there is an increasing effort focused on developing algorithms and paradigms that ensure AI systems are aligned to what humans intend, e.g. AI systems that yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems and that preservation of long-term agency of humans may be a more robust standard, and one that needs to be separated explicitly and a priori during optimization. We argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions which focuses on forward-looking agency evaluations and argue that AI systems - not humans - must be increasingly tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural-networks and reinforcement learning from internal states.

Submitted to arXiv on 30 May 2023

Results of the summarizing process for the arXiv paper: 2305.19223v1

Comprehensive Summary

The rapid advancement of artificial intelligence (AI) systems has raised concerns about the potential harm that AI and artificial general intelligence (AGI) may cause to humans. These harms can occur through intentional misuse or through accidents. Efforts are being made to develop algorithms and paradigms that align AI systems with human intentions, so that their actions and recommendations are consistent with human goals. The authors argue that alignment to human intent alone is insufficient for ensuring the safety of AI systems.

Instead, they argue that preserving the long-term agency of humans may be a more robust standard for safe AI systems, and that agency preservation needs to be separated explicitly from other optimization processes during development. They highlight the possibility that AI systems reshape human intention and discuss the lack of biological and psychological mechanisms protecting humans from loss of agency.

To address these concerns, the authors introduce a formal definition of agency-preserving AI-human interactions, which focuses on forward-looking evaluations of agency. They argue that AI systems, rather than humans, must increasingly take responsibility for making these evaluations. The paper demonstrates how loss of agency can occur in simple environments where embedded agents use temporal-difference learning to make action recommendations.

Finally, the authors propose a new area of research called "agency foundations", aimed at improving our understanding of agency in AI-human interactions. They suggest four initial topics within this field: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.

The paper also acknowledges several limitations: adding agency preservation does not entirely solve intent-alignment issues; evaluating agency preservation for everyone is computationally challenging; safeguarding human agency may hinder AI progress and may not be realistic given current technological capabilities; and AIs are unlikely to modify fundamental biological needs or psychological structures of humans.

Overall, the paper emphasizes the need to go beyond alignment to human intent and to prioritize the preservation of human agency in AI systems. It calls for further research in the field of agency foundations to ensure safe and responsible development of AI technologies.
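The summary mentions agents that use temporal-difference learning to make action recommendations. As a rough illustration only, the following Python sketch shows a TD(0) value update inside a toy recommender. It is not the authors' environment: the function names, the reward probabilities, the `follow_prob` parameter, and the use of choice entropy as a crude proxy for how many options a person still exercises are all assumptions made for this example.

```python
import math
import random


def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """One TD(0) step: move value toward reward + gamma * next_value."""
    td_error = reward + gamma * next_value - value
    return value + alpha * td_error


def entropy(counts):
    """Shannon entropy (in bits) of an empirical choice distribution."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def simulate(steps=2000, n_actions=4, follow_prob=0.9, seed=0):
    """Hypothetical toy: a recommender learns action values by TD updates.

    A simulated human follows the greedy recommendation with probability
    follow_prob and otherwise picks an action uniformly at random.
    """
    rng = random.Random(seed)
    # Assumed reward probabilities: action 0 pays off more often.
    reward_probs = [0.8] + [0.5] * (n_actions - 1)
    values = [0.0] * n_actions
    counts = [0] * n_actions
    for _ in range(steps):
        recommended = max(range(n_actions), key=lambda a: values[a])
        if rng.random() < follow_prob:
            action = recommended
        else:
            action = rng.randrange(n_actions)
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Single-state problem: there is no successor state, so next_value is 0.
        values[action] = td_update(values[action], reward, 0.0)
        counts[action] += 1
    return values, counts


if __name__ == "__main__":
    _, free = simulate(follow_prob=0.0)    # human ignores recommendations
    _, nudged = simulate(follow_prob=0.9)  # human mostly follows them
    print("choice entropy without recommendations:", entropy(free))
    print("choice entropy with recommendations:", entropy(nudged))
```

In this toy setting, the choices of a human who mostly follows the recommendations concentrate on the action the recommender values most, so the entropy of exercised choices drops relative to the case where recommendations are ignored. This is only meant to convey the flavor of the argument; the paper's formal definition of agency preservation is not reproduced here.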

Layman's Summary

Rapid advancement of AI systems means that computers are getting smarter and can do more things. People are worried that AI could hurt us in some way, either through intentional misuse (using AI on purpose to cause harm) or through accidents (something bad happening by mistake). Efforts to align AI systems with human intentions try to make sure that AI does what we want it to do. The authors argue that this alone is not enough for safety, because AI can also change what we want in the first place. Preserving the long-term agency of humans means making sure that people keep control and power over their own lives even with AI around. Our bodies and minds do not naturally protect us from losing that control. The authors give a formal definition of agency-preserving AI-human interactions, that is, a precise description of how AI should work with people without taking away their control. They also argue that AI systems, not humans, should increasingly be responsible for checking whether an interaction preserves agency. Using simple environments in which agents learn by temporal-difference learning, they show how people can lose control when relying on AI recommendations. Finally, they propose a research area called "agency foundations", in which they want to study four initial topics: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.

The Need to Preserve Human Agency in Artificial Intelligence Systems

As artificial intelligence (AI) systems become increasingly sophisticated, there is growing concern about the potential harm they may cause to humans. To address this issue, many efforts are being made to develop algorithms and paradigms that align AI systems with human intentions. However, a new research paper argues that this alignment alone is insufficient for ensuring the safety of AI systems and proposes that preserving long-term agency of humans should be a more robust standard for safe AI systems.

What Is Human Agency?

Human agency is defined as an individual's capacity to make choices and take actions independently. It involves having control over one's life decisions and being able to act on them without external interference or manipulation. The authors of the paper argue that preserving human agency matters when developing AI systems because these systems can reshape human intention, and because humans lack biological and psychological mechanisms that protect them from loss of agency.

How Can We Ensure Safe Development of AI Technologies?

To ensure safe development of AI technologies, the authors propose separating preservation of agency from other optimization processes during development. They also introduce a formal definition of agency-preserving AI-human interactions which focuses on forward-looking evaluations of agency by AIs rather than humans. Furthermore, they suggest creating a new area called "agency foundations" aimed at improving our understanding of how best to preserve human agency in these interactions through four initial topics: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.

Limitations

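While this paper provides valuable insights into the importance of preserving human agency in AI systems, it also acknowledges several limitations: adding agency preservation does not entirely solve intent-alignment issues; evaluating agency preservation for everyone is computationally challenging; safeguarding human agency may hinder AI progress and may not be realistic given current technological capabilities; and AIs are unlikely to modify fundamental biological needs or psychological structures of humans.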

Created on 10 Nov. 2023

Similar papers summarized with our AI tools

- Ethics of AI: A Systematic Literature Review of Principles and Challenges (cs.CY, 54.4%)
- Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolu… (cs.AI, 54.3%)
- Reasoning about Causality in Games (cs.AI, 54.1%)
- How Do AI Timelines Affect Existential Risk? (cs.CY, 54.0%)
- A Survey on Large Language Model based Autonomous Agents (cs.AI, 54.0%)
- Constitutional AI: Harmlessness from AI Feedback (cs.CL, 54.0%)
- Bridging the Gap between Artificial Intelligence and Artificial General Intel… (cs.AI, 53.9%)
