Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

AI-generated keywords: Agency Preservation, Artificial Intelligence, Artificial General Intelligence, Algorithms, Human Intent

AI-generated Key Points

  • Rapid advancement of AI systems raises concerns about potential harm to humans
  • Harm can occur through intentional misuse or accidents
  • Efforts are being made to align AI systems with human intentions
  • Alignment to human intent alone is argued to be insufficient for ensuring safety
  • Preservation of long-term agency of humans should be a more robust standard for safe AI systems
  • Agency preservation needs to be separated explicitly and a priori during optimization
  • AI systems can reshape human intention, and humans lack biological and psychological mechanisms that protect them from loss of agency
  • Formal definition of agency-preserving AI-human interactions introduced, focusing on forward-looking evaluations of agency
  • AI systems, not humans, must increasingly be tasked with making these evaluations
  • Loss of agency demonstrated in simple environments where embedded agents use temporal-difference learning to make action recommendations
  • Proposed research area called "agency foundations" with four initial topics: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.
  • Acknowledged limitations: agency preservation does not fully solve intent alignment, evaluating agency preservation for everyone is computationally challenging, safeguarding agency may hinder AI progress, and AIs are unlikely to modify fundamental biological needs or psychological structures.
  • Emphasis on going beyond alignment to human intent and prioritizing the preservation of human agency in AI systems.

Authors: Catalin Mitelut, Ben Smith, Peter Vamplew

License: CC BY 4.0

Abstract: The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). In respect of AI-accidents, there is an increasing effort focused on developing algorithms and paradigms that ensure AI systems are aligned to what humans intend, e.g. AI systems that yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems and that preservation of long-term agency of humans may be a more robust standard, and one that needs to be separated explicitly and a priori during optimization. We argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions which focuses on forward-looking agency evaluations and argue that AI systems - not humans - must be increasingly tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks and reinforcement learning from internal states.

Submitted to arXiv on 30 May 2023

Results of the summarizing process for the arXiv paper: 2305.19223v1

The rapid advancement of artificial intelligence (AI) systems has raised concerns about the potential harm that AI and artificial general intelligence (AGI) may cause to humans, either through intentional misuse or through accidents. To address accidents, researchers are developing algorithms and paradigms that align AI systems with human intentions, so that their actions and recommendations are consistent with human goals. In this paper, the authors argue that alignment to human intent alone is insufficient for safe AI systems and that preserving the long-term agency of humans may be a more robust standard. They propose that agency preservation be separated explicitly and a priori during optimization. The authors argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms protecting humans from loss of agency. To address these concerns, the authors introduce the first formal definition of agency-preserving AI-human interactions, which focuses on forward-looking evaluations of agency. They argue that AI systems, rather than humans, must increasingly be tasked with making these evaluations. The paper demonstrates how loss of agency can occur in simple environments where embedded agents use temporal-difference learning to make action recommendations. Finally, the authors propose a new area of research called "agency foundations", aimed at improving our understanding of agency in AI-human interactions, with four initial topics: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.
The paper also acknowledges several limitations: adding agency preservation does not fully solve intent-alignment problems; evaluating agency preservation for everyone is computationally challenging; safeguarding human agency may hinder AI progress and may not be realistic given current technological capabilities; and AIs are unlikely to modify fundamental biological needs or psychological structures of humans. Overall, the paper argues for going beyond alignment to human intent and prioritizing the preservation of human agency in AI systems, and it calls for further research in agency foundations to support the safe and responsible development of AI technologies.
Created on 10 Nov. 2023
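The summary mentions that agency loss is demonstrated with embedded agents that use temporal-difference (TD) learning to make action recommendations, but it does not describe those environments. As background only, here is a minimal sketch of such a recommender, assuming a hypothetical two-state task (`TRANSITIONS`) and using tabular Q-learning, one standard TD control method. The helper names `train_q` and `recommend`, the rewards, and the parameters are illustrative assumptions, not the authors' setup or their agency measure.

```python
# Hypothetical two-state task (an assumption, not the paper's environment).
# TRANSITIONS[state][action] = (reward, next_state); None marks episode end.
TRANSITIONS = {
    0: {"left": (0.2, None), "right": (0.0, 1)},
    1: {"stay": (1.0, None), "quit": (0.0, None)},
}


def train_q(episodes=200, alpha=0.5, gamma=0.9):
    """Tabular Q-learning (a TD control method): each update moves Q(s, a)
    toward the bootstrapped target r + gamma * max_b Q(s', b)."""
    q = {(s, a): 0.0 for s, acts in TRANSITIONS.items() for a in acts}
    for _ in range(episodes):
        # Sweep every state-action pair so all estimates get updated
        # (exhaustive exploration, chosen here to keep the sketch deterministic).
        for s, acts in TRANSITIONS.items():
            for a, (r, s_next) in acts.items():
                if s_next is None:
                    target = r  # terminal transition: no future value
                else:
                    target = r + gamma * max(q[(s_next, b)] for b in TRANSITIONS[s_next])
                # TD error: bootstrapped target minus the current estimate.
                q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def recommend(q, state):
    """Hypothetical recommender: suggest the highest-valued action in `state`."""
    return max(TRANSITIONS[state], key=lambda a: q[(state, a)])


q = train_q()
print(recommend(q, 0), recommend(q, 1))
print(round(q[(0, "right")], 3), round(q[(0, "left")], 3))
```

With these assumed rewards, `right` is recommended in state 0 because its bootstrapped value (0 + 0.9 × 1 = 0.9) exceeds the immediate reward of `left` (0.2). A human who always followed these recommendations would never choose `left` from state 0; how such narrowing relates to agency loss is what the paper's formal definition addresses, and that definition is not reproduced here.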

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.