Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
AI-generated Key Points
- Rapid advancement of AI systems raises concerns about potential harm to humans
- Harm can occur through intentional misuse or accidents
- Efforts are being made to align AI systems with human intentions
- The authors argue that alignment to human intent alone is insufficient for ensuring safety
- Preservation of humans' long-term agency may be a more robust standard for safe AI systems
- Agency preservation needs to be separated explicitly and a priori from other objectives during optimization
- Humans lack biological and psychological mechanisms that protect them from loss of agency
- Formal definition of agency-preserving AI-human interactions introduced, focusing on forward-looking evaluations of agency
- AI systems, not humans, must increasingly be tasked with making these evaluations
- Loss of agency demonstrated in simple environments using temporal-difference learning
- Proposed research area called "agency foundations" with four initial topics: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states
- Limitations acknowledged: the approach does not entirely solve intent-alignment issues, evaluating agency preservation for everyone is computationally challenging, it may hinder AI progress, and fundamental biological needs or psychological structures are unlikely to be modified
- Emphasis on going beyond alignment to human intent and prioritizing the preservation of human agency in AI systems
Authors: Catalin Mitelut, Ben Smith, Peter Vamplew
Abstract: The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). With respect to AI-accidents, there is an increasing effort focused on developing algorithms and paradigms that ensure AI systems are aligned to what humans intend, e.g. AI systems that yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that alignment to human intent is insufficient for safe AI systems and that preservation of long-term agency of humans may be a more robust standard, and one that needs to be separated explicitly and a priori during optimization. We argue that AI systems can reshape human intention and discuss the lack of biological and psychological mechanisms that protect humans from loss of agency. We provide the first formal definition of agency-preserving AI-human interactions, which focuses on forward-looking agency evaluations, and argue that AI systems - not humans - must be increasingly tasked with making these evaluations. We show how agency loss can occur in simple environments containing embedded agents that use temporal-difference learning to make action recommendations. Finally, we propose a new area of research called "agency foundations" and pose four initial topics designed to improve our understanding of agency in AI-human interactions: benevolent game theory, algorithmic foundations of human rights, mechanistic interpretability of agency representation in neural networks, and reinforcement learning from internal states.
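For readers unfamiliar with temporal-difference learning, the following is a minimal sketch of how an agent might learn action values with a TD update (here Q-learning, one common TD method) and then recommend the highest-valued action. It is not the paper's environment or its agency-loss measure: the five-state chain, the reward, the hyperparameters, and the names `step`, `train`, and `recommend` are all assumptions made for illustration.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics; reward 1.0 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values with the Q-learning TD update (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # TD target: immediate reward plus discounted value of the next state
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            # Move the estimate toward the target by a fraction alpha of the TD error
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def recommend(q, state):
    """Recommend the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q[state][a])


if __name__ == "__main__":
    q = train()
    print([recommend(q, s) for s in range(N_STATES - 1)])
```

In this toy setting the recommender simply steers toward whatever its reward signal favors. The paper's concern, as stated in the abstract, is what happens to human agency when recommendations of this kind are embedded in AI-human interactions.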