Notes tagged with "alignment": Nonlinear Function

16 notes tagged with "alignment"

AI safety

AI safety, as a term, is sterile and hard to get excited about. Preventing catastrophe is important, but doesn't motivate me, since [ the…

Tagged with: #ai #morality #alignment

Goodhart's law

The law says that when a measure becomes a target, it ceases to be a good measure. One can distinguish four types of Goodhart problems…

Tagged with: #how-to-think #alignment

cooperative inverse reinforcement learning

References: Cooperative Inverse Reinforcement Learning; The Off-Switch Game; Incorrigibility in the CIRL Framework. The CIRL setting models…

Tagged with: #machine-learning #reinforcement-learning #alignment

deceptive alignment

The idea is that a [ mesa optimizer|mesa-optimizing ] policy with access to sufficient information about the world (e.g., web search) might…

Tagged with: #alignment

embedded agent

Notes on Abram Demski and Scott Garrabrant's sequence on Embedded Agency. Embedded Agents: Classic models of rational [ agency ], such as…

Tagged with: #alignment #ai #buddhism

love is value alignment

What does it mean to love someone? Of course this question has as many answers as there are people, and probably more. But here's one view…

Tagged with: #ai #alignment #relationships

objectives are big

A very incomplete and maybe nonsensical intuition I want to explore. Classically, people talk about very simple [ reward ] functions like…

Tagged with: #ai #reinforcement-learning #alignment

ontological crisis

How do we maintain values when our models of the world shift? If someone's goal in life is to "do God's will", and then they come to believe…

Tagged with: #alignment #ai

reward funnel

When thinking about the [ reward ] function for a real-world AI system, there is always some causal process that determines reward. For…

Tagged with: #alignment #reinforcement-learning

reward uncertainty

See also: [ cooperative inverse reinforcement learning ], [ love is value alignment ]

Tagged with: #reinforcement-learning #alignment

reward

Stray thoughts about reward functions (probably related to the [ agent ] abstraction and the [ intentional stance ]). One can make a…

Tagged with: #ai #reinforcement-learning #alignment

safe objective

Language is a really natural way to tell AI systems what we want them to do. Some current examples: [ GPT ]-3 and successors (InstructGPT…

Tagged with: #alignment

value aligned language game

Suppose I have an agent that generates text. I want it to generate text that is [ value alignment|aligned ] with human values. Approaches…

Tagged with: #alignment

value alignment

Tagged with: #alignment

value learning

Notes on the Alignment Forum's Value Learning sequence, curated by Rohin Shah. Ambitious value learning: the idea of learning 'the human…

Tagged with: #alignment

worldly objective

This may be a central point of confusion: how do we define AI systems that have preferences about the real world, so that their goals and…

Tagged with: #alignment #ai #buddhism
