behavioral cloning
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Links to this note
reinforcement learning from human feedback
see: [ steering language models ], [ direct preference optimization ]. We are given a bunch of pairwise preference evaluations, of the form…
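(The excerpt cuts off; as a hedged sketch of the standard setup, not necessarily what that note writes down: preferences come as triples $(x, y_w, y_l)$ with $y_w$ preferred over $y_l$ for prompt $x$, and the usual Bradley-Terry reward-model objective is the following, with $r_\phi$ an assumed learned reward model.)

```latex
% Sketch: Bradley-Terry reward modeling over pairwise preferences.
% r_\phi is an assumed learned reward model, not notation from the linked note.
p(y_w \succ y_l \mid x) = \sigma\bigl(r_\phi(x, y_w) - r_\phi(x, y_l)\bigr),
\qquad
\mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l) \sim \mathcal{D}}
  \bigl[\log \sigma\bigl(r_\phi(x, y_w) - r_\phi(x, y_l)\bigr)\bigr]
```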
direct preference optimization
References: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. This seems like a compelling reframing of…
value learning
Notes on the Alignment Forum's Value Learning sequence curated by Rohin Shah. ambitious value learning: the idea of learning 'the human…
simulator AI
References: https://generative.ink/posts/simulators/ It seems pretty clear that the intelligence emerging from [ language model ]s is not…
DAgger
The problem of [ exposure bias ] (where an autoregressive sequence model goes off the rails of its training distribution) comes up as a…
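(A hedged sketch of the DAgger loop from Ross et al. 2011, which is the usual fix for this compounding-error problem in behavioral cloning; `env`, `expert_policy`, and `fit_policy` are hypothetical stand-ins, not anything from the linked note.)

```python
# Sketch of DAgger (Ross et al. 2011); env, expert_policy, and fit_policy
# are hypothetical stand-ins, not code from these notes.

def dagger(env, expert_policy, fit_policy, n_iters=10, horizon=1000):
    dataset = []             # aggregated (state, expert_action) pairs
    policy = expert_policy   # iteration 0 is plain behavioral cloning on expert states
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            # Label every state the *current* policy visits with the expert's action,
            # so the training distribution tracks the learner's own rollouts.
            dataset.append((state, expert_policy(state)))
            state, done = env.step(policy(state))
            if done:
                break
        # Retrain the policy on everything collected so far (dataset aggregation).
        policy = fit_policy(dataset)
    return policy
```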
decoding
References: Holtzman et al. (2020), The Curious Case of Neural Text Degeneration, https://arxiv.org/abs/1904.09751. How should we actually…
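(For context, a hedged sketch of the nucleus/top-p sampling that paper proposes; `probs` is assumed to be a model's next-token distribution as a NumPy array, which is my framing rather than the linked note's.)

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token id by nucleus (top-p) sampling from a next-token distribution."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]              # token ids sorted by probability, descending
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix whose mass reaches p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize inside the nucleus
    return int(rng.choice(nucleus, p=nucleus_probs))
```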
decision transformer
paper: Chen, Lu, et al. 2021, https://arxiv.org/abs/2106.01345. Trajectories are represented as sequences $(\hat{R}_1, s_1, a_1, \hat{R}_2, s_2, a_2, \ldots, \hat{R}_T, s_T, a_T)$, where $\hat{R}_t$ is the return-to-go, i.e…
value aligned language game
Suppose I have an agent that generates text. I want it to generate text that is [ value alignment|aligned ] with human values. Approaches…
tractable approximations to utilitarianism
There are three main approaches to moral philosophy: [ utilitarian ]ism: you should feed a starving person because it will increase 'global…
AI research landscape
As of April 2021: Giant [ transformer ]s work better than anyone has a right to expect. GPT3 is fucking amazing. [ DALL-E ] clearly has some…