behavioral cloning
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
Links to this note
reinforcement learning from human feedback
see: [ steering language models ], [ direct preference optimization ]. We are given a bunch of pairwise preference evaluations, of the form…
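(The excerpt cuts off; as a hedged sketch of the standard setup, not necessarily what that note writes down: preferences come as triples $(x, y_w, y_l)$ with $y_w$ preferred over $y_l$ for prompt $x$, and the usual Bradley-Terry reward-model objective is the following, with $r_\phi$ an assumed learned reward model.)

```latex
% Sketch: Bradley-Terry reward modeling over pairwise preferences.
% r_\phi is an assumed learned reward model, not notation from the linked note.
p(y_w \succ y_l \mid x) = \sigma\bigl(r_\phi(x, y_w) - r_\phi(x, y_l)\bigr),
\qquad
\mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l) \sim \mathcal{D}}
  \bigl[\log \sigma\bigl(r_\phi(x, y_w) - r_\phi(x, y_l)\bigr)\bigr]
```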
direct preference optimization
References: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. This seems like a compelling reframing of…
value learning
Notes on the Alignment Forum's Value Learning sequence curated by Rohin Shah. ambitious value learning: the idea of learning 'the human…
simulator AI
References: https://generative.ink/posts/simulators/ It seems pretty clear that the intelligence emerging from [ language model ]s is not…
DAgger
The problem of [ exposure bias ] (where an autoregressive sequence model goes off the rails of its training distribution) comes up as a…
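(A hedged sketch of the DAgger loop from Ross et al. 2011, which is the usual fix for this compounding-error problem in behavioral cloning; `env`, `expert_policy`, and `fit_policy` are hypothetical stand-ins, not anything from the linked note.)

```python
# Sketch of DAgger (Ross et al. 2011); env, expert_policy, and fit_policy
# are hypothetical stand-ins, not code from these notes.

def dagger(env, expert_policy, fit_policy, n_iters=10, horizon=1000):
    dataset = []             # aggregated (state, expert_action) pairs
    policy = expert_policy   # iteration 0 is plain behavioral cloning on expert states
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            # Label every state the *current* policy visits with the expert's action,
            # so the training distribution tracks the learner's own rollouts.
            dataset.append((state, expert_policy(state)))
            state, done = env.step(policy(state))
            if done:
                break
        # Retrain the policy on everything collected so far (dataset aggregation).
        policy = fit_policy(dataset)
    return policy
```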
decoding
References: Holtzman et al. (2020), The Curious Case of Neural Text Degeneration, https://arxiv.org/abs/1904.09751. How should we actually…
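(For context, a hedged sketch of the nucleus/top-p sampling that paper proposes; `probs` is assumed to be a model's next-token distribution as a NumPy array, which is my framing rather than the linked note's.)

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token id by nucleus (top-p) sampling from a next-token distribution."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]              # token ids sorted by probability, descending
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix whose mass reaches p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize inside the nucleus
    return int(rng.choice(nucleus, p=nucleus_probs))
```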
decision transformer
paper: Chen, Lu, et al. 2021, https://arxiv.org/abs/2106.01345. Trajectories are represented as sequences $(\hat{R}_1, s_1, a_1, \hat{R}_2, s_2, a_2, \ldots, \hat{R}_T, s_T, a_T)$, where $\hat{R}_t$ is the return-to-go, i.e…
value aligned language game
Suppose I have an agent that generates text. I want it to generate text that is [ value alignment|aligned ] with human values. Approaches…
tractable approximations to utilitarianism
There are three main approaches to moral philosophy: [ utilitarian ]ism: you should feed a starving person because it will increase 'global…
AI research landscape
As of April 2021: Giant [ transformer ]s work better than anyone has a right to expect. GPT3 is fucking amazing. [ DALL-E ] clearly has some…