gradient descent

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Links to this note

Lyapunov function

Used in analyzing the stability of an equilibrium of a dynamical system. A Lyapunov function is a scalar-valued function of the state space…

gradient clipping

Why do we clip gradients in deep learning? When is it important and what is the right way to do it? It seems like the standard recipe used…

diffusion model

Diffusion models for image generation were independently invented at least twice: in a discrete-time variational inference framework…

fixed point

We say that $x$ is a fixed point of an update rule $f$ if $f(x) = x$. Update rules can often (though not necessarily) be seen as defining an…

deep deterministic policy gradient

Deep deterministic policy gradient (DDPG) is an interesting RL algorithm with a somewhat misleading name. Although its name indicates that…

generative flow network

Many objects can be generated by a sequence of actions. For example: generating language by adding one word at a time; generating a molecule…

fast weights

On an evolutionary timescale, it's useful to evolve structures that can learn quickly. The nervous system is an evolved organ system for…

ill-conditioned

Multiple senses: An 'ill-conditioned matrix' has a large ratio between its largest and smallest eigenvalue (more generally, see what is a…

natural gradient

We don't typically think of it this way, but you can derive a gradient descent step as finding the point that minimizes a linearized…